
Next Generation Sequencing Software User's Manual Version 1.5

1. [Figure: Mapping Result windows for Solexa_single_end_test (UNIQUE) and abi test (UNIQUE)]
3. Detailed information panel. This panel is used to show the detailed information in the Mapping results illustrating window. Such items as the Read Information Panel, Mapping Summary Panel and SNP panel are shown on this panel. Read information panel: this panel displays detailed mapping information of a selected read in the mapping results illustrating window. Click a read on the mapping results illustrating window. The read name, the direction of the mapping, and the reference offset where the read is mapped (the leftmost position) will be shown in the read information panel. The alignment between the target region on the reference and the read will be shown in the alignment panel. If the quality score file is provided when creating the job, the sequencing quality will be illustrated as black blocks. The larger the length of the block, the higher the quality score of the base. [Figure: read information panel. Read name 1540; Reference offset 1261; Mapping Reverse; read GCACTGCTTTTTCTGTATCCCAGAGGTTTTGATTAG; sequence quality] The red nucleotide
4. ZOOM Studio User's Manual. Next Generation Sequencing Software from the makers of PatternHunter. Bioinformatics Solutions Inc., 470 Weber St. N., Suite 204, Waterloo, Ontario, Canada N2L 6J2. Phone: 519 885 8288. Fax: 519 885 9075. Please contact BSI for questions or suggestions for improvement. Table of Contents: 1 Introduction to ZOOM; Terminology; 2 Getting Started with
5. 1 Introduction to ZOOM. ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads produced by next generation sequencing technology back to the reference genome and to carry out post-analysis in a user-friendly way. Based on a newly designed multiple spaced seeds theory, ZOOM guarantees great mapping accuracy with unparalleled speed. Both single and paired-end reads of various lengths, from 12 bp to 240 bp, can be handled. Any number of mismatches and one insertion/deletion of various lengths between the read and its target region on the reference sequence are allowed. Uniquely mapped results or the best top N results for each read will be reported according to the minimal mismatches and indel length between the read and its target positions. ZOOM supports both Illumina/Solexa and ABI SOLiD instruments. For Illumina/Solexa data, quality scores generated by the sequencer for each of the short sequenced reads can be incorporated to reduce ambiguity of read mapping. For ABI SOLiD data, ZOOM directly aligns a color space read to a base space reference sequence. ZOOM is therefore able to differentiate a true polymorphism on the base space from the sequencing errors on the color space, and automatically corrects sequencing errors during the mapping process. Reads in color space will be decoded into base space with both sequenci
6. [Figure: detailed alignment view of reads against the reference sequence, with a ruler from 3800 to 3860] The horizontal scrollbar acts to increase and decrease the offset of the focus region on the reference sequence. The scrollbar on the right helps to observe the reads mapped to this position that do not fit in the window.
7. [Figure: read depth overview, horizontal ruler from 2000 to 10000] The region in the rectangle will then be enlarged to the full window of the Mapping Results Displaying Window as follows. Click on a region that you are interested in, or, if the length of the focusing region is less than 130 bp, the graph will adjust to show the detailed alignments between the reads and the reference sequence. [Figure: Mapping Result Solexa_single_end_test UNIQUE, detailed alignments with callouts "Difference between this read and the reference sequence", "Read sequence" and "Consensus s
8. Bioinformatics Solutions Inc., 470 Weber Street North, Suite 204, Waterloo, Ontario, Canada N2L 6J2
9. Table of Contents (continued): 4.8 Control Jobs and Task; 4.9 Extract Unmapped Reads to Create a; System Configuration; 5 Mapping Results; 5.1; Reference Sequence Selecting Bar; Detailed Information Panel; 5.2 Show Mapping Summary; 5.3 Show Mapping Results Together; 6 SNP and Small InDels; 6.1 Find
10. [Figure: reference sequence ruler from 1200 to 1270] 12. Click on any read in the Mapping Results Illustrating window. The read will be highlighted by a red rectangle. At the same time, more information about this mapped read will be shown in the read information tab window below the Mapping Results Illustrating window. [Figure: read information tab. Read name 1640; Reference offset 1261; Mapping direction Reverse; read GCACTGCTTTTTCTGTATCCCAGAGGTTTTGATTAG; sequence quality shown as black blocks] The red nucleotide is the difference between the read and the reference sequence segment. Each black block indicates the quality score on this position. The higher the block is, the higher the quality score of this position is. Note that the direction of the alignment shown in the read information tab is the same as the direction of the read sequence in the read files. If a read is mapped to the reverse chain of the reference sequence, the reference segment is reve
11. [Figure: Job View tree showing the Solexa_single_end_test_more UNIQUE and ALL Results nodes] The merged mapping results will be shown in the mapping results window. Make sure to select the Results nodes rather than the tasks. The UNIQUE Results node and the ALL Results node of one job cannot be selected together. We suggest not selecting the UNIQUE Results node and the ALL Results node together even if they are from different jobs, because showing both uniquely mapped results and top N mapped results at once can obscure the results you actually want to see. 6 SNP and Small InDels Caller. 6.1 Find SNPs and small InDel Candidates. ZOOM builds consensus sequences according to the mapped reads along the reference sequence. If the organism is haploid, there is only one type of nucleotide on each position of the genome. Thus all other nucleotide types of the reads covering this position are caused by sequencing errors or mapping errors. ZOOM therefore chooses the majority nucleotide letter of the reads covering this position as the consensus sequence. If the organism is diploid, the nucleotides on the positive chain and the reverse chain could be different. ZOOM adopts a method similar to that of Li, Ruan and Durbin (2008) to compute the posterior probability of each possible genotype and chooses the genotype with the maximum probability as the consensus sequence. The genotype is coded by the IUPAC code. The mapping relationship of the IUPAC code and the genotype is as follows: R <
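The two consensus rules described above can be illustrated with a minimal Python sketch. It is not ZOOM's implementation: the function names are hypothetical, the tie-breaking rule is an assumption, and the diploid part only shows how a chosen genotype maps to a standard IUPAC code (it does not compute posterior probabilities).

```python
from collections import Counter

# Standard IUPAC nucleotide codes for unordered pairs of alleles.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def genotype_code(allele_a, allele_b):
    """Return the IUPAC letter for a diploid genotype (order does not matter)."""
    return IUPAC[frozenset((allele_a, allele_b))]


def majority_base(bases):
    """Haploid rule: pick the most frequent base among the covering reads.

    `bases` must be non-empty. Ties are broken alphabetically, which is an
    assumption made only to keep this sketch deterministic.
    """
    counts = Counter(bases)
    return min(counts, key=lambda b: (-counts[b], b))
```

For example, a position covered by the bases "AAAC" yields the haploid consensus "A", while a heterozygous A/G genotype is written as "R".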
12. Registration Instructions (Internet Connection). 1. Select "Request a license file (has Internet connection)" and click Next. 2. The following window will appear. If you have purchased ZOOM and have a registration key, select "Registration Key". Enter your registration key as well as your name and email address, and click Next. Important: you will receive your license file via email. If you are trying a demo of ZOOM and do not have a registration key, select "Request a 30 days evaluation license (No registration key required)". Enter your name and email address as well as your institution. Click Next. 3. The following window will appear: [Figure: license wizard, "The software registration request has been sent to BSI successfully. Please check your email to get the license file and use the Import License function."] An automated BSI service will generate the license file license.lcs and email it to the provided email account. Some email servers may treat the license email as spam, so please do not forget to check your junk email box. From the License Wizard you can either save the attachment to a local directory or copy the content between > and < the em
13. [Table: IUPAC codes and their genotypes] Reference: Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008 Nov;18(11):1851-8. ZOOM identifies the differences, including mismatches and insertions/deletions, between the consensus sequence and the reference sequence as a primary set of SNP and InDel candidates. Note that this version of ZOOM can only find SNPs and short insertions/deletions which occur on read sequences. There are two factors which can affect the confidence of the SNP/InDel candidates: (1) The number of reads covering the position. More reads covering this position means that the position is more likely to be a true variation. However, if the read depth is too high, it might be due to the mapping results of repeated sequences. Thus you can set both the minimal and the maximal read depths. (2) The quality score of a base on a read, which reflects the probability that the base is a sequencing error. The quality score on the position, or the quality scores of the bases around the position, affects the probability of whether the difference on the position is a true SNP or not. According to the factors listed above, ZOOM lists the following five filtering criteria to filter out possible SNPs. The r
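The depth and quality factors above can be sketched as a simple filter. This is only an illustration under stated assumptions: the candidate record layout (`depth`, `quality` keys), the function name and the thresholds are all hypothetical and are not ZOOM's actual five criteria, which are not fully listed here.

```python
def filter_candidates(candidates, min_depth, max_depth, min_quality):
    """Keep SNP/InDel candidates whose read depth and base quality look plausible.

    Each candidate is a dict with hypothetical keys:
      'position' - offset on the reference sequence
      'depth'    - number of reads covering the position
      'quality'  - quality score of the differing base
    """
    kept = []
    for cand in candidates:
        # Too few reads: weak evidence. Too many: possibly a repeat region.
        if not (min_depth <= cand["depth"] <= max_depth):
            continue
        # A low quality score suggests a sequencing error rather than a true SNP.
        if cand["quality"] < min_quality:
            continue
        kept.append(cand)
    return kept
```

Bounding the depth from both sides mirrors the remark that very high coverage may come from repeated sequences rather than from a well-supported variant.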
14. six parts. Mapping results illustrating window: this window will show the reads mapped to the whole reference sequence, or to a specific region of the reference sequence, in different scales subject to your preference. After you select a result node and click the toolbar icon, the overview of the read depths of all reads mapped to the whole reference sequence will appear as follows: [Figure: Mapping Result ABI mate pair test UNIQUE, read depth overview] At the bottom of the window is a horizontal ruler denoting the positions on the reference sequence. The left vertical ruler denotes the read depth. You can get an idea of the coverage at different positions of the reference sequence using the read depth line. There are several operations allowed on the mapping results illustrating window. Resting the cursor over a position in the mapping results illustrating window will bring up a yellow tooltip showing the offset of this position on the reference sequence and the rough coverage of this position. Scale on the region you are interested in by clicking the left mouse button, dragging it into a rectangle, then releasing the left mouse button. [Figure: Mapping Result Solexa_single_end_test UNIQUE]
15. 1. Press the button and choose the reference sequence reference.fa in the Sample_Data/Solexa/single_end directory. The sequences in the reference files should be in FASTA format. Multiple reference files or a directory can be loaded. Use the Remove list button to remove files if needed. 2. Click the Next button on the bottom of the window to continue. Mapping parameters: please use the following default parameters. [Figure: Create new job dialog, step 4 "Mapping parameters", with the groups Organism, Pair end Settings, Read Qualities (FASTQ format will be regarded as Sanger type), Mapping Criteria (allow at most 2 mismatched base pairs) and Collecting Results] For each
16. [Example: WIG data for positions 556776 to 556784 with values 50 and 38] The description of the WIG format is on the UCSC website: http://genome.ucsc.edu/goldenPath/help/wiggle.html. 7.2 Export Assembled Consensus Sequence. Select a UNIQUE Results node of a job, or several UNIQUE Results nodes of jobs with the same reference sequences. Select Export from the File menu. Select Assembled Sequences from the popup menu. The assembled consensus sequence built according to the mapping result will be exported in FASTA format. There are two ways to export the assembled consensus sequence: Consensus sequences and Consensus segments. The difference is that if you choose Consensus sequences, one reference sequence will export one assembled consensus sequence, and those bases with no reads covering them will be denoted by a placeholder character. If you choose Consensus segments, the consensus sequence of one reference sequence may be exported in several segments, separated by the gap regions where no reads cover. Several jobs with the same reference sequence can be selected together to output one consensus sequence. Note that we suggest only building consensus sequences on the UNIQUE result nodes. Because the result node contain top N mapping results for each read, those reads mapped to multiple positions of the reference sequence will make
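The difference between the two export modes can be sketched in a few lines of Python. This is an illustration, not ZOOM's exporter: the function name is hypothetical, and it assumes the consensus string and a per-position read depth list of the same length are already available.

```python
def consensus_segments(consensus, depths):
    """Split a consensus sequence into covered segments.

    `consensus` is a string and `depths` a list of read depths, one per
    position (assumed to have the same length). Returns a list of
    (start_offset, segment) tuples, one per run of positions with depth > 0.
    """
    segments = []
    start = None
    for i, depth in enumerate(depths):
        if depth > 0 and start is None:
            start = i                      # a covered run begins
        elif depth == 0 and start is not None:
            segments.append((start, consensus[start:i]))
            start = None                   # the run ended at a gap
    if start is not None:                  # run reaching the end of the sequence
        segments.append((start, consensus[start:]))
    return segments
```

In "Consensus sequences" mode the whole string would be written as one FASTA record; in "Consensus segments" mode each returned tuple would become its own record.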
17. A Processing window will pop up. The reads in the read files will be loaded. After this process is finished, a job is created inside the directory you assigned. All the information about this job is stored in this directory. You can copy this directory anywhere. If you use ZOOM to load this directory, the job can be shown and post-analysis can be carried out on it. You can inspect the status of this job in the Job view window. 4.5 Parameters. There are five groups of parameters that you can select from as preferred. Organism: "The organism is diploid" is a checkbox that decides whether the organism is diploid. When the option is selected, ZOOM will assemble the consensus sequence as a diploid genome. Bases in the consensus sequence will be presented using the IUPAC code. Pair end Settings: check the box "Input reads are paired end reads / mate pair reads" when you want to align reads in paired-end mode. If you check the box and there are no read files added in paired file mode, ZOOM will treat every two sequences in the file added in single file mode as the two mates of a pair. When selecting this box, please remember to assign the distance range between the two reads that make up the read pairs, as follows: "The distance between two locations of paired reads is from 800 bases to 2000 bases". Read Qualities: the quality score of a read reflects the sequencing quality of each b
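As a rough illustration of the paired-end distance range, the following Python sketch checks whether two mapped mates fall inside the configured range. How ZOOM measures the distance is not stated here; this sketch assumes it is the absolute difference between the two leftmost reference offsets, and the function name is hypothetical.

```python
def pair_within_range(offset_a, offset_b, min_dist=800, max_dist=2000):
    """Return True if two mates are an acceptable distance apart.

    Assumption: the distance is the absolute difference between the two
    leftmost reference offsets. The defaults mirror the 800 to 2000 base
    example in the dialog text.
    """
    distance = abs(offset_b - offset_a)
    return min_dist <= distance <= max_dist
```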
18. 10. Click on the Locating bar. [Figure: Locating bar with the entries "reference sequence", "remember current position" and "read information"] The "2513-2590" (you may see different numbers) is the offsets of the range currently shown in the Mapping Results Illustrating window on the reference sequence. Click on "remember current position" and click the Locating bar again. You will see that "0: 2513-2590" is recorded here, and by selecting this entry you can go back to this region at any time. Enter a new position or a position range in the Locating bar, such as 1234 or 1234-4560. Then the read alignments in the new region will be shown in the Mapping Results Illustrating window. 11. Enter a single position, such as 1234, in the Locating bar, or click a column in the Mapping Results Illustrating window. A light blue bar will highlight this position as follows: [Figure: Mapping Result window with one alignment column highlighted]
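The two input forms accepted by the Locating bar (a single position or a range) can be modelled with a small parser. This is a hypothetical sketch: the separator between the two numbers of a range is not legible in the text, so the code accepts any non-digit separator, and the function name is invented for illustration.

```python
import re


def parse_location(text):
    """Parse '1234' or '1234-4560' into an inclusive (start, end) tuple.

    Assumption: any run of non-digit characters may separate the two
    numbers of a range. A single position is returned as (pos, pos).
    """
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if len(numbers) == 1:
        return (numbers[0], numbers[0])
    if len(numbers) == 2 and numbers[0] <= numbers[1]:
        return (numbers[0], numbers[1])
    raise ValueError(f"not a position or position range: {text!r}")
```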
20. as follows: [Figure: Mapping Result Solexa_single_end_test UNIQUE, read depth overview from 3000 to 4500] Rest the cursor on a position of the peaks for a second. The average read depth of this position will be shown in a tooltip box beside the mouse. [Figure: tooltip reading "offset 3088, estimated average coverage 54"] Click on a place in the Mapping Results Displaying window. The detailed alignments of the mapped reads along the reference sequence will be shown as follows: [Figure: Mapping Result Solexa_single_end_test UNIQUE, detailed alignments with callouts "Difference between this read and the reference sequence" and "Read sequence"]
21. Table of Contents (continued): 2.1 Package Contents; 2.2 System Requirements; 2.3 Instrumentation; 2.4 Install ZOOM Studio; 2.5 Registering ZOOM; Registration Instructions (Internet Connection); Registration Instructions (Internet Connection); 2.6 Set Up Your Working Environment; Client Side Computer; Configuration of the Server Side Computer; Command Line Usage of ZOOM; 3 Quick Start to Use ZOOM; 3.2 The Main Windows of ZOOM; 3.3 Set Up Your Working Environment
22. [Figure: Job View tree with a Scheduling node, a task dated 2009-08-25 15:50:59, and the Results node Solexa_single_end_test UNIQUE] ZOOM will assemble the mapped reads into a consensus sequence and show the read depth overview along the reference sequence. This will take some time, depending on the amount of mapped reads and the length of the reference sequence. A progress bar will pop up to display the progress. [Figure: "loading data" progress bar] By default, ZOOM will assemble the mapped reads into a consensus sequence by treating the organism as a diploid genome. After the procedure is finished, you can see a tab window containing the mapping results on the right hand side of the main window of ZOOM, as follows: [Figure: ZOOM main window with the labels "Mapping results illustrating window", "Scaling tools", "reference selecting bar", "offset bar" and "Detailed information panel"] The tab window contains
23. 1. License. Subject to the terms and conditions of this Agreement, Bioinformatics Solutions (BSI) grants to you (Licensee) a non-exclusive, perpetual, non-transferable, personal license to install, execute and use one copy of ZOOM (the Software) on one single CPU at any one time. Licensee may use the Software for its internal business purposes only. 2. Ownership. The Software is a proprietary product of BSI and is protected by copyright laws and international copyright treaties, as well as other intellectual property laws and treaties. BSI shall at all times own all right, title and interest in and to the Software, including all intellectual property rights therein. You shall not remove any copyright notice or other proprietary or restrictive notice or legend contained or included in the Software, and you shall reproduce and copy all such information on all copies made hereunder, including such copies as may be necessary for archival or backup purposes. 3. Restrictions. Licensee may not use, reproduce, transmit, modify, adapt or translate the Software, in whole or in part, to others, except as otherwise permitted by this Agreement. Licensee may not reverse engineer, decompile, disassemble, or create derivative works based on the Software. Licensee may not use the Software in any manner whatsoever with the result that access to the Software may be obtained through the Internet, including, without limitation, any web page. Licensee may not rent, lease, license, tran
24. [Figure: option "Allow at most 2 edit distance(s)"] If you want to allow indels AND you want to assign the length of the indel, check the radio box and the check box and assign the number you want, as follows: [Figure: "allow at most 2 mismatched base pair(s), plus an indel of length at most 1 base pair(s)"] This will allow up to two mismatches and one insertion/deletion of length one between reads and the reference sequence. You can choose to assign a ratio of mismatches to read length instead of the mismatch number by checking the following check box. Assign the ratio of mismatches of the read length: [Figure: "uncheck this to use base pair numbers as mapping criteria; allow at most 5 percent bases as mismatches"] Commonly, the ratio criterion is useful for read files containing reads of various lengths. Getting High Sensitivity Results: ZOOM adopts the optimal multiple seeds strategy to guarantee 100% sensitivity for a wide range of read lengths and mismatch numbers. However, using these seeds might be time-consuming, especially for very long reference sequences. By default, ZOOM adopts the seeds guaranteeing 100% sensitivity to find all mapping positions having up to 2 mismatches with the reads. To get high sensitivity, please click the following check box. Then ZOOM will select the seeds with high sensitivity according to the mismatch number you assigned. For cases where ZOOM can achieve 100% sensitivity p
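The two ways of stating the mismatch criterion (an absolute count or a percentage of the read length) can be compared with a short Python sketch. It is a simplified illustration for ungapped alignments only; the function names and parameters are hypothetical and say nothing about how ZOOM's seeds find the candidate positions.

```python
def count_mismatches(read, target):
    """Number of positions where an ungapped read differs from its target."""
    if len(read) != len(target):
        raise ValueError("ungapped comparison needs equal lengths")
    return sum(1 for a, b in zip(read, target) if a != b)


def passes_criterion(read, target, max_mismatches=2, max_percent=None):
    """Apply either the absolute or the percentage mismatch criterion.

    If `max_percent` is given (e.g. 5 for 5 percent), the allowed number of
    mismatches scales with the read length; otherwise `max_mismatches` is
    used. Integer arithmetic avoids floating point rounding at the boundary.
    """
    mismatches = count_mismatches(read, target)
    if max_percent is not None:
        return mismatches * 100 <= max_percent * len(read)
    return mismatches <= max_mismatches
```

With a 5 percent ratio, a 40-base read tolerates 2 mismatches while a 100-base read tolerates 5, which is why the ratio form suits files that mix read lengths.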
25. window. Mapping parameters: there are five groups of parameters in the following Mapping parameters window. The five groups of parameters will be explained in the next section. After choosing the proper parameters, please press the Finish button to create the job. [Figure: "Create a new job" dialog, step 4 "Mapping parameters". Organism: The organism is diploid. Pair end Settings: Input reads are paired end reads / mate pair reads; the distance between two locations of paired reads is from 100 bases to 200 bases. Read Qualities: FASTQ format will be regarded as Sanger type; don't count mismatches on base with quality score less than (count all bases); ignore reads with less than 8 high quality bases. Mapping Criteria: allow at most 2 mismatched base pair(s), plus an indel of length at most 0 base pair(s); allow at most 0 edit distance(s); achieve high sensitivity (more mapping results but lower mapping speed). Collecting Results: for each read, report the unique best mapping position or the top best mapping positions; perform an additional rescore step to re-evaluate multiple mappings with extra 5 mapping records for each read, assuming that the SNP probability is 0.01]
26. visibility=2 itemRgb="On" useScore. The mapping results will be shown line by line. Each line of the file is described by nine tab-delimited BED fields:
    1. chrom: the name of the chromosome (e.g. chr3, chrY, chr2_random), which is the name described in the reference sequence files. Thus, if you want the BED file to be shown correctly in the UCSC genome browser, please make sure that the reference names in the reference sequence files are accepted by the UCSC genome browser.
    2. chromStart: the starting position of the alignment in this reference sequence. The first base in a chromosome is numbered 0.
    3. chromEnd: the ending position of the alignment in this reference sequence. The chromEnd base is not included in the display of the alignment. For example, an alignment defined as chromStart=0, chromEnd=50 spans the bases numbered 0-49.
    4. read name: the name of the read.
    5. score: a score between 0 and 1000. We use this item to store the edit distance between the read and the reference sequence, i.e. the sum of the mismatch number and the length of the insertion/deletion.
    6. strand: the mapping direction of the read. "+" means the read is mapped to the positive chain of the reference sequence, while "-" means the read is mapped to the reverse chain.
    7. same as 2.
    8. same as 3.
    9. itemRgb: an RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb attribute is set to "On", this
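The nine-field layout can be made concrete with a small Python sketch that formats one mapped read as a BED line. The function name and its arguments are hypothetical; the "+"/"-" strand symbols follow the standard BED convention, and the red/blue colors follow the coloring rule this manual gives for positive and reverse chain reads.

```python
def bed_line(chrom, start, aligned_length, read_name, edit_distance, strand):
    """Format one mapped read as a tab-delimited, nine-field BED line.

    `start` is 0-based; the end coordinate is exclusive, so a 50-base
    alignment starting at 0 has chromEnd 50 and spans bases 0-49.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    end = start + aligned_length
    # Red for the positive chain, blue for the reverse chain.
    item_rgb = "255,0,0" if strand == "+" else "0,0,255"
    fields = [chrom, start, end, read_name, edit_distance, strand,
              start, end, item_rgb]          # fields 7 and 8 repeat 2 and 3
    return "\t".join(str(f) for f in fields)
```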
28. Check the box and modify the threshold in the following text field: "don't count mismatches on base with quality score less than 4". ZOOM will neglect those low quality bases when mapping. Question: How many reads can ZOOM deal with in 8 GB of RAM? Answer: For the command line version, we suggest 25 to 30 million reads for 8 GB of RAM. If you double your RAM, you can also double the data size. For the GUI version, ZOOM can split reads into small pieces. You can modify the size of the small pieces to run on different sizes of RAM. Please refer to Section 2.6. Question: Can ZOOM schedule multiple jobs on multiple CPUs of one server or on multiple servers? Answer: Yes. Please configure the server address using the Configuration button on the toolbar, and ZOOM will split the job into several tasks, schedule them among these servers, and collect the mapping results automatically. You can also choose the data size of each task running on each CPU according to the RAM of your server. Question: Can ZOOM utilize the quality score of reads to enhance mapping results? Answer: Yes. For Illumina/Solexa data, ZOOM adopts two ways to utilize quality scores to enhance mapping results. The first way is to only count mismatches occurring on high quality positions. The second is to utilize the quality score of each base of the read to compute the mapping probability of possible alignments for each read, and choose the best or top N mapping results according to the mapping p
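The first quality-based strategy (only count mismatches on high quality positions) can be sketched as follows. This is an illustrative Python function with a hypothetical name, assuming an ungapped alignment and one integer quality score per read base; it does not model the second, probability-based strategy.

```python
def count_high_quality_mismatches(read, target, qualities, min_quality=4):
    """Count mismatches, ignoring bases whose quality is below `min_quality`.

    `read` and `target` are equal-length strings and `qualities` holds one
    integer score per read base. The default threshold of 4 mirrors the
    example value in the dialog text.
    """
    if not (len(read) == len(target) == len(qualities)):
        raise ValueError("read, target and qualities must have equal length")
    return sum(
        1
        for base, ref, qual in zip(read, target, qualities)
        if base != ref and qual >= min_quality
    )
```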
29. N mapping results will be kept and output. Totally 2432239 reads are mapped to the reference sequences. [Figure: All Mapping Results summary: "Totally 2432239 reads are mapped to the reference sequences. The detailed statistics of mapping is as following", followed by the statistic table] The statistic table contains four columns. Each row shows how many mapping positions (the third column) are mapped to the reference sequence with x mismatches (the first column) and one insertion/deletion of length y (the second column), and what the ratio of these mapping results is over all mapping results (the fourth column). 5.3 Show Mapping Results Together. If two or more jobs have the same reference sequence, you can choose to merge the mapping results of these jobs to show the mapping results together. Press the Ctrl key and click the Results nodes you want to show together in the Job View Panel, then click the toolbar icon. [Figure: Job View Panel with the jobs Solexa_single_end_test and Solexa_single_end_test_more, each with Scheduling and Results nodes]
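The four-column statistic table described above can be reproduced from a list of mapping records with a short Python sketch. The input layout (one `(mismatches, indel_length)` tuple per mapping position) and the function name are assumptions made for illustration.

```python
from collections import Counter


def mapping_summary(mappings):
    """Build rows of (mismatches, indel_length, count, ratio).

    `mappings` is an iterable of (mismatches, indel_length) tuples, one per
    mapping position. Rows are sorted by mismatches, then indel length.
    """
    counts = Counter(mappings)
    total = sum(counts.values())
    return [
        (mismatches, indel_length, n, n / total)
        for (mismatches, indel_length), n in sorted(counts.items())
    ]
```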
30. RGB value will determine the display color of the data contained in this BED line. If item 6 is "+", the item is 255,0,0. If item 6 is "-", the item is 0,0,255. In this way, reads mapped to the positive chain are shown in red, and reads mapped to the reverse chain are shown in blue. Here is an example of a BED file containing mapping results on two chromosomes: [Example: two tracks with the track line name="Reads Alignments" description="Reads alignments show" visibility=2 itemRgb="On", one for chr7 and one for chrY, each listing alignments between offsets 127471196 and 127481699 colored 255,0,0 or 0,0,255] The BED format is described at http://genome.ucsc.edu/FAQ/FAQformat. GFF format: GFF (General Feature Format) lines are based on the GFF standard file format. GFF lines have nine required fields that must be tab-separated. If the fields are separated by spaces instead of
31. a space.
Reference name: the name of the reference sequence to which this read is mapped. If there is a tab in the reference name, ZOOM will convert the tab into a space.
Reference offset: the position at which the read is mapped on this reference sequence, starting from zero. By default the leftmost position is always returned, regardless of whether the read is mapped to the positive or the negative strand.
Strand: the strand of the reference sequence that the read is mapped to. "+" means the read is mapped to the positive strand of the reference sequence; "-" means the read is mapped to the negative strand.
Total error number: the sum of the number of mismatches due to genomic differences and the number of sequencing errors of the read. ZOOM decodes the color-space reads into nucleotides in order to separate genomic differences from sequencing errors. The two types of errors are denoted in the <Decoded nucleotide sequence> and <Mark of positions of sequencing error> fields respectively.
Insertion/deletion information: the information about the insertion/deletion between the read and the target region of the reference sequence at this offset. The field takes one of the following three forms:
o M: No insertion/deletion. Only mismatches were found.
o I<length>_<offset>: There is one insertion of length <length> behind <offset>.
o D<length
32. data of the read, and the third column contains the quality scores for each base. Note that the read sequence label or description should not contain a space.
ABI SOLiD color space data
Applied Biosystems SOLiD csfasta file: ABI SOLiD represents its reads in color space. Any two adjacent nucleotide bases of a read are represented by one of four colors. The mapping between base space (nucleotides) and color space is given in the definitions section under "color". In this release, ZOOM accepts the color-space data (csfasta) from SOLiD, in which each read is a numeric string prefixed by a single base. The base that precedes the numeric color code data is the final base of the sequencing adapter. Example of csfasta file format: > 1 6 678 0030011000002120322220223 > 1 6 1142 T1011010321313123321022222 1 6 1616 F3 12220012213121322223113320
Applied Biosystems SOLiD csfasta and _QV.qual files: ABI SOLiD data stores the color-space sequence of the reads in a csfasta file and the corresponding quality scores of each read in a _QV.qual file. Note that ZOOM loads the quality score file along with the csfasta file automatically, so please put the csfasta file and the _QV.qual file together in one directory and give them the same file name prefix. Example of the _QV.qual file of the above csfasta: 1 6 678 10 8 24 10 14 8 11 105 85 195723923527445 1 6 1142 F
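The relationship between a csfasta read and its nucleotides can be illustrated with a minimal sketch. It assumes the standard SOLiD two-base encoding, in which bases are coded A=0, C=1, G=2, T=3 and each color is the XOR of the codes of two adjacent bases. The function name is hypothetical, and unlike ZOOM this naive decoder performs no error correction, so a single wrong color corrupts every base after it.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_color_space(read: str) -> str:
    """Decode a csfasta read (leading adapter base + color digits).

    The leading adapter base is used only as the starting point and is
    not included in the returned nucleotide sequence.
    """
    previous = BASE_TO_CODE[read[0]]
    decoded = []
    for color_char in read[1:]:
        color = int(color_char)      # raises ValueError on '.' (missing call)
        if color not in (0, 1, 2, 3):
            raise ValueError(f"invalid color: {color_char!r}")
        previous ^= color            # next base code = previous code XOR color
        decoded.append(CODE_TO_BASE[previous])
    return "".join(decoded)


if __name__ == "__main__":
    print(decode_color_space("T0123"))
```

In the call above, the adapter base T followed by colors 0, 1, 2, 3 yields the bases T, G, A, T in turn.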
33. depth in ascending order. Click it again to sort in descending order. Each of the other fields in the SNP table can be sorted in the same way. [Screenshot: SNP table with the columns ref offset, ref base, consensus, ReadDepth, best base, bestBaseCount, 2nd best base and 2nd BestBaseCount, and the Export all SNPs button.]
9. Click the Export all SNPs button to export the SNP candidates into a file. All SNP candidates will be exported, one per line, with the same nine fields as each line of the SNP table.
3.8 Export data into files
The mapping results and the consensus sequence can be exported to files. Note that only Results nodes can be exported.
1. Select the Solexa_single_end_test UNIQUE result node.
2. Select Export from the File menu, then select Mapping Results from the popup menu. There are four output formats for mapping results (ZOOM, BED, WIG and GFF). Please refer to Section 7.1 in Chapter 7 of this manual for the description of each format. [Screenshot: File menu with New Job (Ctrl+N), Load Job, Remove Job, Export and Exit (Ctrl+Q).]
3. Select Export from the File menu, then select Consensus Sequences from the popup menu. The consensus sequence is output according to the map
34. different platforms.
Set up job batch size: ZOOM splits large read data sets into several smaller data files. According to the number of CPUs you assigned, these smaller files are automatically scheduled to run in parallel on multiple CPUs. To fit the split files into the RAM of the server, you should adjust the size of the split files according to the RAM that each CPU can use. For example, if you have a server with 8 GB of RAM and you have set MAX_CLIENT = 4 (four tasks can run in parallel), then the RAM each CPU can use is 8 GB / 4 = 2 GB. The default size is 4 million reads per split file, which is suitable for 2 GB of RAM per CPU. If the RAM per CPU on your server is smaller or larger, choose System Configuration and decrease or increase the batch file size in the setting "Split my job into tasks, each has at most 4000000 reads". We use the RAM per CPU rather than the total RAM of the server as the criterion for the number of reads in each task because different servers may have different architectures: in some, multiple CPUs share the same RAM, while in others each CPU has its own RAM.
2.7 Command line usage of ZOOM
Starting from version 1.3.0, computational tasks are carried out cooperatively by the ZOOM GUI and the ZOOM server, but users can still use the ZOOM server as a command line tool.
Quick Start
3 Quick Start to Use ZOOM
This section of the manual
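The batch-size arithmetic above can be written out as a small sketch. The function names are hypothetical, and scaling the batch size linearly with the RAM per CPU is an assumption made for illustration; the manual only states that 4 million reads suit 2 GB per CPU and that the value should be decreased or increased for smaller or larger memory.

```python
def ram_per_cpu_gb(total_ram_gb: float, max_client: int) -> float:
    """RAM available to each parallel task, e.g. 8 GB / 4 tasks = 2 GB."""
    if max_client < 1:
        raise ValueError("max_client must be at least 1")
    return total_ram_gb / max_client


def suggested_batch_size(ram_gb: float, reads_per_2gb: int = 4_000_000) -> int:
    """Scale the default batch size (4 million reads for 2 GB per CPU).

    Linear scaling is an assumption, not a documented ZOOM rule.
    """
    return int(reads_per_2gb * ram_gb / 2.0)


if __name__ == "__main__":
    per_cpu = ram_per_cpu_gb(total_ram_gb=8, max_client=4)
    print(per_cpu, suggested_batch_size(per_cpu))
```

The resulting number would be entered by hand in System Configuration; treat it as a starting point and adjust it if tasks run out of memory.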
36. >_<offset>: There is one deletion of length <length> starting from <offset>.
Note that the offset is the offset on the original read sequence, starting from zero, regardless of whether the read is mapped to the positive chain or the complement chain of the reference sequence.
Decoded nucleotide sequence: the decoded nucleotide sequence of the read after error correction. Genomic differences are highlighted by lowercase letters. Note that the first position of the color-space read is coded by the first base of the read and the last base of the adapter. ZOOM does not include the last adapter base at the beginning of the decoded sequence.
Mark of positions of sequencing errors: a binary string that marks the positions of sequencing errors with 1 and the positions without sequencing errors with 0.
An example of an <output> file:
8278 chri 1 M ATACGGTTTGAGAATGTAGTICAAACGCCTCCASTT 00000000000000000000000000000010000
14743 chri 28 2 M CGGTAATACGTGACTCCTGATACGGTTTGAGARTG 00000000000000000000000001 00000000
Tete chi 32 33 TCTCAAACCGTATCAGGAGTICACGTATTACCGGAG 00000000000000000000000000000000100
4063 chri 51 1 Di 33 TECACGTATTACCGGGATGCTGUR AAA ACTGGRADG 00000000000000000000000000001000000
Interpretation of the results:
o Read 9278 is mapped to offset 10 of the negative strand of the reference sequence chr1 with one error and no insertions/deletions, where there is one sequencing error and no genomic
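The insertion/deletion field and the sequencing-error mark described above can be parsed with a short sketch. The names are hypothetical, and the sketch assumes that the no-indel case is written as M (as it appears in the example output) and that indels are written as I<length>_<offset> or D<length>_<offset>.

```python
import re
from typing import List, NamedTuple, Optional


class Indel(NamedTuple):
    kind: str    # "I" for insertion, "D" for deletion
    length: int  # length of the insertion/deletion
    offset: int  # zero-based offset on the original read sequence


INDEL_PATTERN = re.compile(r"^([ID])(\d+)_(\d+)$")


def parse_indel_field(field: str) -> Optional[Indel]:
    """Return None for 'M' (mismatches only), else the parsed indel."""
    if field == "M":
        return None
    match = INDEL_PATTERN.match(field)
    if match is None:
        raise ValueError(f"unrecognized indel field: {field!r}")
    kind, length, offset = match.groups()
    return Indel(kind, int(length), int(offset))


def sequencing_error_positions(mark: str) -> List[int]:
    """Zero-based positions flagged with '1' in the binary mark string."""
    return [i for i, flag in enumerate(mark) if flag == "1"]


if __name__ == "__main__":
    print(parse_indel_field("D1_33"))
    print(sequencing_error_positions("00000000000000000000000000000010000"))
```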
37. is sorted according to the refId and the reference offset. SNPs with a larger read depth may be more reliable, so you can choose to sort the SNPs by read depth. Click the name of a column and the rows of the table will be sorted by the contents of that column in ascending order. Click the name once more and the rows will be shown in descending order. [Screenshot: SNP table with the Mapping Summary and SNP Caller tabs.]
SNP summary: 32 SNP candidates are found. Filtering criteria: At least 50 reads are mapped to this position (read depth). Discard the reads whose sum of base quality is less than 900. The average quality score at the SNP position over all reads covering this position should be larger than 30. The number of SNP candidates and the filtering criteria adopted will be shown.
Export all SNPs: SNPs can be exported to a file by pressing the Export all SNPs button, choosing a directory and entering the file name in which to store the SNPs. Each line of the file contains the nine delimited fields of one SNP: <refId>, ref offset, ref base, consensus base, Read Depth, best base, bestBaseCount, 2nd best base, 2nd BestBaseCount.
refID: the id of the reference sequence, starting from zero.
ref offset: the offset on the reference sequence where the SNP is located.
ref base: the nucleotide of the reference sequence on the p
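The three filtering criteria in the SNP summary can be sketched as a single predicate. This is an illustration only: the function name and input layout are hypothetical, and the order in which the criteria are applied (low-quality reads are discarded first, then depth and average quality are computed on the remaining reads) is an assumption that the manual does not state.

```python
from typing import Sequence, Tuple


def passes_snp_filter(
    reads: Sequence[Tuple[int, int]],
    min_depth: int = 50,
    min_read_quality_sum: int = 900,
    min_avg_site_quality: float = 30.0,
) -> bool:
    """Apply the three criteria to the reads covering one position.

    Each read is (sum of its base qualities, its quality at the position).
    """
    # Discard reads whose summed base quality is below the threshold.
    kept = [r for r in reads if r[0] >= min_read_quality_sum]
    # Require enough remaining reads at this position (read depth).
    if len(kept) < min_depth:
        return False
    # The average quality at the position must be strictly larger.
    average = sum(site_q for _, site_q in kept) / len(kept)
    return average > min_avg_site_quality


if __name__ == "__main__":
    print(passes_snp_filter([(1000, 35)] * 50))
```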
38. load the job to display results or perform post-analysis. [Screenshot: Create new job wizard, step 1 (Basic information) of four steps (Basic information, Input reads, Reference sequences, Mapping parameters), with the fields Job Label, Job Location, Job Folder and Notes/Description.]
1. Enter a name for your job in the blank field beside Job Label, for example Solexa_single_end_test.
2. Press the Browse button to specify a directory in which to save your job. [Screenshot: file chooser dialog.]
3. You can enter any description of your job for later reference in Notes/Description, for example: "Created on Tue Aug 25 10:09:18 EDT 2009. This is a sample of mapping the single end reads data of Illumina Solexa."
4. Click the Next button at the bottom of the window to continue.
Input reads
All reads data are input here. There are two ways to input reads files: by selecting read files or by selecting directories, in which case ZOOM will automatically search for all the reads files inside. Please note that the read files should be in a standard format of next generation sequencing technologies. For example, _seq.txt, qual.txt, fasta files for Illumina data, QV qu
40. [Screenshot: mapping parameter options, including reporting the unique best mapping position, reporting the top best mapping positions, and running an additional rescore step to re-evaluate multiple mappings with extra mapping records for each read, assuming that the SNP probability is 0.01.] The parameters are described in detail in Section 4.3 of Chapter 4 of this manual. Click the Finish button. A new job will be created, together with a directory named Sample single end test. All information about this job is stored in this directory. You can copy this directory anywhere. If you use ZOOM to load this directory, the job will be shown and post-analysis can be carried out on it.
3.5 Monitor the job
After the job is created, it is shown in the Job View panel in the left window of the interface. For each job, ZOOM automatically creates a task to map the reads on the assigned server. If the number of reads is large, ZOOM automatically partitions the reads into several parts and launches one task for each part. ZOOM schedules these tasks automatically until all reads are handled, and the user can monitor the running status of the jobs and the tasks through the corresponding progress bars in the Running Monitor window.
Progress Bar: Depending on the data size, it may take some time to load the data. The time is related to the size of the reads data file. A progress bar will pop up showing the progress of load
41. tabbed windows. [Screenshot: Job properties window with the Reads files and Reference sequences lists.] The properties include the reads file list, the reference sequence file list, the parameters used and the description note of the job.
4.8 Control jobs and tasks
Several operations on jobs or tasks change the status of the job or task.
Rerun: A job that was canceled or that has errors can be rerun by selecting the job and clicking the Rerun toolbar icon. After the job is rerun, the job node icon changes to the running icon.
Cancel: If a job is still running, you can cancel it by selecting the job and clicking the Cancel toolbar icon. The job will be stopped and its icon changes to the canceled icon. Note that only running jobs can be canceled.
Remove: A job can be removed at any time. You can choose either to remove the job from the workspace or to delete all the information about the job from the computer RAM and hard drive. Select the job you want to remove and click the Remove toolbar icon. A confirmation dialog will pop up: "remove ABI mate pair test from workspace?" with the checkbox "also permanently delete their files from my disk". Press the OK button if you want to remove the job from the workspace, which removes this job from the Job View Panel and from the computer memory. You can, however, load the job again at any time. If
42. the folder (directory) in which you would like to install ZOOM. To change the default location, press the Choose button to browse your system and make a selection, or type a folder name in the textbox. Please avoid installing ZOOM in the Program Files directory, or in any directory for which the ZOOM user will not have write permissions. Click Next.
8. Choose where you would like to place icons for ZOOM Studio. The default puts these icons in the programs section of your Start menu. A common user preference is the desktop. Click Next.
9. Review the choices you have made. You can click Previous if you would like to make any changes, or click Next if those choices are correct.
10. ZOOM Studio will now install on your system. You may cancel at any time by pressing the Cancel button in the lower left corner.
11. When the installation is complete, click Done. The ZOOM Studio menu screen should still be open. You may view movies and materials from here. To access this menu at a future date, simply insert the disc in your CD-ROM drive.
2.5 Registering ZOOM
The first time ZOOM is run, the About dialogue containing the license wizard will appear automatically. [Screenshot: product information dialogue.] Click the launch license wizard button to register your copy of ZOOM.
43. then A is reported. If there are two mismatches for both A and B, an indel of length one for A and an indel of length two for B, then A is reported. If there are two mismatches and an indel of length one for both A and B, then this read is not reported.
The depth of mapped reads (Coverage): The number of mapped reads covering a position of the reference sequence is called the depth of mapped reads at that position, or the coverage of that position on the reference sequence.
ZOOM: Zillions Of Oligos Mapped, a next generation sequencing analysis tool.
Installation
2 Getting started with ZOOM
2.1 Package contents
The ZOOM package should contain: this manual (hardcopy and/or electronic version) and the ZOOM software.
2.2 System requirements
ZOOM will run on most platforms that meet the following requirements.
Processor: processing power equivalent or superior to a Pentium 4 processor at 1.6 GHz.
Memory: 2 GB; 8 GB of RAM is recommended for processing large data sets.
Operating System: Microsoft Windows XP or above, and/or a 64-bit Unix/Linux operating system.
Display: 800 by 600 pixels minimum.
2.3 Instrumentation
ZOOM works with both single end reads and paired end reads, with lengths ranging from 12 bp to 240 bp, from the following next generation sequencing instruments: Illumina Solexa (_seq.txt and _prb.txt, fastq) and Applied Biosystems Inc. SOLiD (csfasta, _QV.qual, fastq).
2.4 Install ZOOM Studio
Note: If you alread
44. to the reference sequence selection.
Reference sequences
Assign the reference sequences to which the reads data are to be mapped. The sequences in the reference file should be in FASTA format. Multiple reference files or a directory can be loaded into ZOOM.
1. Press the Add file/dir(s) to list button and choose multiple files or directories. If directories are chosen, all FASTA files in these directories will be loaded. [Screenshot: Create a new job wizard, Reference sequence step, listing the hg18 chromosome FASTA files (chr1 to chr22, chrM, chrX and haplotype files) with the Remove from list button.]
3. Click the parameters
45. up: "remove Solexa single end test from workspace?" with the checkbox "also permanently delete their files from my disk". Press OK. The Solexa single end test job node will be removed from the Job View panel. This operation only removes the job node from the Job View panel. You can click the open icon and select the directory where Solexa single end test is stored to load the job into your workspace again.
2. Click on the "Solexa single end test more" job node and click the remove toolbar icon. Click on the checkbox "also permanently delete their files from my disk" and press OK. All the items related to the job, including the directory on the disk, will be deleted permanently.
3.12 Paired end / Mate pair read mapping example
We assume that you have gone through the above single end reads mapping process. We will now explain how to map paired end / mate pair reads, focusing only on the operations that differ from mapping single end reads.
1. Click the new job icon to create a new job named ABI mate pair test, as follows. [Screenshot: Create a new job wizard with Job Label "ABI mate pair test", Job Folder ABI_mate_pair_test and the note "Created on Wed Aug 26 17:37:48 EDT 2009. This is a sample of mate pair mapping of ABI SOLiD data."]
2. Click t
46. variations.
Sensitivity: the full capacity to find, for each read, all target regions on the reference sequence within the user-defined number of mismatches.
SNP: A Single Nucleotide Polymorphism (SNP) is a DNA sequence variation occurring when a single nucleotide (A, T, C or G) in the genome or other shared sequence differs between members of a species or between paired chromosomes in an individual.
Single end reads: reads that were sequenced separately.
Target region: the reference sequence segment to which the read is mapped.
Uniquely mapped read: Each read might be mapped to multiple target regions in the reference sequence. The best mapping results of a read are the ones with the smallest edit distance or, in the case of an equal edit distance, the shortest indel length, under the assumption that indels are less probable than mutations. If there is only one such best mapping result for the read, this is a uniquely mapped read. Otherwise, if there are multiple such mappings, the read is considered ambiguously mapped. For example, let A and B be two reference positions. If a read can be mapped to position A and position B on the reference genome with two mismatches for A and one mismatch for B, then B is reported as the unique mapping position for this read. If both A and B contain two mismatches, then this read is not reported. If there are two mismatches and an indel of length one for A, and one mismatch and an indel of length two for B,
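The uniquely-mapped-read rule can be sketched as a comparison on the pair (edit distance, indel length). The function name is hypothetical, and the sketch assumes that the edit distance is the number of mismatches plus the indel length, as in the description of the GFF score field; it is not ZOOM's implementation.

```python
from typing import Dict, Optional, Tuple


def unique_best_mapping(
    candidates: Dict[str, Tuple[int, int]]
) -> Optional[str]:
    """Pick the unique best mapping position, or None if ambiguous.

    candidates maps a position label to (mismatches, indel_length).
    Positions are ranked by edit distance, then by indel length.
    """
    if not candidates:
        return None

    def rank(entry: Tuple[int, int]) -> Tuple[int, int]:
        mismatches, indel_length = entry
        return (mismatches + indel_length, indel_length)

    best = min(rank(entry) for entry in candidates.values())
    winners = [name for name, entry in candidates.items()
               if rank(entry) == best]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # Equal edit distance (3), but A has the shorter indel.
    print(unique_best_mapping({"A": (2, 1), "B": (1, 2)}))
```

A read for which the function returns None corresponds to an ambiguously mapped read, which is not reported as a unique mapping.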
47. will copy the read name and the read sequence to the clipboard. If the read is ABI SOLiD data, the color-space read sequence will be copied to the clipboard too. At the same time, more information about this mapped read is shown in the read information tab window below the Mapping Results Illustrating window. [Screenshot: read information tab showing Read name 1640, Reference offset 1261, Mapping direction Reverse, the aligned read sequence and the sequence quality blocks.] A red nucleotide marks a difference. Each black block indicates the quality score at that position: the higher the block, the larger the quality score of that position. Note that the direction of the alignment shown in the read information tab is the same as the direction of the read sequence in the read files. If a read is mapped to the reverse chain of the reference sequence, the left offset of the reference segment is larger than the right offset, as in the above picture.
Copy the read sequence: Click the > button, and the read name and the read sequence will be copied to the system clipboard.
If the read data is mapped in Paired end / Mate pair mode, you can select a read and click the Find its mate pair button. ZOOM
48. will walk you through most of the basic functionality of ZOOM. After completing this section, you will see how easy it is to map a huge number of reads with automatic scheduling, view mapping results, and find SNPs and short insertions/deletions on both single end reads and paired end reads, for both the Illumina Solexa instrument and the ABI SOLiD instrument.
3.1 Sample Data
ZOOM provides two sets of sample data in the Sample Data directory. The Solexa directory contains an Illumina Solexa test data set and the SOLiD directory contains an ABI SOLiD test data set.
1. In the Solexa directory there are two directories: the single end directory (read.fastq and reference.fa) and the paired end directory (read_1.fastq, read_2.fastq and reference.fa).
2. In the SOLiD directory there are two directories: the single end directory (read.fastq and reference.fa) and the paired end directory (read_F3.csfasta, the matching read F3 QV quality file and reference.fa).
3.2 The main windows of ZOOM
The following picture shows the main windows of ZOOM. [Screenshot: ZOOM main window with the File, Control, Tool and Help menus, the Job View Panel showing job 3 with its Scheduling and Results (UNIQUE) nodes, the mapping result view and the running monitor.]
49. your inbox, please check your junk mail folder.
In the license wizard, click the browse button to select the license lcs file, or paste the license content from the email, and click Next.
Click Finish if you receive a message that the license has been imported successfully. [Screenshot: license wizard, "License import completed. The license has been imported successfully."]
Re-registration Instructions
Re-registering ZOOM may be necessary if your license has expired or if you wish to update the license. You will need to obtain a new registration key from BSI. Once you have obtained this new key, select About from the Help menu. The product information dialogue box will appear. [Screenshot: product information dialogue for ZOOM debug 1.3 20090911, Copyright 2008-2009 Bioinformatics Solutions Inc., showing the fields License to, License key, Registered email, License Start, License Expire, Threads and SPS expire (all N/A before activation), the copyright warning, and the launch license wizard, terminate program, about BSI, website, EULA and acknowledge buttons.]
50. This is a sample of mapping the single end reads data of Illumina Solexa.
4. Click the Next button at the bottom of the window.
Input reads
[Screenshot: Create new job wizard, step 2 (Input reads), with the read file and quality file columns and the buttons Add read file/dir(s) to list, Remove from list and Switch to paired files mode.]
All reads data are input here. There are two ways to input reads files: by selecting read files or by selecting directories, in which case ZOOM will automatically search for all the reads files inside. Please note that the read files should be in a standard format of next generation sequencing technologies, for example _seq.txt, qual.txt, fastq and fasta files for Illumina data, and _QV.qual and fastq files for ABI SOLiD data, as described in the read file format section of this manual. The Input reads window opens in single end reads mode by default. To change to the paired end / mate pair reads window, click the Switch to paired files mode button. Click it again to switch back.
Input single end reads
1. Click the Add read file/dir(s) to list button. Choose the read file(s) or the directories that contain the read files. By default, ZOOM will find all the files suffixed by _seq.txt, qual.txt, csfasta and _QV.qual to load.
51. 3 8 11 8 17 14 8 25 20 15 14 16 17 11 10 19 16 25 15 5 16 13 19 10 6 12 1 6 1616 F3 21886 7341185 16 8 10 1163 14 1695 19 7 10 8 21 6 1634
Applied Biosystems SOLiD fastq file: The contents of the csfasta and _QV.qual files can be integrated into a CSFASTQ file. It is similar to the FASTQ format, where four lines represent one read. The only difference is that the read sequence is in color-space format. The quality score of each position is denoted by a character q, and the numerical score is ord(q) - 33. Example of CSFASTQ format: 0588015241 1 CLARA 20071207 2 CelmonAmp7797 16bit 26 88 34 length 50 T32322133300002330031001022230020232002203222030231 588015241 1 CLARA 20071207 2 CelmonAmp7797 16bit 26 88 34 length 50 121 42640 2575 amp amp 8 amp amp 5555 5 5 5 5 55 26767670
Note that the first character of the quality score string is treated as the quality of the adapter nucleotide.
4.3 Reference sequence file format
ZOOM accepts reference sequence files containing reference sequences in FASTA format. Example format of a reference sequence file:
>Reference sequence 1 name
AGGACTATATTGCTCTAATAAATTTGCGGTTCTTAAAAACTCAATGT
TGTAAAAATGTCACTTCTTCCCAAA
4.4 Create a new Job
To create a new project, click on the Create a new job toolbar icon or select New J
52. 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
The _prb.txt file is the same as described in part 2 above. ZOOM can process this file without a corresponding _seq.txt file as input. The difference is that the labels of the read sequences are then assigned automatically.
FASTQ Format
ZOOM accepts the FASTQ format, in which four lines represent one read: the read name, the read sequence, a separator line, and the quality string. Example of FASTQ format: 0071113 EAS56 0053 1 1 756 463 GTGATTAGTGAAACATAAAATAGTTTCATGTTGAAA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIAI 071113 556 0053 1 1 813 752
The FASTQ format includes the Sanger FASTQ format and the Illumina Solexa FASTQ format, which are scaled differently. For the Sanger FASTQ format, the quality score of each position equals ord(q) - 33, while for the Illumina Solexa FASTQ format, the quality score of each position equals ord(q) - 64. When creating a job, ZOOM provides a combo box in the parameter selection step to specify whether the FASTQ format is the Sanger type or the Illumina type. [Screenshot: Read Qualities options with the FASTQ type combo box (Sanger type / Illumina type).]
One read per line with quality scores
Example format: 4_87_872_656 ACGTACNT 40 40 40 40 35 60 70 56
The first column contains the read sequence label or description, the second column contains the sequence
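The two FASTQ quality scales can be illustrated with a short sketch. The function name and the type labels "sanger" and "illumina" are hypothetical and chosen only for this example; the offsets 33 and 64 are the ones given above.

```python
from typing import List

# Offsets subtracted from ord(q) for each FASTQ flavor.
QUALITY_OFFSETS = {"sanger": 33, "illumina": 64}


def decode_qualities(quality: str, fastq_type: str = "sanger") -> List[int]:
    """Convert a FASTQ quality string to numeric scores per position."""
    try:
        offset = QUALITY_OFFSETS[fastq_type]
    except KeyError:
        raise ValueError(f"unknown FASTQ type: {fastq_type!r}") from None
    return [ord(q) - offset for q in quality]


if __name__ == "__main__":
    # 'I' is score 40 on the Sanger scale; 'h' is score 40 on the
    # Illumina scale.
    print(decode_qualities("IIG", "sanger"))
    print(decode_qualities("hh", "illumina"))
```

Decoding with the wrong type shifts every score by 31, which is why the FASTQ type has to be selected correctly when the job is created.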
53. For more information about the BED format, please refer to the UCSC website.
tabs, the track will not display correctly. For more information on the GFF format, refer to http://www.sanger.ac.uk/Software/formats/GFF
1. seqname: The name of the reference sequence. Must be a chromosome or scaffold.
2. source: The data source. This field is solexa or solid, according to the data type.
3. feature: The name of the read.
4. start: The starting position of the alignment in the reference sequence. The first base is numbered 1.
5. end: The ending position of the alignment in the reference sequence. This end is included in the display of the alignment. For example, an alignment defined as start 1, end 50 spans the bases numbered 1 to 50.
6. score: A score between 0 and 1000. We use this item to store the edit distance between the read and the reference sequence, i.e. the sum of the number of mismatches and the length of the insertion/deletion.
7. strand: The mapping direction of the read. "+" means the read is mapped to the positive chain of the reference sequence, while "-" means the read is mapped to the reverse chain.
8. frame: ZOOM sets this field to a constant placeholder value.
Example: track name Reads Alignment on DH10B description Reads alignments show DH10B solid 2852 R3 13 63 DH10B solid 4085 R3 13 63 DH10B solid 7489 R3 13 63
For information about
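How a mapping record turns into the GFF fields listed above can be sketched as follows. The function name is hypothetical. Two points are assumptions rather than statements from the manual: the frame field and the ninth field are filled with "." here, and the end position is derived as start plus the aligned length minus one.

```python
def gff_line(
    seqname: str,
    source: str,          # "solexa" or "solid"
    read_name: str,
    zero_based_offset: int,
    aligned_length: int,
    mismatches: int,
    indel_length: int,
    strand: str,
) -> str:
    """Build one tab-separated GFF line with nine fields for a mapped read."""
    if strand not in ("+", "-"):
        raise ValueError(f"unknown strand: {strand!r}")
    start = zero_based_offset + 1          # GFF numbers the first base 1
    end = start + aligned_length - 1       # end position is inclusive
    score = mismatches + indel_length      # edit distance stored as score
    fields = [seqname, source, read_name, start, end, score, strand,
              ".", "."]                    # frame and 9th field: assumed "."
    return "\t".join(str(f) for f in fields)


if __name__ == "__main__":
    print(gff_line("DH10B", "solid", "2852_R3", 12, 51, 1, 0, "+"))
```

Fields must be joined with tabs, not spaces, or the track will not display correctly.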
54. [Screenshot: Job View Panel listing the tasks of a job (TASK 2009-08-28 11:36:42 ...) and the Running monitor for a job with 12 tasks in total.] In this image, four tasks are currently running while the other tasks wait in the list to be scheduled. For all the jobs created or loaded, you can view the status of the jobs and tasks through the following icons.
Running icon: The job or task is running.
Canceled icon: The job or task is canceled.
Error icon: An error occurred when the job or task was running.
Waiting icon: The task is waiting to be scheduled.
Finished icon: The job or the task is finished. When a job is finished, the Results node will appear. You can choose the UNIQUE or ALL Results node to show the mapping results or carry out post-analysis.
Job Running Monitor Panel
When you click a job or a task in the Job View Panel, the running information of this job or task will be shown in the
55. [Figure: reads aligned along the reference sequence in the mapping results illustrating window.]
Scaling tools: There are three scaling tools that help you view the mapping results at different scales.
o Overview button: press this button to return the display to the overview of the read depths of all reads mapped to the WHOLE reference sequence.
o Zoom-in button: press this button to display the mapping results zoomed in by a factor of 1.2x.
o Zoom-out button: press this button to display the results zoomed out by a factor of 1.2x.
Reference sequence selecting bar: You can click the dropdown list box (for example chr11, chr16) to choose the reference sequence whose mapping results are to be shown. The reads mapped to the selected reference sequence will then be assembled along this reference sequence and displayed. A progress bar will pop up to show the progress.
Reference offset bar: The reference offset bar shows the offset range of the reference sequence shown in the current mapping results illustrating window.
56. [Figure: ZOOM main window showing reads aligned to the reference sequence "sample", the Running monitor for a remote map task (total subtasks: 3; subtasks upload data 00:01:52, map reads 00:00:02, get results 00:00:02), and the read information panel (Read name 164, Reference offset 3830, Mapping direction Reverse, with the aligned reference and read sequences and the sequence quality blocks).] ZOOM GUI relies on
57. [Example output lines; sequence and mismatch columns garbled in extraction.]
Output for paired end reads data
The output format is the same as the output for single end reads described in the two sections above. The only difference is that the reads in paired end data are mapped in pairs. Each read of a pair is mapped to the reference sequence within the allowed number of mismatches or edit distance, as in the single end case. The user needs to judge, from the strands and offsets where the two reads are mapped, whether the pair is a correct mate pair, contains an insertion or deletion, or is a translocation.
BED format
BED format provides a flexible way to define the data lines that are displayed in an annotation track of the UCSC genome browser. The BED file is used to show the alignments between the reads and the reference sequences. If there are several reference sequences, the BED file may have several tracks, each showing the read alignments on one reference sequence. Each BED track has one annotation line at its head describing the features of the track and the configuration needed to show the results in the UCSC genome browser. You can revise this line to get the display effect you like.
track name="Reads Alignments on Chr1" description="Reads alignments show" visibility
58. [Screenshot residue: reads aligned against the reference sequence in the mapping results window.]
59. [Screenshot residue: reads displayed in color space.] Clicking or dragging the horizontal scrollbar lets you navigate along the reference sequence. 8. Click or drag the vertical scrollbar on the right to show more reads aligned to this region when the reads mapped here cannot all fit in the Mapping Results Window. [Screenshot: Mapping Result window for a Solexa single end UNIQUE result, showing aligned reads.]
60. You can double-click each row in the table to see the SNP candidate details. You can check the alignment around this position in detail. [Screenshot: mapping results window over reference offsets 1172 to 1249, with the SNP table below it. Table header: Export all SNPs; ref offset, ref base, consensus, Read Depth, best base, bestBase..., 2th best base, 2th BestBase...] Click one read at this position and then click the read information tab. You can check the quality of this position in the read to judge whether the SNP candidate is more likely a true SNP or a sequencing error.
61. The following is an example of a read mapped to the reference sequence with a deletion on the read. The read deletes one nucleotide, so there is one deletion of length one. [Alignment example: reference TAGGAATAATGGGGGAAGTATGTAGGAGTTGAAGATT; the read and sequence quality rows are garbled in extraction.]
The following is an example of a read mapped to the reference sequence with an insertion of length one. The red A base is the insertion. There are two mismatches and one insertion of length one. [Alignment example: reference starting at offset 7905, TATCTATGATTTATATCAAATGAGTTTTTG; the read and sequence quality rows are garbled in extraction.]
ZOOM will map reads to the target regions on the reference sequence within a given number of mismatches or a given insertion/deletion length. If only mismatches are allowed between reads and reference sequences, use the option "Allow at most 2 mismatched base pair(s) plus an indel of length at most 0 base pair(s)". If you want to allow insertions and deletions, you can use the option "Allow at most ... edit distance(s)" instead. The edit distance is the sum of the number of mismatches and the length of the insertion or deletion.
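The relationship between the two parameter styles can be sketched in Python. This is only an illustration of the definition given in the text, not ZOOM's implementation; the function names and default limits are hypothetical.

```python
def edit_distance(mismatches: int, indel_length: int) -> int:
    """Edit distance as defined in the manual: mismatches plus indel length."""
    return mismatches + indel_length


def within_mismatch_indel_limits(mismatches: int, indel_length: int,
                                 max_mismatches: int = 2,
                                 max_indel_length: int = 0) -> bool:
    """Check an alignment against separate mismatch and indel limits."""
    return mismatches <= max_mismatches and indel_length <= max_indel_length


def within_edit_distance(mismatches: int, indel_length: int,
                         max_edit_distance: int) -> bool:
    """Check an alignment against a single edit distance limit."""
    return edit_distance(mismatches, indel_length) <= max_edit_distance


# The insertion example above: two mismatches and one insertion of length one.
print(edit_distance(2, 1))
print(within_mismatch_indel_limits(2, 1))
print(within_edit_distance(2, 1, max_edit_distance=3))
```

With an indel limit of 0, the alignment from the insertion example is rejected, while an edit distance limit of 3 accepts it.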
62. GFF format: please refer to http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
WIG format
The wiggle (WIG) format is for the display of dense, continuous data such as transcriptome data. ZOOM uses this format to store the coverage (read depth) of each genome position in the selected regions. Each WIG file has several annotation lines at its head describing the features of the file and how to show the results in the UCSC genome browser. You can revise the head of the file to get the display effect you want.
track type=bedGraph name="Bed Format" description="BED format" visibility=full color=200,100,0 altColor=0,100,200 priority=20
The mapping results are then listed line by line, with tab-delimited fields on each line. We adopt the bedGraph format with four fields, since there could be different genomes in the reference sequences:
1. chrom: The name of the reference sequence.
2. chromStartA: The offset of the position, starting from zero.
3. chromEndA: The offset of the position, the same as chromStartA.
4. dataValue: The coverage (read depth) of this position.
WIG file format example:
track type=bedGraph name="Bed Format" description="BED format" visibility=full color=200,100,0 altColor=0,100,200 priority=20
[Data rows garbled in extraction: entries for chr1 at offsets 556776 through 556782; the remaining columns are truncated.]
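A file laid out this way can be read with a few lines of Python. This is a minimal sketch that assumes one track line followed by four tab-delimited fields per data line, as described above; the `Coverage` type, the `parse_bedgraph` function, and the sample depth values are hypothetical and are not taken from ZOOM output.

```python
from typing import Iterable, Iterator, NamedTuple


class Coverage(NamedTuple):
    chrom: str     # name of the reference sequence
    start: int     # zero-based offset (chromStartA)
    end: int       # chromEndA as written in the file
    depth: float   # coverage (read depth) at this position


def parse_bedgraph(lines: Iterable[str]) -> Iterator[Coverage]:
    """Yield one Coverage record per data line, skipping header lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.split()[0] in ("track", "browser"):
            continue  # annotation lines at the head of the file
        chrom, start, end, value = stripped.split("\t")
        yield Coverage(chrom, int(start), int(end), float(value))


# Made-up depth values, for illustration only.
sample = [
    'track type=bedGraph name="Bed Format" description="BED format"',
    "chr1\t556776\t556776\t3",
    "chr1\t556777\t556777\t4",
]
for record in parse_bedgraph(sample):
    print(record.chrom, record.start, record.depth)
```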
63. [Screenshot residue: reads aligned in the mapping results window.] Click on the Reference Sequence Selecting Bar. The list of reference sequence names will be displayed. If there are multiple reference sequences, a dropdown list lets you choose the reference sequence whose alignments are shown. In this case there is only one reference sequence, named "reference sequence".
64. OM can guarantee 100% sensitivity. ZOOM provides a framework for constructing efficient spaced seed sets that achieve 100% sensitivity for a large range of read lengths and mismatch numbers. These spaced seed sets underpin the accuracy and speed of ZOOM. The cases for which this release guarantees 100% sensitivity are listed in the following tables. For cases with more mismatches and cases with insertions or deletions, ZOOM still has good sensitivity. If you require 100% sensitivity beyond the listed cases, please contact us. We will be happy to design seeds specifically for your requirements.
8.1 Cases for Illumina Solexa data
Read Length Range (bp), Mismatch Numbers: 12 256 14 250 [table garbled in extraction]
8.2 Cases for ABI SOLID data
Since ABI SOLID data is in color space, ZOOM extends the multiple spaced seed sets used for Illumina Solexa data: spaced seeds are applied between the color space of the reads and the color space of the reference sequences. Note that in the following table the mismatch number is the sum of the number of polymorphisms in base space and the number of sequencing errors in color space. However, each polymorphism in base space produces two adjacent mismatches in color space, so the number of mismatches in color space is in fact at most the number of sequencing errors in color space plus twice the number of polymorphisms in base space. For example, if a read of length 50bp has four polymorphisms w
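The two counts described for ABI SOLID data can be written out as a short Python sketch. The function names are hypothetical, and the single sequencing error in the usage lines is an assumed value chosen for illustration, since the manual's own example is cut off in this excerpt.

```python
def reported_mismatch_number(polymorphisms: int, sequencing_errors: int) -> int:
    """Mismatch number as used in the table: base-space polymorphisms
    plus color-space sequencing errors."""
    return polymorphisms + sequencing_errors


def max_color_space_mismatches(polymorphisms: int, sequencing_errors: int) -> int:
    """Upper bound on mismatches seen in color space: each base-space
    polymorphism can change two adjacent colors."""
    return sequencing_errors + 2 * polymorphisms


# Four polymorphisms and (as an assumed value) one sequencing error.
print(reported_mismatch_number(4, 1))
print(max_color_space_mismatches(4, 1))
```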
65. OM will jump to the mate read of the selected read and show its alignment. If a read is mapped to multiple positions, press the < and > buttons to jump to the other positions of this read and show the corresponding alignments.
Mapping Results Summary panel
Click a UNIQUE or ALL Result node and click the summary information toolbar icon. [Screenshot: ZOOM main window with the Solexa_single_end_test job and its UNIQUE result node selected.] The summary of the uniquely mapped results will be shown in a Mapping Summary tab beside the read information tab.
[Screenshot: Mapping Results Summary. "Totally 8017 reads are uniquely mapped to the reference sequences. The detailed statistics of mapping is as following", followed by a statistics table whose values are garbled in extraction.]
The total number of mapping positions and a statistics table will be shown. The statistics table contains four columns. Each row shows how many mapping positions (the third column) are mapped to the reference sequence with x mismatches (the first column) and one insertion/deletion of length y (the second column), and the ratio of these mapping results to all mapping results (the fourth column).
SNP panel
SNP candidates found are listed in a table in the SNP panel, which is shown as a tab window in the Detailed information panel. Please refe
66. SNP candidates are located, a table containing the SNP candidates will appear in the SNP Caller window. If you are not satisfied with the SNPs found, press the filter SNP candidates icon again and try more stringent or less stringent criteria. After this, another tab window entitled SNP Caller will be displayed in addition to the existing SNP Caller tab window.
6.2 View SNP Candidates
The SNP Caller tab window shows the detailed information of each SNP in a table view. [Screenshot: SNP table with the Export all SNPs button and the columns consensus, ReadDepth, best base, bestBaseC..., 2thbestbase, 2th BestBa...] Each row of the table is an SNP candidate. The table has 9 fields showing 9 features of each SNP:
refID: the id of the reference sequence, starting from zero.
ref offset: the offset on the reference sequence where the SNP is located.
ref base: the nucleotide of the reference sequence at the position.
consensus base: the nucleotide of the consensus sequence at the position. ZOOM can build the consensus sequence for a haploid genome or a diploid genome. If the organism is viewed as a diploid, the nucleotide on the consensus sequence is an IUPAC code, which uses one letter to denote a haplotype. For example, S denotes the haplotype <G, C>, while R denotes <G, A>.
Read Depth: the number of mapped reads covering the position.
best base: the nucleotide with the largest read depth at the SNP position.
bestBaseCount: the amount of the best nucleotide on the SNP
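The IUPAC convention mentioned for the consensus base can be illustrated with a small Python lookup. The two-base codes below are the standard IUPAC nucleotide ambiguity codes; the function name `iupac_code` is hypothetical, and nothing here is taken from ZOOM's own code.

```python
BASES = set("ACGT")

# Standard IUPAC codes for a pair of distinct nucleotides.
TWO_BASE_CODES = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def iupac_code(allele1: str, allele2: str) -> str:
    """Return the one-letter IUPAC code for a diploid genotype."""
    a, b = allele1.upper(), allele2.upper()
    if a not in BASES or b not in BASES:
        raise ValueError("alleles must each be one of A, C, G, T")
    if a == b:
        return a  # homozygous position: the base itself
    return TWO_BASE_CODES[frozenset((a, b))]


print(iupac_code("G", "C"))  # S
print(iupac_code("G", "A"))  # R
```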
67. [...] SNPS AND SMALL INDEL CANDIDATES
6.2 VIEW SNP CANDIDATES
Operations [title partly illegible]
6.3 [title illegible]
6.4 EXPORT ALL SNPS
7 [title illegible]
7.1 EXPORT MAPPING RESULTS
ZOOM format
BED format
GFF format
WIG format
7.2 EXPORT ASSEMBLED CONSENSUS SEQUENCE
Consensus sequence [title partly illegible]
Consensus FAS[...]
8 [title illegible]
8.1 CASES FOR ILLUMINA SOLEXA DATA
8.2 CASES FOR ABI SOLID DATA
9 FREQUENTLY ASKED QUESTIONS
10 ABOUT BIOINFORMATICS SOLUTIONS INC
11 ZOOM SOFTWARE LICENSE
12 REFERENCE ZOOM
68. [Screenshot residue: mapping result window for Solexa_single_end_test UNIQUE.]
69. [Screenshot residue: reads aligned against the reference sequence in the mapping results window.]
70. [Screenshot: reads, the consensus sequence, and the reference sequence over reference offsets 3320 to 3390.] The sequence at the bottom of the window is the reference sequence. The sequence with the green background above the reference sequence is the consensus sequence generated from the reads mapped along the reference sequence. An orange background on a nucleotide of a read or of the consensus sequence highlights a difference from the nucleotide at that position of the reference sequence. By default, reads are displayed in nucleotide space. For ABI SOLID data, the default display is the nucleotide reads decoded according to the mapping results. Press the corresponding toolbar button to switch the read display from nucleotide space to color space and vice versa. The reads shown in color space look like the following: [Screenshot: Mapping Result window for abi test UNIQUE.]
71. [Screenshot: mapping results window over reference offsets 1172 to 1249, with the tabs read information, Mapping Results Summary and SNP Caller. Read information: read name 5494, reference offset 1192, mapping direction Positive.] 7. Click the SNP Caller tab to show the SNPs. Click the < or the > button to jump to the previous or the next SNP candidate. 8. Click the Read Depth field in the header of the SNP table to sort the candidates according to the read
72. [Screenshot: mapping results window over reference offsets 5405 to 5493, with the SNP table below it.] You can check the bases on this position in detail. Furthermore, you can click a read you are interested in and check the alignment and the quality score of this position in the read information panel. [Screenshot: read information. Read name 4 192 66 520, reference offset 5415, mapping direction Positive.]
> Forward button and < backward button
Press the < button or the > button. The previous or next row of the SNP table will be selected. At the same time, the cursor in the mapping results illustrating window will jump to the newly selected SNP.
Sort the columns in the SNP Table
The nine columns of the SNP Table can be sorted. By default the table
73. That is to say, select only UNIQUE result nodes, rather than ALL result nodes, to carry out SNP candidate analysis. This is because an ALL result node contains the top N mapping results for each read, and reads mapped to multiple positions of the reference sequence make the SNP finding process unreliable.
1. Select one or more UNIQUE Results nodes.
2. Click the filter SNP candidates toolbar icon, or select SNP Filter from the Tools menu. A window showing the filtering criteria will be shown as follows: [Screenshot: SNP Finding dialog. Filtering criteria: "At least 3 reads mapped to this position (read depth)"; "... 2000 reads are mapped to this position"; "Discard the reads whose sum of base quality is less than 20"; "Average quality score on the SNP position of all reads covering this position should be larger than 10"; "Average quality score of the 5 bases on each side of the SNP should be greater than 10"; Cancel and OK buttons.]
3. Click the checkbox of each filtering criterion you want to apply to SNP finding, and modify the values in the value fields as you prefer.
4. Press OK and SNP finding will commence on all the reference sequences. A progress bar will pop up. This process may take some time, depending on the data size, since all the reference sequences will be assembled and the filtering criteria will be applied to all the SNP candidates. When all
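Two of the criteria in the dialog, the minimum read depth and the average quality score at the SNP position, can be sketched in Python as follows. This is a hedged illustration of how such thresholds combine, not ZOOM's SNP caller; the `Candidate` class, its fields, the `passes_filter` function and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Candidate:
    ref_offset: int
    # Quality score at the SNP position for each read covering it.
    qualities: List[int]


def passes_filter(candidate: Candidate,
                  min_read_depth: int = 3,
                  min_average_quality: float = 10) -> bool:
    """Apply a minimum read depth and an average quality threshold."""
    read_depth = len(candidate.qualities)
    if read_depth < min_read_depth:
        return False
    # The dialog asks for an average "larger than" the threshold.
    return mean(candidate.qualities) > min_average_quality


# Made-up candidates, for illustration only.
candidates = [
    Candidate(ref_offset=1210, qualities=[30, 25, 28, 31]),
    Candidate(ref_offset=1532, qualities=[35, 33]),       # too shallow
    Candidate(ref_offset=2040, qualities=[5, 8, 9, 12]),  # low quality
]
kept = [c.ref_offset for c in candidates if passes_filter(c)]
print(kept)
```

Raising `min_read_depth` or `min_average_quality` corresponds to trying more stringent criteria in the dialog.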
74. ail. Click Next. 4. The following window will appear. [Screenshot: license wizard with the options "paste the license content from the email" and "import the license file (the email attachment)".] Select "paste the license content from the email" to paste the license information between > and < in the email, or select "import the license file (the email attachment)" and browse to locate the license. 5. Click Next. The following window will open. [Screenshot: license wizard. "License import completed. The license has been imported successfully."] Click Finish if you receive a message that the license has been imported successfully.
Registration Instructions: No Internet Connection
1. Select "Request license file without Internet connection" and click Next twice.
2. The following window will open. [Screenshot: license wizard with fields for registration key, name, institution and email, and the note "Important: You will receive your license file via email".] If you have purchased ZOOM and have a registration key, select Registration Key. Enter your registration key as well as your name and email address and click Next. If you are trying a demo of ZOOM and do not have a registration key, select "Request a 30 days evaluation license (no registration key required)". Enter your name, email address as well as y
75. al fastq files for ABI SOLID data. For details, please refer to Chapter 4 in this manual. [Screenshot: Create new job dialog. Steps: 1. Basic information, 2. Input reads, 3. Reference sequences, 4. Mapping parameters. Buttons: Add read file/dir(s) to list; Switch to paired files mode.]
1. Click the "Add read file/dir(s) to list" button, navigate to the Sample_Data\Solexa\single_end directory, and select the read.fastq file. The file will appear in the read file list. [Screenshot: read file list containing the Solexa single_end read.fastq file.]
2. Click the button again to select other read files. For example, select the read.csfasta file in the Sample_Data\SOLiD\single_end directory; read.csfasta will be loaded into the read file list too. [Screenshot: read file list containing both read files.]
Note that ZOOM also recognizes that the read.csfasta file has a corresponding quality file, read_QV.qual, and loads the quality file too. By clicking and dragging the boundary between the read file and quality file headers, you can adjust the width of the table columns and show the full name of the quality files, as follows: [Screenshot: read file list with the quality file column widened.]
76. ality threshold will be ignored.
Multiple spaced seeds: several spaced seeds optimized simultaneously against a given level of similarity, which further enhance the sensitivity. PatternHunter II, using multiple spaced seeds, approaches the sensitivity of the Smith-Waterman algorithm while retaining Blastn speed.
Oligos: oligonucleotides, short DNA or RNA sequences.
Optimal spaced seed: a novel idea first proposed in PatternHunter to enhance both the sensitivity and the speed of filtering in the pairwise homology search process. Compared to a consecutive seed, which requires the query sequence and the target sequence to share a block of identical consecutive nucleotides, an optimal spaced seed requires only designated positions to be the same. The strategy was shown in PatternHunter to enhance sensitivity and speed greatly when compared to BLAST.
Quality score: the quality or confidence score of each sequenced nucleotide. It indicates the probability that this position is correctly sequenced.
Reference offset: the leftmost position where a read is mapped onto the reference sequence.
Paired end reads: two reads sequenced from both ends of a DNA fragment. The paired end reads from the same region of the reference sequence are expected to be located on the same chain and separated by a known distance range. The orientation and distance limit help to place reads unambiguously. They are also helpful in finding insertion deletion and structural
77. apping Results [Screenshot: Mapping Results Summary reporting 1799 mapping positions on the reference sequences, followed by a statistics table garbled in extraction.]
3.10 Show Mapping Results of Several Jobs Together
If two or more jobs have the same reference sequence, you can merge their mapping results and show them together.
1. Hold down the Ctrl key on the keyboard and click the Solexa_single_end_test UNIQUE Results node and the Solexa_single_end_test_more UNIQUE Results node. Release the Ctrl key. [Screenshot: ZOOM main window with both result nodes selected.]
2. Click the display toolbar icon to show the merged mapping results in the mapping results window. [Screenshot: merged Mapping Result window for the two UNIQUE results.]
3. You can perform any operation on the merged mapping results that you can on a single result node, including SNP finding.
3.11 Remove jobs
If you want to remove jobs from the workspace or disk, click the corresponding job nodes and then click the remove toolbar icon.
1. Click on the Solexa_single_end_test job node and click the remove toolbar icon. A confirming dialog will pop
78. ase. The quality score of each base denotes the probability that this base is correctly sequenced. By default, ZOOM displays the quality scores along with the alignment between the read sequence and its target region on the reference sequence, to give users an intuitive impression of which bases have low quality scores, as follows: [Screenshot: alignment of a read to the reference with its sequence quality row.]
Quality scores can be used to enhance the mapping results as well. If the read files are in FASTQ format, there are two types of FASTQ, Sanger and Illumina, which differ in how the numerical quality score is encoded as a character. For Sanger FASTQ format, the quality score of each position equals ord(q) - 33, while for Illumina Solexa FASTQ format the quality score of each position equals ord(q) - 64. [Screenshot: Read Qualities options, including the combo box "FASTQ format will be regarded as" set to Sanger; the remaining option labels are garbled in extraction.] Choose the correct type by clicking the combo box.
ZOOM adopts two ways of using quality scores to enhance the mapping results. The first way is to ignore mismatches occurring on read positions with low quality scores during the mapping process. A low confidence score denotes low seque
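The two encodings described above can be decoded with a few lines of Python. This sketch simply applies the ord(q) - 33 and ord(q) - 64 rules from the text; the function name `decode_qualities` and the type labels "sanger" and "illumina" are hypothetical, and the quality strings are made up for illustration.

```python
# Character offsets for the two FASTQ types named in the manual.
QUALITY_OFFSETS = {"sanger": 33, "illumina": 64}


def decode_qualities(quality_string: str, fastq_type: str = "sanger") -> list:
    """Convert a FASTQ quality string into a list of numerical scores."""
    offset = QUALITY_OFFSETS[fastq_type]
    return [ord(ch) - offset for ch in quality_string]


print(decode_qualities("II5!", "sanger"))   # [40, 40, 20, 0]
print(decode_qualities("hT@", "illumina"))  # [40, 20, 0]
```

Decoding a file with the wrong type shifts every score by 31, which is why the combo box setting matters.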
79. can focus our talents on providing solutions to difficult, otherwise unsolved problems that have resulted in research bottlenecks. At BSI, we are not satisfied with a solution that goes only partway to solving these problems; our solutions must offer something more than existing software. The BSI team recognizes that real people will use our software tools. As such, we hold in principle that it is not enough to develop solely on theory; we must develop with customer needs in mind. We believe the only solution is one that incorporates quality and timely results, a satisfying product experience, customer support and two-way communication. So we value market research, development flexibility and company-wide collaboration, evolving our offerings to match the market and users' needs. Efficient and concentrated research, development, customer focus and market analysis have produced PEAKS software for protein and peptide identification from tandem mass spectrometry data, RAPTOR for threading-based 3D protein structure prediction, PatternHunter software for all types of homology search and sequence comparison, and ZOOM for next generation sequencing.
11 ZOOM Software License
This is the same agreement presented on installation. It is provided here for reference only. If we are evaluating a time limited trial version of ZOOM and we wish to update the software to the full version, we must purchase ZOOM and obtain a full version registration key. 1
80. d information [Screenshot: mapping results window with overall progress 100%; read information for read name 4 87 627 528, mapping direction Reverse; context menu items "Copy the read sequence" and "Find its mate pai".]
3.3 Set up your working environment
ZOOM works in a client-server mode. By default, ZOOM launches a server on the local computer. This quick start section uses the default configuration. If you want to use different servers, or multiple CPUs on multiple servers, refer to the "Set up your working environment" section in Chapter 2 to configure the ZOOM GUI client and ZOOM server properly.
3.4 Create a Job
This will be a rather simple job, as it contains only one read file and one reference file; however, the same process can be used for jobs with a reads directory or multiple read files and multiple reference sequence files. Click the "Create a new job" toolbar icon or select New Job from the File menu. The following window will appear: [Screenshot: Create new job dialog. Job Label: job 1; Job Location: F ZOOMDB; Job Folder: F ZOOMDB job 1; Notes/Description: Created on Tue Aug 25 10:02:49 EDT 2009. Steps: Basic information, 2. Input reads, 3. Reference sequences, 4. Mapping parameters.]
Basic information
This part is used to assign a name to your job and a directory to store the data related to your job. After you finished the job you can
81. data. Answer: ZOOM is optimized for Illumina Solexa and ABI SOLID data and obtains good mapping results for these two instruments. However, the sequencing errors of the 454 and Helicos instruments are quite different in type: they contain many short indels. ZOOM cannot guarantee good mapping results there, because it currently handles only one gap of any length rather than many gaps. You could still give it a try, since ZOOM can handle reads over 200bp and deals with reads of variable lengths automatically. Any feedback is appreciated. We would like to support these two instruments in the future.
10 About Bioinformatics Solutions Inc
BSI provides advanced software tools for the analysis of biological data. Bioinformatics Solutions Inc develops advanced algorithms based on innovative ideas and research, providing solutions to fundamental bioinformatics problems. This small, adaptable group is committed to serving the needs of pharmaceutical, biotechnological and academic scientists and to the progression of drug discovery research. The company, founded in 2000 in Waterloo, Canada, comprises a select group of talented, award-winning developers, scientists and sales people. At BSI, groundbreaking research and customer focus go hand in hand on our journey towards excellent software solutions. We value an intellectual space that fosters learning and an understanding of current scientific knowledge. With an understanding of theory we
82. deletion information> If you use the rescore parameter, which uses quality scores to evaluate the mapping probability of the alignment of each mapping result, there is one more field: the log10 probability of the alignment.
Fields: Read label, Reference name, Reference offset, Strand, Mismatch number, Insertion deletion information, Log of mapping probability.
Read label: the name of the mapped read. If there is a tab in the read label, ZOOM converts the tab into a space.
Reference name: the name of the reference sequence which this read is mapped to. If there is a tab in the reference name, ZOOM converts the tab into a space.
Reference offset: the position on this reference sequence where the read is mapped, starting from zero. By default, the leftmost position is always returned, whether the read is mapped to the positive or the negative strand.
Strand: the strand of the reference sequence that the read is mapped to. "+" means the read is mapped to the positive strand of the reference sequence. "-" means the read is mapped to the negative strand of the reference sequence.
Mismatch number: the Hamming distance, or the number of mismatches, between the read and the target region on the reference sequence it maps to.
Insertion deletion information: the information relating to the insertion/deletion between the read and the target region of the reference sequence on this offset. The fi
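A record with these fields could be read in Python along the following lines. This is a sketch under stated assumptions: it assumes the fields are tab-delimited and appear in the order listed above (the remark about tabs in read labels suggests this, but the excerpt does not say so outright), it leaves the insertion/deletion and probability fields as raw text because their layout is not fully shown here, and the `Mapping` type, the `parse_mapping_line` function and the sample line are hypothetical.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    read_label: str
    reference_name: str
    reference_offset: int           # zero-based, leftmost position
    strand: str                     # "+" or "-"
    mismatch_number: int
    indel_info: str                 # kept as raw text
    log_probability: Optional[str]  # present only with the rescore parameter


def parse_mapping_line(line: str) -> Mapping:
    """Split one tab-delimited mapping record into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-delimited fields")
    log_probability = fields[6] if len(fields) > 6 else None
    return Mapping(fields[0], fields[1], int(fields[2]), fields[3],
                   int(fields[4]), fields[5], log_probability)


# A made-up record, for illustration only.
record = parse_mapping_line("read_1\tchr1\t29\t-\t2\t0\n")
print(record.reference_name, record.reference_offset, record.strand)
```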
83. difference
o read 14743 is mapped to offset 29 of the negative strand of the reference sequence chr1 with two errors (the number of polymorphisms in base space plus the number of sequencing errors) and no insertion/deletion. The polymorphism occurs on the last base pair of the nucleotide read, while the sequencing error occurs on the 26th base of the color space read.
o read 7222 is mapped to offset 32 of the positive strand of chr1 with one error and one insertion of length one starting from the 34th base of the read. The error is a sequencing error on the antepenultimate base of the color space read.
o read 4063 is mapped to offset 51 of the positive strand of chr1 with one error and one deletion of length one starting from the 34th base of the read. The error is a sequencing error on the seventh base from the end of the color space read.
Log of mapping probability: the log10 probability of the alignment. The value is computed by combining the quality scores of each color space base and the prior probability of SNP occurrence in nucleotide base space. This is a negative number, and the bigger the better. The two values are delimited by a colon.
An example with mapping probability: [Example output garbled in extraction: reads 9279, 14743, 7222 and 4063 mapped to chr1 at offsets 10, 29, 32 and 51, each followed by the read sequence, a mismatch string and the log mapping probability.]
84. e Job Running Monitor below the window of the Job View Panel, as follows. [Figure: Job Running Monitor, with the Description panel (type: job, total tasks: 12, overall progress: 100) above the Inspector panel.]
When a job is selected, the number of tasks in this job and the overall progress of this job are shown in the Description panel. The tasks of this job and their progress are shown in the Inspector panel. In the Inspector panel, each row is the running status of a task. There are three columns: the name of the task, the progress of the task, and the total running time of the task, which does not appear until the task is finished. The running time is in the format hours:minutes:seconds.
When you click a task in the Job View Panel, the detailed progress of each step of the task is shown. Each row is a step of the task. [Figure: step table with columns Subtask, Subprogress and Time; example times 00:01:30, 00:01:15 and 00:00:07.] Usually three steps are included in each task:
o Upload data: upload the read sequences of this task to the server.
o Map reads: map the uploaded reads to the reference sequence according to the parameters assigned.
o Get results: collect the mapping results from the server to the client in order to show the mapping results and perform post analysis.
Job Properties Panel: Select a job in the Job View Panel and click the Job Properties button. The properties of this job will be shown in the popup
85. e Chapter 6 in this manual.
2. Click the checkbox of the filtering criterion "At least ... reads are mapped to this position" and change the value to 10. [Figure: SNP Finding dialog listing the filtering criteria: at least 10 reads are mapped to this position (read depth); at most 2000 reads are mapped to this position; discard the reads whose sum of base quality is less than 20; average quality score on the SNP position of all reads covering this position should be larger than 10; average quality score of the 5 bases on each side of the SNP should be greater than 10.]
3. Press the button. SNP finding is then carried out on all the reference sequences, and a progress bar pops up.
4. When all SNP candidates are located, a table containing the SNP candidates appears in the SNP Caller tab, as follows. [Figure: SNP Caller tab; the legible column headers include ref offset, ref base, consensus, Read Depth and best base.]
Each row of the table is a SNP candidate. The table has 9 fields showing 9 features of each SNP. The description of each field is in Section 6.2 in Chapter 6. Click the button. The SNP summary, that is, the number of SNP candidates satisfying the filtering criteria and t
86. e or more components to perform the actual time-consuming computational tasks. These components are called ZOOM servers, or servers for short in this manual, and they do not have to reside on the same machine as the ZOOM GUI. Generally, the more ZOOM servers are used, the less time you need to process data, as illustrated below. [Figure: ZOOM GUI at 192.168.1.4 connected to ZOOM server 1 at 192.168.1.5:20001.]
By default, ZOOM Studio already provides and starts a local ZOOM server, so you can start working right away without more advanced settings. You can verify the existence of the local ZOOM server by clicking the icon.
NOTICE: For Windows users, the built-in server has limited processing capability, and we therefore strongly recommend using the LINUX ZOOM servers instead.
If you need more advanced features, such as starting multiple servers, starting a remote ZOOM server, or utilizing multiple cores of modern CPUs, please follow these steps to add a ZOOM server manually. You are required to configure both the client-side computer A, where the ZOOM GUI is, and the server-side computer B, where the ZOOM server is. Suppose the ZOOM GUI is running on computer A with IP address 192.168.1.4, and the ZOOM server is going to run on port 20001 on computer B with IP address 192.168.1.5.
Configuration of the client side (Computer A)
1. On computer A, in the ZOOM GUI, click the icon on t
87. [Table of contents, continued; only the legible entries are listed]
3.5 MONITOR THE JOB (including Job View Panel and Running Status of the Job)
3.6 DISPLAY MAPPING RESULTS
3.7 FINDING SNP CANDIDATES
3.8 EXPORT DATA INTO FILES
3.9 CHANGE PARAMETERS, GET MORE MAPPING RESULTS
3.10 SHOW MAPPING RESULTS OF SEVERAL JOBS TOGETHER
3.12 PAIRED END / MATE PAIR READ MAPPING EXAMPLE
4 DATA FORMAT
88. eld will be one of the following three cases:
o No insertion/deletion. Only mismatches found.
o I<length>_<offset>: there is one insertion of length <length> behind <offset>.
o D<length>_<offset>: there is one deletion of length <length> starting from <offset>.
Note that the offset is the offset on the original read sequence, counted from zero, whether the read is mapped to the positive strand or the complementary strand of the reference sequence.
An example output file: 1427 chr6 9 5952 chr6 72
How to interpret the results:
o read 1427 is mapped to offset 9 of the negative strand of chr6 with only one mismatch.
o read 5952 is mapped to offset 72 of the positive strand of chr6 with zero mismatches and one insertion of length 2 starting at the 34th base of the read.
o read 6353 is mapped to offset 109 of the negative strand of chr6 with two mismatches and one deletion of length one starting at the 36th base of the read.
Log of mapping probability: log10 of the probability of the alignment. This value is computed using the quality scores of each base. This is a negative number, and the larger it is (the closer to zero), the better.
An example with mapping probability: 1427 chr6 9 M 6 244083 5952 chr6 9 I2 33 1 193035
Output for ABI SOLiD reads
ZOOM maps ABI SOLiD reads within a given Hamming distance, which is the number of mismatches allowed between the read and its target region on the reference s
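The field layout described in this item can be read back with a few lines of code. The following is a minimal sketch, not part of ZOOM. It assumes that the fields are separated by tabs in the order Read label, Reference name, Reference offset, Strand, Mismatch number, Insertion/deletion information, with an optional seventh field for the log of mapping probability, that the strand is written as "+" or "-", and that any insertion/deletion field not of the form I<length>_<offset> or D<length>_<offset> means "no insertion/deletion". The names parse_line and Mapping are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple

class Mapping(NamedTuple):
    read: str
    reference: str
    offset: int                            # zero-based offset on the reference
    strand: str                            # assumed to be "+" or "-"
    mismatches: int
    indel: Optional[Tuple[str, int, int]]  # ("I" or "D", length, offset on read)
    log_prob: Optional[float]              # present only when rescoring is used

# Assumed spelling of the indel field: I<length>_<offset> or D<length>_<offset>.
INDEL_RE = re.compile(r"^([ID])(\d+)_(\d+)$")

def parse_line(line: str) -> Mapping:
    """Parse one tab-delimited mapping line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    read, reference, offset, strand, mismatches, indel_text = fields[:6]
    match = INDEL_RE.match(indel_text)
    indel = None
    if match:
        indel = (match.group(1), int(match.group(2)), int(match.group(3)))
    log_prob = float(fields[6]) if len(fields) > 6 else None
    return Mapping(read, reference, int(offset), strand,
                   int(mismatches), indel, log_prob)

# Usage with a made-up line in the assumed layout:
record = parse_line("5952\tchr6\t72\t+\t0\tI2_33\t-1.193035")
print(record.indel, record.log_prob)
```

Keeping the insertion/deletion field as a small tuple makes it easy to filter records by event type or length later on.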
89. ember current position read information. There are two operations that you can perform on this reference offset bar.
Locating a given offset or an offset range: If you want to see the mapping results at a specific position, or in a specific range, of the reference sequence, type the offset or the offset range in the locating bar (for example 2563, or 2563 3456) and press the Enter key. The mapping results around this position will then be shown in the Mapping Results Illustrating window.
Remember current position: You can store an offset range and go back to this region afterwards. Assume that 2513 to 2590 is the current offset range displayed in the Mapping Results Illustrating window. Click "remember current position" and click the Reference offset bar again. You will see that the entry 0 2513 2590 is recorded there. You can go back to this region at any time.
Switch button (available only for ABI SOLiD data): For ABI SOLiD data, the default display is the nucleotide reads decoded according to the mapping results. Press the button to switch the read display from nucleotide space to color space, or click the button again to switch back. The reads are shown in color space as follows
90. equence. Reference sequence: The sequence with a green background is the consensus sequence generated by the mapped reads along the reference sequence. The reads are shown at different scales according to the length of the region of interest on the reference sequence. A red background on a nucleotide of a read or of the consensus sequence highlights a difference from the nucleotide at the same position on the reference sequence. [Figure: Mapping Result ABI_mate_pair_test UNIQUE, reads aligned under the reference sequence.]
When you click a read, a rectangle highlights the read sequence, and a blue column appears highlighting the nucleotides in the same column. [Figure: a selected read and its highlighted column.]
91. equence. ABI SOLiD reads use the color space format. The differences between a read in color space and the reference sequence are caused either by sequencing errors or by genomic differences such as mutations or SNPs. Sequencing errors may cause some reads to be mapped incorrectly to the reference sequence. ZOOM is able to distinguish sequencing errors from genomic differences by correcting the sequencing errors, which allows more reads to be mapped correctly. ZOOM is also able to decode mapped color space reads after error correction and to highlight both genomic differences and sequencing errors.
Each line of the file corresponds to a mapped position and contains eight basic fields delimited by tabs, as follows: Read label, Reference name, Reference offset, Strand, Total error number, Insertion/deletion information, Decoded nucleotide sequence, Mark of sequencing error position.
If you use the rescore parameter, which uses quality scores to evaluate the mapping probability of the alignment of each mapping result, there is one more field, the log10 probability of the alignment: Read label, Reference name, Reference offset, Strand, Total error number, Insertion/deletion information, Decoded nucleotide sequence, Mark of sequencing error position, Log of mapping probability.
Read label: the name of the mapped read. If there is a tab in the read label, ZOOM converts the tab into
92. equirement of minimal read depth: at least k reads cover this position. [Option: "At least 10 reads mapped to this position (read depth)".]
The requirement of maximal read depth: at most k reads are allowed to cover this position. [Option: "At most 4000 reads are mapped to this position".]
If quality score files are included, quality scores can be used to filter SNPs. ZOOM computes the sum of the quality scores of each read sequence and discards the reads whose sum of base quality scores is less than k, because the mapping of a low-quality read might be an error. [Option: "Discard the reads whose sum of base quality is less than 900".]
The requirement that the variation position should have a high quality score, measured by the average quality score on the SNP position of all reads covering this position. [Option: "Average quality score on the SNP position of all reads covering this position should be larger than 30".]
The requirement that the positions around the variation position should have high quality scores. [Option: "Average quality score of the 5 bases on each side of the SNP should be greater than 20".]
You can choose one or several UNIQUE Results nodes and use these factors to filter out the SNP candidates satisfying their requirements. Multiple UNIQUE Results nodes of jobs with the same reference sequences may be selected to analyze SNPs together. We suggest the user find SNP candidates only using the uniquely mapped reads
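To make the five filtering criteria concrete, here is a small sketch of how they could be combined. It is an illustration under assumptions, not ZOOM's implementation: the class SnpCandidate, the functions keep_read and passes_filters, and the way the averages are supplied are all hypothetical, and the default thresholds are simply example values of the kind shown in the SNP filtering dialog.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class SnpCandidate:
    depth: int                  # number of reads covering the position
    avg_quality_at_snp: float   # mean quality at the SNP position over covering reads
    avg_quality_flank: float    # mean quality of the 5 bases on each side of the SNP

def keep_read(base_qualities: Sequence[int], min_quality_sum: int = 20) -> bool:
    """A read is discarded when the sum of its base qualities is below the threshold."""
    return sum(base_qualities) >= min_quality_sum

def passes_filters(candidate: SnpCandidate,
                   min_depth: int = 3,
                   max_depth: int = 2000,
                   min_snp_quality: float = 10,
                   min_flank_quality: float = 10) -> bool:
    """Apply the depth and quality criteria to one candidate position."""
    return (min_depth <= candidate.depth <= max_depth
            and candidate.avg_quality_at_snp > min_snp_quality
            and candidate.avg_quality_flank > min_flank_quality)

# Usage: a candidate covered by 12 reads with good quality on and around the SNP.
print(passes_filters(SnpCandidate(depth=12, avg_quality_at_snp=28.5,
                                  avg_quality_flank=25.0), min_depth=10))
```

In this sketch the read-level filter (keep_read) would be applied first, so that the depth and the averages are computed only from the reads that survive it.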
93. [Table of contents, continued; only the legible entries are listed]
4.2 COLOR SPACE (three Applied Biosystems SOLiD entries, one of them for csfasta)
4.3 REFERENCE SEQUENCE FILE FORMAT
4.7 ORIENTING YOURSELF
94. eted.
b. After the progress is finished, you can see a tabbed window containing the mapping results on the right-hand side of the main ZOOM window, as follows. [Figure: ZOOM main window with the Job View panel (Solexa_single_end_test, Scheduling, Results, Solexa_single_end_test UNIQUE), the Running monitor, the read information panel and the Mapping Result tab showing read depth along the reference sequence.]
The line in the graph is the overview of the read depth of the mapped reads along the reference sequence. The horizontal ruler denotes positions on the reference sequence. The vertical ruler denotes the read depth.
4. Press the zoom-in button to zoom in on the graph, or press the zoom-out button to zoom out.
5. Click the left mouse button and drag along the graph to form a rectangular region, then release the mouse button. The selected rectangular region will be enlarged to fill the Mapping Results Displaying Window
95. fault name is the original name suffixed by "_more".
2. Click Next twice to reach the mapping parameters step.
3. Switch the radio button from "unique" to "top" and change the value to 2 to keep up to 2 mapping results for each read. [Figure: Collecting Results: for each read, report the unique best mapping position or the top N best mapping positions.]
4. Change the mismatch number from 2 to 4, which allows up to four mismatches. [Figure: Mapping Criteria: "check this to consider read length into mapping criteria"; "Allow at most 4 mismatched base pair(s) between the reads and the reference sequences, plus an indel of length at most 0 base pair(s)"; "Allow at most 0 edit distance(s)"; "Achieve high sensitivity (more mapping results, but lower mapping speed)".]
5. Click the check box to achieve high sensitivity. This achieves full sensitivity, finding all the mapping results with up to 4 mismatches. For more information on using this parameter, please refer to step 4, Section 4.3 in Chapter 4.
6. Click the Finish button to create this new job. A new job, Solexa_single_end_test_more, will be created and processed. After the new job is finished, there will be an additional job in the Job View panel, as follows. [Figure: Job View panel listing Solexa_single_end_test and Solexa_single_end_test_more.] The new job has two Res
96. file will be loaded together. By default, the quality score file of seq.txt is prb.txt, and the quality score file of csfasta is _QV.qual or csfasta.qual. You can add your own pattern for recognizing the quality score file.
Paired-end/mate-pair files suffix: ZOOM can recognize the two files of paired-end or mate-pair data. By default, ZOOM mates _F3.csfasta with _R3.csfasta and _1.fastq with _2.fastq. You can enter your own pairing criteria by clicking the dropdown list and choosing "add pattern", as follows. [Figure: dropdown list with the entry "add pattern".] Then enter the pattern you prefer in the popup window. [Figure: input dialog, "The read file with name ... can be paired with the read file with name ...".]
5 Mapping Results
5.1 Show Mapping Results
After the job icon changes to show that the job is finished, ZOOM will help you survey the mapping results at a preferred scale. Make sure that the node under the Results node is selected when choosing data to be analyzed.
1. Select a UNIQUE/ALL Results node of a job in the Job View panel.
2. Click the "Display mapping result" toolbar icon. [Figure: ZOOM main window with Solexa_single_end_test selected.]
97. he button to move to the Input reads step. Click "Paired files mode" to change to the mode for inputting mate-pair reads, as follows. [Figure: Create new job dialog, steps 1 Basic information, 2 Input reads, 3 Reference sequences, 4 Mapping parameters, with the button "Auto find pair read files into list".]
The read file list window is split into two windows. Each window loads one end of the mate-pair read files. Make sure that the two files in the same row of the left and right windows are paired.
3. Click the button. Choose both the read_F3.csfasta and read_R3.csfasta files in the Sample_Data SOLiD mate pair directory. ZOOM will automatically recognize the possible paired files and put them in one row, together with their quality files, if any. [Figure: file list with the columns forward read file, quality file, reverse read file, quality file.]
ZOOM automatically finds paired read files according to the suffix of the files: filename_F3.csfasta will be paired with filename_R3.csfasta, and filename_1.fastq will be paired with filename_2.fastq. If you choose a directory, ZOOM will automatically pair the files satisfying the naming rule. Thus, if you want ZOOM to pair the read files for you, please make sure the file suffixes are correct. You can choose to add some patterns of rec
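The suffix-based pairing rule described in this item can be sketched as follows. This is an illustration only, not ZOOM's code. The function pair_read_files and the constant PAIR_SUFFIXES are hypothetical names, and the sketch covers only the two suffix pairs named in the text (_F3.csfasta with _R3.csfasta, and _1.fastq with _2.fastq).

```python
from typing import Iterable, List, Sequence, Tuple

# Default suffix pairs taken from the text; further patterns could be appended.
PAIR_SUFFIXES: List[Tuple[str, str]] = [
    ("_F3.csfasta", "_R3.csfasta"),
    ("_1.fastq", "_2.fastq"),
]

def pair_read_files(filenames: Iterable[str],
                    pair_suffixes: Sequence[Tuple[str, str]] = PAIR_SUFFIXES
                    ) -> List[Tuple[str, str]]:
    """Return (forward, reverse) pairs whose names differ only by a known suffix pair."""
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        for forward_suffix, reverse_suffix in pair_suffixes:
            if name.endswith(forward_suffix):
                # Replace the forward suffix with the reverse one to get the mate name.
                mate = name[:-len(forward_suffix)] + reverse_suffix
                if mate in names:
                    pairs.append((name, mate))
    return pairs

# Usage: files without a matching mate are simply left unpaired.
print(pair_read_files(["read_F3.csfasta", "read_R3.csfasta",
                       "tmp_1.fastq", "tmp_2.fastq", "lonely_1.fastq"]))
```

A file whose mate is missing is silently skipped here, which mirrors the advice above to check that the file suffixes are correct before relying on automatic pairing.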
98. he filtering criteria adopted, will be shown. [Figure: SNP summary reporting that 3 candidates are found under the filtering criterion "At least 3 reads are mapped to this position (read depth)".]
5. Double-click the first row in the table to show the first SNP. [Figure: Mapping Result Solexa_single_end_test UNIQUE, aligned reads around the selected SNP.] The light blue bar will highlight the SNP position
99. he job is finished, extract the unmapped reads and align them with a shorter or longer range between the two mates of a pair. Here you will find candidates for insertions and deletions. You can also map these unmapped reads in single-end mode to find some translocations.
4.10 System Configuration
There are five types of configuration which help ZOOM run more smoothly.
Default storing directory: If you usually store your jobs in one directory, you can click the Browse button to set that directory as the default storing directory for newly created ZOOM jobs.
The size of split files: See Section 2.6.
Reads file suffix: ZOOM can automatically load read files in the directories you select by recognizing the suffixes of the files. By default, files with the suffixes fasta, fastq, csfasta, fq, seq.txt and _prb.txt are loaded. If you need more patterns, click the dropdown list and choose "add pattern", as follows. [Figure: System Configuration dialog with the option "Split my job into tasks, each has at most 4000000 reads" and the suffix dropdown lists.] A dialog will pop up. [Figure: input dialog, "Files with name ... are reads files".] Enter the new suffix in the text field and press OK.
Quality score file suffix: When you select a reads file, its corresponding quality score
100. he right side of the toolbar to launch the configuration dialog.
2. Input 192.168.1.5 in the address box and 20001 in the port box, then click the Add server button (you may wish to remove the existing servers first). The new ZOOM server appears in the list but is deactivated, because it has not been launched on computer B yet. [Figure: Configuration dialog, Servers tab, listing 127.0.0.1:20000 and 192.168.1.5:20001, with the Address and Port boxes, the Add server button, and the status line "The local ZOOM server is online".]
Close the dialog.
Configuration of the server side (Computer B)
Important: each copy of the ZOOM server requires its own directory to run, and multiple servers should NEVER be launched within the same directory.
Computer B can have either a Windows or a LINUX platform, and you should choose the appropriate distribution of the server binary file for your system. If B is a Windows platform, the ZOOM server file is called zoomsrv.exe, together with the supporting files pthreadGC2.dll and mingwm10.dll. If B is a LINUX platform, the ZOOM server file is called ZOOM. Copy the proper ZOOM server file from the ZOOM package to computer B.
For the Windows platform:
1. On computer B, create a new directory and transfer zoomsrv.exe, pthreadGC2.dll, mingwm10.dll and start_server.bat into it. You should always create a new directory for each copy of the ZOOM server.
2. Use a file editor such as Notepad to open star
101. in as the reads file.
2. To remove files you do not want, select the files and click the "Remove from list" button. The files will be removed.
3. To add more files, click "Add read file dir s to list" again, as in step 1.
4. Click the button to move on to the reference sequence selection window.
Input paired-end reads or mate-pair reads
By default, the two mates of one pair are stored in two separate files, but you may also store the two mates of one pair next to each other in one file. In that case, however, please use single-end reads mode to load the reads and select paired-end mode for mapping in the Mapping Parameters section.
1. Click the "Paired files mode" button to change to the paired-end reads input mode. The read list window is split into two windows, each of which loads the file containing one mate of a pair. [Figure: Create new job dialog, Input reads step, with the columns forward read file, quality file, reverse read file, quality file, and the buttons "Auto find pair read files into list" and "Remove pairs from list".]
2. Click the "Auto find pair read files into list" button and select the read files or directories. ZOOM will automatically pair these read files according to the file names. The paired files will be on the same row of the two windows. ZOOM recognizes the read file pairs according t
102. ing data. ZOOM won't respond until the progress bar has disappeared. [Figure: ZOOM main window while loading Solexa_single_end_test, showing the message "Processing: building database, 25 completed" above the Running monitor.]
Job View panel
After loading the data, you will see the job in the Job View panel. The Job View panel, which is shown in the upper left-hand corner, displays the organization of a particular job. Use the boxes next to each node to expand and collapse the job in order to see its organization. In each job node there is a Scheduling node and a Results node. The Scheduling node shows all the tasks this job has been split into and scheduled on the server. The Results node will not appear until all read mapping tasks are finished. It will contain the uniquely mapped results, suffixed by UNIQUE, and the top N mapping results, suffixed by ALL, according to the running parameters.
Running status of the job
When you click the job node, the Running Monitor shows the progress of the job. [Figure: Running Monitor, type: job, total tasks: 1.] Click the button to dis
103. is the difference between the read and the reference sequence segment. Each black block indicates the quality score at that position: the higher the block, the larger the quality score at that position.
Note that the direction of the alignment follows the original direction of the read sequence. The start offset and the end offset of the alignment are marked. When the left offset is greater than the right offset, the read is mapped to the reverse chain of the reference sequence. The differences between the reference sequence and the read sequence are marked in red.
For ABI SOLiD reads, both the color space read sequence with the adapter and the nucleotide read sequence decoded from the mapping results by ZOOM are shown, as in the following image. Both the sequencing errors on the color space reads and the differences between the decoded reads and the reference sequence are marked in red. [Figure: read information panel for read 6638_R3, mapping direction Positive, reference positions 6925 to 6974, showing the reference, the decoded read, the color space read and the sequence quality.]
Clicking the read sequence button
104. ith its target region on the reference sequence, there are at most eight mismatches between the color space of this read and its target region. For example, for reads of length 35 bp, ZOOM will find all the mapping results which have:
o Three polymorphisms in base space: in total six mismatches between the color space of the reads and the reference sequence.
o Two polymorphisms in base space and one sequencing error in color space: in total five mismatches between the color space of the reads and the reference sequence.
o One polymorphism in base space and two sequencing errors in color space: in total four mismatches between the color space of the reads and the reference sequence.
o Zero polymorphisms in base space and three sequencing errors in color space: in total three mismatches between the color space of the reads and the reference sequence.
Read Length range bp 24 244 25 255 30 242 42 228
9 Frequently Asked Questions
Question: Can I put reads of different lengths in the same file?
Answer: Yes. ZOOM will automatically call different parameter sets for different read lengths, and the results will be merged.
Question: Is the input read data case sensitive?
Answer: No g r id U and all other letters are N. If you have different requirements, please contact us.
Question: Can I get all mapped positions for each read in addition
105. job to inspect the running status or analyze the results at any time.
4.7 Orienting Yourself
Job View Panel
This frame appears in the upper left-hand corner of the ZOOM main window and displays the organization of particular jobs, if applicable. You can control these jobs and inspect their running status in this panel.
The jobs are organized as a tree. Each job is a job node with two child nodes: one is a Scheduling node and the other is a Results node. Use the boxes next to each node to expand and collapse each job. After a job is created, ZOOM automatically creates a TASK node under Scheduling to map the reads. After the job is finished, the Results node appears, containing one or two results nodes for showing the mapping results and carrying out post analysis. [Figure: Job View panel with job 2 and job 3, each with Scheduling, TASK and Results nodes such as job 2 UNIQUE and job 3 UNIQUE.]
If the amount of reads is very large, ZOOM automatically partitions the reads into several parts and launches a task for each part. ZOOM schedules these tasks automatically until all reads are handled. [Figure: ZOOM main window with a job split into several TASK nodes under Scheduling.]
106. lease refer to Chapter 8. Note that this option might be time consuming when your reference sequences are long. One way to improve the speed is to map the reads without selecting this option first, then extract the unmapped reads and map them with this option. Please refer to Section 4.7 for more information.
Collecting Results
Each read may be mapped to multiple target regions in the reference sequence. The best mapping results of a read are the ones with the smallest edit distance or, in case of equal edit distance, the shortest insertion/deletion length, on the assumption that insertions and deletions are less probable than mutations. If there is only one such best mapping result for the read, this is a uniquely mapped read. Otherwise, if there are multiple such mappings, the read is considered ambiguously mapped.
ZOOM finds all possible mapping positions satisfying the mapping criteria for each read. However, you can choose to keep only the uniquely mapped reads or the top N mapping results for each read. Choose the following radio box to switch between the two modes. [Figure: radio buttons, with "top 2 best mapping position" selected.]
You can use quality scores to assess the mapping probability of the possible mapping results of each read, and rank the mapping positions of each read by mapping probability score, by clicking the following checkbox. [Figure: checkbox text ending "5 mapping records for each read, assuming that the SNP probability is 0.01".] Note that only if top N map
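The ranking rule in Collecting Results (smallest edit distance first, then shortest insertion/deletion length) can be sketched in a few lines. This is only an illustration of the rule as stated in the text, not ZOOM's implementation. The tuple layout (reference offset, edit distance, insertion/deletion length) and the function names best_mappings, is_unique and top_n are hypothetical.

```python
from typing import List, Tuple

# One candidate mapping: (reference offset, edit distance, indel length).
Candidate = Tuple[int, int, int]

def rank_key(candidate: Candidate) -> Tuple[int, int]:
    """Smaller edit distance wins; ties are broken by shorter indel length."""
    _, edit_distance, indel_length = candidate
    return (edit_distance, indel_length)

def best_mappings(candidates: List[Candidate]) -> List[Candidate]:
    """Return every candidate that shares the best (smallest) rank key."""
    if not candidates:
        return []
    best = min(rank_key(c) for c in candidates)
    return [c for c in candidates if rank_key(c) == best]

def is_unique(candidates: List[Candidate]) -> bool:
    """A read is uniquely mapped when exactly one candidate is best."""
    return len(best_mappings(candidates)) == 1

def top_n(candidates: List[Candidate], n: int) -> List[Candidate]:
    """Keep up to n candidates in rank order, as in the top N mode."""
    return sorted(candidates, key=rank_key)[:n]

# Usage: two candidates share edit distance 1, the one without an indel is preferred.
hits = [(100, 2, 0), (250, 1, 1), (400, 1, 0)]
print(best_mappings(hits), is_unique(hits), top_n(hits, 2))
```

When quality-based rescoring is enabled, the rank key would presumably be replaced by the mapping probability, which this sketch does not model.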
108. mapped results will be shown in a Mapping Summary tab window beside the read information tab. [Figure: Mapping Summary tab reporting that 8017 reads are uniquely mapped to the reference sequences, followed by the detailed mapping statistics by insertion/deletion length.]
3.7 Finding SNP Candidates
We suggest that users find SNP candidates using only the uniquely mapped reads, i.e. using the UNIQUE result node rather than the ALL result node. Because the ALL result node contains the top N mapping results for each read, the reads mapped to multiple positions of the reference sequence would make the SNP finding process unreliable.
1. Click the Solexa_single_end_test UNIQUE result node and click the "filter SNP candidates" toolbar icon, or select SNP Filter from the Tools menu. [Figure: Filtering criteria dialog: at least 3 reads mapped to this position (read depth); at most 2000 reads are mapped to this position; discard the reads whose sum of base quality is less than 20; average quality score on the SNP position of all reads covering this position should be larger than 10; average quality score of the 5 bases on each side of the SNP should be greater than 10.]
There are five filtering criteria which you can apply for SNP finding. For a detailed explanation of each criterion, please refer to Section 6.1 in th
109. ment. Click the launch license wizard button to continue. Then follow the instructions listed above for registering ZOOM Studio.
2.6 Set up your working environment
ZOOM Studio works in client/server mode. The following graphical user interface, called the ZOOM GUI, is the main work space in which you load your data, submit it to the server(s) for computational tasks, monitor the working progress and view the analysis results. [Figure: ZOOM GUI main window with the Job View panel (job 3, Scheduling, Results, job 3 UNIQUE, job 3 ALL) and the Mapping Result tab for job 3 UNIQUE showing aligned reads.]
110. nce sequences. Choose the reference fa file in the Sample_Data SOLiD mate pair directory. Click the button to move on to the Mapping Parameters.
5. The estimated range of the distance between the two reads of one mate pair is 800 to 2000. Set the paired-end parameters as follows. [Figure: Pair end Settings: "Input reads are paired end reads / mate pair reads"; "The distance between two locations of paired reads is 800 bases to 2000 bases".] Keep the top two mapping results for each read. [Figure: Collecting Results: for each read, report the unique best mapping position or the top 2 best mapping positions.] Click Finish to create the ABI_mate_pair_test job.
6. After the job is finished, click the ABI_mate_pair_test UNIQUE result node and click the toolbar icon to show the mapping results.
7. Click any place in the Mapping Results Illustrating window, select a read and press the "Find its mate pair" button. ZOOM will then jump to the mate of the selected read.
5 Load Data
4 Data Format
Before loading any data files into ZOOM, please make sure that the data is in an acceptable format. ZOOM accepts reads from both Illumina/Solexa data and ABI SOLiD color space data. ZOOM can handle data files in the following formats.
4.1 Illumina data
ZOOM accepts five types of Illumina/Solexa read files as input. These file formats are automatically recognized. The let
111. ncing quality at that position in the read. Therefore, mismatches occurring at positions with low quality scores are more likely due to sequencing error, and mismatches at positions with high quality scores are more meaningful than those at low-quality positions. Use the following combo box, or enter a value, to set the threshold for a high quality score:
Don't count mismatches on bases with quality score less than: [count all bases]
Ignore reads with less than [8] high quality bases
Mapping Criteria: check this to take read length into account in the mapping criteria
The second way is to use quality scores to rank the possible mapping results of each read and choose the best position as the mapping result. Since this happens after all mapping positions have been found, it is described in part 5, Collecting Results. In our experiments, we observed that more reads were uniquely mapped when quality scores were included to help mapping.
Mapping Criteria: Mismatches and insertion/deletion
The following is an example in which a read is mapped to the reference sequence with only mismatches. The four red characters in the alignment are the mismatches between the read and its target region on the reference sequence.
reference 6860 GI ATTTAGCT GACTCGCCACACT CCACGGAAGCAAT 6895
read GI ATTTAGCTGI CTCGCCACCCT CCACOGGI AGCACT
Sequence quality: [quality bar graphic]
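The quality threshold described above can be sketched in Python. This is only an illustration of the idea, not ZOOM's implementation: the function names, the list-of-integers quality representation, and the rule that an N counts as a mismatch (taken from the glossary definition of Mismatch) are assumptions made for the example.

```python
def count_high_quality_mismatches(read, target, qualities, min_quality):
    """Count mismatches, skipping bases whose quality is below min_quality.

    read, target: equal-length base-space strings (hypothetical inputs).
    qualities: one integer quality score per read base.
    """
    if not (len(read) == len(target) == len(qualities)):
        raise ValueError("read, target and qualities must have the same length")
    mismatches = 0
    for r, t, q in zip(read.upper(), target.upper(), qualities):
        if q < min_quality:
            continue  # low-quality base: a difference here is not counted
        if r != t or r == "N" or t == "N":
            mismatches += 1
    return mismatches


def has_enough_high_quality_bases(qualities, min_quality, min_count=8):
    """Mirror of the 'ignore reads with less than 8 high quality bases' option."""
    return sum(q >= min_quality for q in qualities) >= min_count
```

With a threshold of 0 every base is counted, which corresponds to the "count all bases" setting.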
112. nd 10 shall survive any termination of this Agreement.
7. Export Controls. The Software is subject at all times to all applicable export control laws and regulations in force from time to time. You agree to comply strictly with all such laws and regulations and acknowledge that you have the responsibility to obtain all necessary licenses to export, re-export, or import as may be required.
8. Assignment. Customer may assign Customer's rights under this Agreement to another party if the other party agrees to accept the terms of this Agreement, and Customer either transfers all copies of the Program and the Documentation, whether in printed or machine-readable form (including the original), to the other party, or Customer destroys any copies not transferred. Before such a transfer, Customer must deliver a hard copy of this Agreement to the recipient.
9. Maintenance and Support. BSI will provide technical support for a period of thirty (30) days from the date the Software is shipped to Licensee. Further maintenance and support is available to subscribers of BSI's Maintenance plan at BSI's then-current rates. Technical support is available by phone, fax, and email between the hours of 9 am and 5 pm Eastern Time, excluding statutory holidays.
10. Governing Law. This Agreement shall be governed by and construed in accordance with the laws in force in the Province of Ontario and the laws of Canada applicable therein, without giving effect to conflict of law pro
113. ng errors on color space and true polymorphisms to their target region on the reference genome, marked respectively.
Terminology and Abbreviations Glossary
Base space: reads represented in the alphabet of nucleotides (A, C, G, T, N), such as ACGTAAA.
BSI: Bioinformatics Solutions Inc., the maker of PEAKS, PatternHunter, RAPTOR, ZOOM, and other fine bioinformatics software.
Color space: also called the di-base alphabet. This is the data format produced by the ABI SOLiD sequencer. Reads are represented as colors, such that two adjacent nucleotides are encoded by one color letter, represented as 0, 1, 2, 3. The conversion from base space to color space uses the following table (first base in rows, second base in columns):
      A  C  G  T
   A  0  1  2  3
   C  1  0  3  2
   G  2  3  0  1
   T  3  2  1  0
Coverage: the number of times one segment of the reference sequence is sequenced. It also means the number of reads mapped back to one position or one area of the reference sequence.
Edit distance: the sum of the number of mismatches and the lengths of indels.
Hamming distance: the number of mismatches between a read and its target region on the reference sequence.
Indel: insertion and deletion mutations.
Mismatch: a mismatch occurs when the nucleotide bases from the read and the reference sequence are different, or when either of the sequences has an N at that position. If the sequencing qualities are also used, the mismatches occurring at low quality sites determined by a qu
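The color-space definition in the glossary can be illustrated with a short Python sketch. It assumes the standard SOLiD di-base encoding, in which the color of two adjacent bases equals the XOR of their two-bit codes (A=0, C=1, G=2, T=3). The function name is hypothetical, and the sketch encodes only adjacent base pairs; it does not model any leading base that a sequencer file may carry.

```python
# Two-bit codes for the nucleotides; XOR of two codes gives the color 0-3.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(bases):
    """Encode each pair of adjacent bases as one color digit (0, 1, 2 or 3)."""
    bases = bases.upper()
    colors = []
    for first, second in zip(bases, bases[1:]):
        if first not in BASE_CODE or second not in BASE_CODE:
            raise ValueError("only A, C, G and T can be encoded in this sketch")
        colors.append(str(BASE_CODE[first] ^ BASE_CODE[second]))
    return "".join(colors)


print(to_color_space("ACGTAAA"))  # prints 131300
```

A read of n bases yields n - 1 colors, because each color describes a transition between two neighbouring bases.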
114. nt to export from the Job View Panel. Select Export from the File menu, then select Mapping Results from the popup menu. There are four output formats for exporting mapping results: ZOOM, BED, WIG, and GFF.
[Screenshot: File menu (New Job, Load Job Ctrl+O, Remove Job Ctrl+R, Export, Exit Ctrl+Q) with the Export submenu (Mapping Results, Consensus sequences) and the format list (ZOOM, BED, WIG, GFF).]
1. Select the output directory and output filename in the popup browse window. ZOOM will write the mapping results of the selected results node to the output file. If a UNIQUE Results node is selected, the output file contains the mapping results of uniquely mapped reads. If an ALL Results node is selected, the output file contains the top N mapping results of each read. Note that the mapping results are sorted by reference offset in ascending order. Thus the top N results of one read might not be listed consecutively, and the two mates of one pair are not listed consecutively either.
ZOOM format
Output for Illumina/Solexa reads
By default, ZOOM outputs the mapping results of each mapped read in the selected Results node. Each line of the file corresponds to a mapped position and contains six basic fields delimited by tabs, as follows:
<Read label> <Reference name> <Reference offset> <Strand> <Mismatch number> <Insertion
115. o the following file name patterns:
o <filename>_1.fastq and <filename>_2.fastq
o <filename>_F3.fastq and <filename>_R3.fastq
If you choose a directory, ZOOM will automatically pair the files that satisfy the naming rule. If you want ZOOM to pair the reads for you, make sure the suffixes are correct. Otherwise, you will need to add the read files pair by pair yourself, as follows:
o Double-click the left (forward read file) window and select one file, <file1>.
o Double-click the right (reverse read file) window and select the other file, <file2>.
MAKE SURE that <file2> contains the mates of the reads in <file1>, since the two files will be in the same row in the two windows and will be treated as read pairs. KEEP IN MIND that the read files in one row are paired. When you select one file, both files in the row are selected.
[Screenshot: paired file list with the columns forward read file, quality file, reverse read file, quality file, and the Remove pairs from list button.]
3. To remove read pair files, select the files and click the button to delete them.
4. To add more read pair files, click Auto find pair read files into list, or double-click the two windows as above.
5. Click the button
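The automatic pairing rule can be pictured with the following Python sketch. It is not ZOOM's code: the exact suffix spellings (`_1.fastq`/`_2.fastq` and `_F3.fastq`/`_R3.fastq`) are an assumed reading of the naming patterns above, and `pair_read_files` is a hypothetical helper.

```python
# Assumed (forward suffix, reverse suffix) naming rules for mate files.
PAIR_SUFFIXES = [("_1.fastq", "_2.fastq"), ("_F3.fastq", "_R3.fastq")]


def pair_read_files(filenames):
    """Return (forward, reverse) tuples for files whose names follow the rules.

    Files without a matching mate are left out, which corresponds to the case
    where the user has to add the pair by hand.
    """
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        for forward_suffix, reverse_suffix in PAIR_SUFFIXES:
            if name.endswith(forward_suffix):
                mate = name[: -len(forward_suffix)] + reverse_suffix
                if mate in names:
                    pairs.append((name, mate))
    return pairs
```

Matching on the shared prefix is what makes a wrong suffix fatal for automatic pairing: a file named `sample_2b.fastq` would simply never be found as the mate of `sample_1.fastq`.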
116. ob from the File menu. The Create a new job window will open as follows:
[Screenshot: Create new job window with the fields Job Label (job 1), Job Location (F:\ZOOMDB), Job Folder (F:\ZOOMDB\job 1), and Notes/Description (Created on Tue Aug 25 10:02:49 EDT 2009), and the step list 1 Basic information, 2 Input reads, 3 Reference sequences, 4 Mapping parameters.]
There are four steps to create a job.
Basic information
This section is used to assign a name to your job and a directory in which to store the data related to your job. After the job has finished, you can load the job by selecting this directory, either to show the mapping results or to perform post-analysis.
1. Enter a name for your job in the blank field beside Job Label. If the job name you enter already exists in the directory, a tooltip icon will appear below the Browse button with the message: The specified folder already exists. Its content will be overwritten if used. You can choose to change the job name or to overwrite the existing directory.
2. Press the Browse button and select a directory in which to save your job.
3. You can enter any description of your job for future reference: Notes/Description: Created on Tue Aug 25 10:09:18 EDT 2
117. ognizing paired end files as described in Section 4.10. Otherwise, you will need to add the read files one pair at a time yourself, as follows:
o Double-click the left (forward read file) window and select the Sample_Data\Solexa\pair_end\read_1.fastq file.
o Double-click the right (reverse read file) window and select the Sample_Data\Solexa\pair_end\read_2.fastq file.
[Screenshot: paired file list with the columns forward read file, quality file, reverse read file, quality file, showing the SOLiD and Solexa sample data rows.]
PLEASE KEEP IN MIND that the two read files in one row are paired. When you select one file, both files in the row are selected.
o Select the Solexa data and click the Remove pairs from list button to delete the files. We will not be using this set of data in the following tour.
4. Click the button to move on to the Refere
118. osition.
consensus base: the nucleotide of the consensus sequence at the position. ZOOM can build the consensus sequence for a haploid genome or a diploid genome. If the organism is viewed as diploid, the nucleotide on the consensus sequence follows the IUPAC code, which uses one letter to denote a haplotype. For example, S denotes the haplotype <G, C>, while R means <G, A>.
Read Depth: the number of mapped reads covering the position.
best base: the most frequent nucleotide at the SNP position.
bestBaseCount: the count of the best nucleotide at the SNP position.
2nd best base: the second most frequent nucleotide at the SNP position.
2nd BestBaseCount: the count of the second best nucleotide at the SNP position.
Gap symbol: when it appears in the ref base, there is an insertion. When it appears in the consensus base, a deletion has occurred.
7 Export
ZOOM can export the mapping results, the consensus sequences built, and SNP candidates to files. Several commonly used formats are supported to help users exchange data or get more information in the UCSC genome browser, such as checking whether the alignments fall in exon regions. The mapping results can be exported in ZOOM format, BED format, GFF format, and WIG format. The consensus sequence can be exported in FASTA format.
7.1 Export Mapping Results
Select the Results node of the job that you wa
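The IUPAC coding of a diploid consensus base can be written down as a small lookup. The sketch below uses the standard IUPAC two-base ambiguity codes (R, Y, S, W, K, M); the function name `genotype_to_iupac` is hypothetical and the code is not taken from ZOOM.

```python
# Standard IUPAC ambiguity codes for pairs of different nucleotides.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}
VALID_BASES = set("ACGT")


def genotype_to_iupac(allele1, allele2):
    """Return the one-letter IUPAC code for a pair of alleles."""
    a, b = allele1.upper(), allele2.upper()
    if a not in VALID_BASES or b not in VALID_BASES:
        raise ValueError("alleles must be single A, C, G or T characters")
    if a == b:
        return a  # homozygous position: the base itself
    return IUPAC_TWO_BASE[frozenset((a, b))]


print(genotype_to_iupac("G", "C"))  # prints S
print(genotype_to_iupac("G", "A"))  # prints R
```

Using `frozenset` keys makes the lookup independent of allele order, so <G, C> and <C, G> map to the same letter.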
119. our institution. Click Next.
3. The following window will appear:
[Screenshot: license wizard. Click the button below to save the request file to a USB key or a floppy drive: Save Request File. Copy the request file license.request to another computer (PC2) which has an Internet connection. The next step will be performed on PC2.]
Select the Save Request File button to save license.request to your computer (PC1). Click Next.
4. Transfer the license.request file from PC1 onto a computer with an Internet connection (PC2) using a USB key or a removable storage device. On PC2, go to http www bioinfor com 1cs20
[Screenshot: BSI Product Licensing Center in Internet Explorer.]
5. Select I have the license request file. I want to register the software. and click Next. The following window will appear:
[Screenshot: BSI Product Licensing Center, license request upload page.]
6. Click the Browse button to select the license.request file, type in the visual verification code, and click Next.
7. After the license email is received on PC2, save the attachment license.lcs as is and copy the file to PC1. If you do not receive the license.lcs file in
120. ping results are chosen, this option is available. ZOOM adopts a re-score scheme. First, it finds all possible mapping positions of each read within the mismatch threshold or edit distance threshold. Then it picks multiple positions for each read (the B best mapping positions plus extra mapping records, that is, the sum of B and extra) and computes the probability of the alignments between the read and these target regions. The mapping scores (log10 of the probability of the alignment) are sorted to get the top N results as the mapping results. In our experiments, more reads can be uniquely mapped after re-scoring using quality scores. The re-score function for ABI SOLiD data uses the a priori SNP probability of the organism. Therefore, if the data set is ABI SOLiD data, please assign an a priori SNP probability for the organism. For example, for human this value can be set to 0.001, assuming that the SNP probability is 0.001.
4.6 Open a Job
1. To open an existing job, select Load Job from the File menu or use the Load job icon on the toolbar.
2. Select the location of the data for that job.
3. The job will be loaded in the Job View Panel. You can continue to rerun the job, or analyze the mapping results of finished jobs.
This option is useful when you want to close the client windows. After jobs are created, you can close the ZOOM main windows. The jobs keep running on the ZOOM server. You can open the ZOOM main windows and load the
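The manual does not spell out the probability model behind the re-score step, so the following Python sketch only illustrates the general idea of ranking candidate positions by a log10 alignment probability. It assumes Phred-scaled quality scores (error probability 10^(-Q/10)) and a uniform substitution model in which a sequencing error yields each of the three other bases with equal probability. Both are assumptions for illustration, not ZOOM's actual formula, and the names are hypothetical.

```python
import math


def alignment_log10_prob(read, target, qualities):
    """log10 probability of observing `read` if it came from `target`.

    Assumes Phred-scaled qualities and a uniform error model (illustrative).
    """
    if not (len(read) == len(target) == len(qualities)):
        raise ValueError("read, target and qualities must have the same length")
    score = 0.0
    for r, t, q in zip(read.upper(), target.upper(), qualities):
        # Cap the error rate at 0.75 so a quality of 0 means "random base".
        err = min(10 ** (-q / 10), 0.75)
        p = (1 - err) if r == t else err / 3
        score += math.log10(p)
    return score


def top_n_positions(read, qualities, candidates, n):
    """Rank (offset, target) candidates by score and keep the best n offsets."""
    scored = [(alignment_log10_prob(read, tgt, qualities), off) for off, tgt in candidates]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first
    return [off for _, off in scored[:n]]
```

Under this model a mismatch at a low-quality base costs less than one at a high-quality base, which is the behaviour the re-score scheme relies on to separate otherwise tied positions.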
121. ping results will be exported in puce format. Note that we suggest building the consensus sequence only from the UNIQUE result node, for a reason similar to the one given for SNP finding.
3.9 Change parameters to get more mapping results
For the unmapped reads of this job, adjusting parameters such as the reference sequences or the number of mismatches allowed between reads and reference sequences may yield more mapping results.
1. Click the Solexa single end job node and click the reprocess unmapped reads toolbar icon.
[Screenshot: ZOOM main window with the Solexa single end test job and its UNIQUE result node.]
The following window will pop up:
[Screenshot: job window with Job Label (Solexa single end test more), Job Location (F:\ZOOMDB), Job Folder, Notes/Description (Created on Wed Aug 26 17:31:21 EDT 2009), and the steps 1 Basic information, 2 Reference sequences, 3 Mapping parameters.]
This window is for reprocessing the unmapped reads of the Solexa single end test job with different parameters. The process is similar to creating a new job, except that the read data is the unmapped reads of the selected job. Assign a name to the new job for these unmapped reads. The de
122. play the overall progress and the properties of this job, including the read files, the reference files, the parameters used, and mnemonic notes.
Click on a task node. The progress of the task will be shown (for example: type: task remote map, total subtasks: 3, overall progress: 62).
Control the job
If you want to cancel or restart one or several jobs, choose the corresponding job nodes and then click the corresponding toolbar icon.
3.6 Display Mapping Results
When the job icon changes to the finished icon, the job is finished. You can now show mapping results or carry out SNP analysis. Make sure that you select the node under the Results node when choosing data to be analyzed.
1. Select the UNIQUE node under the Results node on the Job View panel and click the Display mapping result toolbar icon.
[Screenshot: ZOOM main window with the Solexa single end test UNIQUE node selected.]
2. ZOOM will assemble the mapped reads into a consensus sequence and show the read depth overview along the reference sequence. This will take some time, depending on the number of mapped reads and the length of the reference sequence. A progress bar will pop up: loading data, assembly reads, 29 compl
123. ple_Data\SOLiD\single_end\read_QV.qual
ZOOM recognizes the corresponding quality file by file name, so make sure that the read sequence file is in the same directory as the quality score file and that the file name prefixes are the same. For Illumina/Solexa data, <filename>_seq.txt will be matched with <filename>_qual.txt. For ABI SOLiD data, <filename>.csfasta will be matched with <filename>_QV.qual. Quality scores in FASTQ format are loaded directly.
3. You can remove files you don't want by selecting them and clicking Remove. Select the read.csfasta file in the read file list and click the Remove from list button. This file will then be removed from the read file list.
[Screenshot: Create a new job window, Input reads step, with the read file and quality file columns and the buttons Add read file/dir(s) to list, Remove from list, and Switch to paired files mode.]
4. Click the Next button at the bottom of the window to continue.
Reference sequences
Assign the reference sequences to which the read data are mapped.
[Button: Add file/dir(s) to list]
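The file-name matching rule for quality files can be sketched as follows. This is an illustration under the stated naming conventions (`_seq.txt` with `_qual.txt`, `.csfasta` with `_QV.qual`), with the exact punctuation of the suffixes assumed; `expected_quality_file` is a hypothetical helper, not part of ZOOM.

```python
from pathlib import Path

# Assumed (read file suffix, quality file suffix) naming rules.
QUALITY_SUFFIX_RULES = [("_seq.txt", "_qual.txt"), (".csfasta", "_QV.qual")]


def expected_quality_file(read_file):
    """Return the quality file path expected next to read_file, or None.

    None means no separate quality file is implied by the name, for example
    for FASTQ files, which carry their quality scores inline.
    """
    path = Path(read_file)
    for read_suffix, quality_suffix in QUALITY_SUFFIX_RULES:
        if path.name.endswith(read_suffix):
            prefix = path.name[: -len(read_suffix)]
            return path.with_name(prefix + quality_suffix)
    return None
```

Because the result is built with `with_name`, the quality file is always looked for in the same directory as the read file, which matches the requirement stated above.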
124. position
2nd best base: the second most frequent nucleotide at the SNP position.
2nd BestBaseCount: the count of the second best nucleotide at the SNP position.
Gap symbol: when it appears in the ref base, there is an insertion. When it appears in the consensus base, a deletion has occurred.
Operations on the Table
Double-click a row in the table. The cursor in the mapping results illustrating window will then jump to the position of this SNP, and the column at that position will be highlighted with a blue background.
[Screenshot: mapping results illustrating window with the SNP column highlighted.]
125. r to Section 6.2 in Chapter 6 of this manual for detailed information.
5.2 Show Mapping Results Summary
Click a job node and click the mapping results summary toolbar icon.
[Screenshot: ZOOM main window with the Solexa single end test job node selected.]
The summary of the mapping results will be shown in a pop-up window, including the total number of reads in the read data files, the number of reference sequences, and the length of the reference sequences:
Mapping Results Summary
There are 10000 reads in the read data files.
The number of the reference sequences is 1.
There are 11100 bases in the reference sequences.
Clicking the Unique Mapping Results tab shows the number of uniquely mapped reads and the statistics of the unique mapping results, as in the following picture:
Mapping Results Summary
8017 reads are uniquely mapped to the reference sequences. The detailed statistics of mapping are as follows: Mismatches, Insertion/deletion length.
If the mapping results of a job include an ALL Results node, there will be an All Mapping Results tab showing the number of positions mapped in the top N mapping results. Note that the number of mapped reads is in fact the number of mapping positions, since one read could be mapped to multiple positions since top
126. robability. For ABI SOLiD data, because ZOOM can differentiate possible genomic differences from sequencing errors in color space, ZOOM computes the mapping probability of alignments for each read using both the quality scores and the probability of an SNP occurring in the organism you sequenced. It then uses the mapping probability to assess and choose the best or top N mapping results for each read. Please refer to Section 4.3 for more information.
Question: Can ZOOM output the SNP candidates or the indel variation candidates?
Answer: Yes. You can ask ZOOM to carry out the post-analysis to find SNP candidates and view them in an intuitive way.
Question: Can ZOOM output structural variation based on the output of paired-end mapping?
Answer: Not yet. In this release, ZOOM outputs those read pairs mapped within the distance range. You should judge whether there is structural variation from the mapping offsets and directions of the two mates of one pair. ZOOM offers the process unmapped reads function, which maps the unmapped reads with a different distance range or maps them in single-end mode. This might help you to identify structural variation.
Question: Are there restrictions on the length of read labels?
Answer: No. However, please don't use spaces inside the label for the one-read-per-line format, because spaces are used to identify where the read data field begins.
Question: Can ZOOM deal with 454 data or Helicos
127. rsed and the left offset is larger than the right offset, as shown in the above picture.
Copy the read sequence
13. Click the information button. The read name and the read sequence will then be copied to the system clipboard.
14. Click the Solexa_single_end_test job node and click the toolbar icon.
[Screenshot: ZOOM main window with the Solexa single end test job node selected.]
A summary of the mapping results will be shown in the pop-up window. The summary includes the total number of reads in the read data files, the number of reference sequences, and the length of the reference sequences:
Mapping Results Summary
There are 10000 reads in the read data files.
The number of the reference sequences is 1.
There are 11100 bases in the reference sequences.
Click the Unique Mapping Results tab to show the number of uniquely mapped reads and the statistics of the unique mapping results:
Mapping Results Summary (tabs: Unique Mapping Results, All Mapping Results)
8017 reads are uniquely mapped to the reference sequences. The detailed statistics of mapping are as follows.
15.
[Screenshot: ZOOM main window with the Solexa single end test UNIQUE node selected.]
The summary of the uniquely
128. sequences and map all the reads to them. Then extract the reads that did not map to the known transcriptome and map them to the whole human genome sequence. Those reads mapped to the whole human genome might come from novel transcripts.
Mapping that allows insertion/deletion detection takes much more time than mapping that allows only mismatches. You can therefore map the reads allowing zero mismatches first, and then extract the unmapped reads to map with insertions/deletions. You can even increase the insertion/deletion length step by step: first map reads with an insertion/deletion length of one, then a length of two, and so forth. Finally, show all the mapping results together. This saves a lot of running time, especially when the data set is huge and the reference sequence is long.
ZOOM adopts the multiple spaced seeds strategy, which guarantees finding all alignments that satisfy the mismatch threshold you set (100% sensitivity). However, when the reference sequence is long, the process will consume more time. You can run the mapping without selecting the achieve high sensitivity parameter first, then extract the unmapped reads and map them with the achieve high sensitivity parameter selected. Lastly, display the mapping results together.
If you want to find a long insertion/deletion using mate-pair library sequencing, start by creating a job to map all the reads in paired-end mode, given the range of the two mates in one pair. After t
129. sfer, assign, sell, or otherwise provide access to the Software, in whole or in part, on a temporary or permanent basis, except as otherwise permitted by this Agreement. Licensee may not alter, remove, or cover proprietary notices in or on the Licensed Software or storage media, or use the Licensed Software in any unlawful manner whatsoever.
4. Limitation of Warranty. THE LICENSED SOFTWARE IS PROVIDED AS IS, WITHOUT ANY WARRANTIES OR CONDITIONS OF ANY KIND, INCLUDING BUT NOT LIMITED TO WARRANTIES OR CONDITIONS OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. LICENSEE ASSUMES THE ENTIRE RISK AS TO THE RESULTS AND PERFORMANCE OF THE LICENSED SOFTWARE.
5. Limitation of Liability. IN NO EVENT WILL LICENSOR OR ITS SUPPLIERS BE LIABLE TO LICENSEE FOR ANY INDIRECT, INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES WHATSOEVER, EVEN IF THE LICENSOR OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE OR CLAIM, OR IT IS FORESEEABLE. LICENSOR'S MAXIMUM AGGREGATE LIABILITY TO LICENSEE SHALL NOT EXCEED THE AMOUNT PAID BY LICENSEE FOR THE SOFTWARE. THE LIMITATIONS OF THIS SECTION SHALL APPLY WHETHER OR NOT THE ALLEGED BREACH OR DEFAULT IS A BREACH OF A FUNDAMENTAL CONDITION OR TERM.
6. Termination. This Agreement is effective until terminated. This Agreement will terminate immediately without notice if you fail to comply with any provision of this Agreement. Upon termination, you must destroy all copies of the Software. Provisions 2, 5, 6, 7, a
130. t_server.bat, search for the corresponding lines, and change them as follows:
3. Tell the ZOOM server where the ZOOM GUI is:
   set ZOOMGUI=192.168.1.4
4. Specify the port the ZOOM server is going to use:
   set SERVER_PORT=20001
5. Specify how many cores the ZOOM server will use (assuming a quad-core CPU):
   set MAX_CLIENT=4
6. Execute start_server.bat to start up the ZOOM server.
For Linux platform
1. On the computer, create a new directory and transfer ZOOM start_server.sh into it. You should always create a new directory for each copy of the ZOOM server.
2. Use a file editor such as vim to open start_server.sh, search for the corresponding lines, and change them as follows:
3. Tell the ZOOM server where the ZOOM GUI is:
   export ZOOMGUI=192.168.1.4
4. Specify the port the ZOOM server is going to use:
   export SERVER_PORT=20001
5. Specify how many cores the ZOOM server will use (assuming a quad-core CPU):
   export MAX_CLIENT=4
6. Execute start_server.sh to start up the ZOOM server.
Back on computer A, in the ZOOM GUI, click the icon again to verify that the newly added ZOOM server is activated. If the icon changes to the activated state, the new server has been correctly launched.
[Screenshot: Configuration, Servers: System Configuration window listing 127.0.0.1:20000 and 192.168.1.5:20001.]
The user can repeat these steps to start up more servers on different ports on the computer, or start up more servers on other computers, or even
131. ters of the read sequences are case-insensitive. The length of the reads can differ.
FASTA format
Example of FASTA format:
>read1A 1
AGGACTATATTGCTCTAATAAATTTGCCGGTTCTTA
>1 2
TCTAATAAATTTGCCGGTTCTTAAAAACTCAAT
>read1A 3
In ZOOM, FASTA format files have no sequencing quality scores; thus all the read bases, including N, are considered equally relevant.
_seq.txt and _prb.txt Files
Please put the _seq.txt and _prb.txt files in the same directory. ZOOM will pair <filename>_seq.txt and <filename>_prb.txt automatically.
Example of _seq.txt file format:
1 1 125 701 GCTACCCTTTAGGTTTAA
Each line of the sequence file records the channel number, tile number, x position, and y position of each sequence read, followed by the sequence of the read. The label of each read is in the format channel number, tile number, x position, y position.
Example of _prb.txt file format:
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
The _prb.txt file contains the quality score of each possible nucleotide base for each cycle. Four numbers, such as 40 40 40 40, each separated by a space, are the sequencing quality scores for the possible nucleotides A, C, G, T, respectively. The tab character separates the groups of different cycles. Each line of _prb.txt is associated with the corresponding line of _seq.txt.
_prb.txt Files
Example file: 40 40 40
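The layout of a _prb.txt line (tab-separated cycles, each holding four space-separated scores for A, C, G, T) can be made concrete with a small parser. This is a minimal sketch based only on the description above; the function names are hypothetical and integer scores are assumed.

```python
def parse_prb_line(line):
    """Split one _prb.txt line into per-cycle {base: score} dictionaries."""
    cycles = []
    for group in line.rstrip("\n").split("\t"):
        scores = [int(value) for value in group.split()]
        if len(scores) != 4:
            raise ValueError("each cycle needs four scores (A, C, G, T)")
        cycles.append(dict(zip("ACGT", scores)))
    return cycles


def best_calls(cycles):
    """Pick the highest-scoring base of every cycle (ties go to A, C, G, T order)."""
    return "".join(max("ACGT", key=lambda base: cycle[base]) for cycle in cycles)
```

For example, a line with the two cycles `40 -40 -40 -40` and `-40 40 -40 -40` parses into two dictionaries whose best calls spell `AC`.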
132. the SNP finding process unreliable.
[Screenshot: File menu (New Job, Load Job Ctrl+O, Remove Job, Export, Exit) with the Export submenu: Mapping Results, Assembled sequences (Consensus Sequences, Consensus Segments).]
Consensus sequence in FASTA format
The output file contains the assembly of mapped reads along the reference sequence. If multiple reference sequences are used, multiple consensus sequences will be output in multi-FASTA format in one file. Positions on the consensus sequence to which no reads are mapped are denoted by a placeholder symbol.
When the organism is treated as a diploid genome, the nucleotides on the positive chain and the reverse chain could be different. ZOOM adopts a method similar to MAQ to compute the posterior probability of each possible genotype and chooses the genotype with maximal probability for the consensus sequence. The genotype is coded by the IUPAC code. The mapping between the IUPAC code and the genotype is as follows:
[Table: IUPAC code to genotype mapping.]
When the organism is treated as a haploid genome, the assembly process constructs a consensus sequence using the following majority vote process. If deletion > [threshold], then there is a deletion at this position; otherwise the nucleotide with the highest frequency will be chosen. If the read coverage is less than <mincov>
133. the letter is lowercase (unreliable base); otherwise it is uppercase (reliable). If insertion > continuous (the number of reads which do not agree that there should be an insertion), then there is an insertion after this position, and the sequence segment with the highest frequency collected from the reads will be inserted into the consensus sequence.
Consensus segments in FASTA format
The output file contains the assembly of mapped reads along the reference sequence. Since it is probable that no reads are mapped to some regions of the reference sequence, there are gaps in the assembly. ZOOM can export the assembly sequence as several segments separated by gaps of a length that you choose. This option is quite useful in some applications such as RNA-seq.
After you choose to export the assembly sequence as Consensus segments, a window will pop up asking you to enter the minimal gap size. ZOOM will split the sequence into two segments wherever a gap is larger than the minimal gap size. Each segment will be in the following format:
><reference sequence name> Contig <the number of the current segment> StartPos <the offset of the start position of this segment>
<the segment sequence>
Here is an example:
>gi|224384768|gb|CM000663.1| Homo sapiens chromosome 1 Contig 15 StartPos 11234
TACGTAGCTTGAACAAAAACCTCGATG
8 100% sensitivity cases
This chapter lists all cases where the current release of ZO
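The haploid majority vote for the consensus sequence can be sketched in Python as follows. The sketch covers only substitutions: it ignores the insertion and deletion rules, takes the bases covering each reference position as ready-made input, and uses `N` as the no-coverage placeholder. The input layout, the `N` placeholder, the alphabetical tie-break, and the function name are all assumptions made for illustration.

```python
from collections import Counter


def haploid_consensus(columns, mincov=3, no_coverage="N"):
    """Majority-vote consensus over per-position base pileups.

    columns: one string per reference position holding the read bases that
    cover it (hypothetical input layout). Positions covered by fewer than
    mincov reads are reported in lowercase as unreliable.
    """
    consensus = []
    for column in columns:
        counts = Counter(base.upper() for base in column)
        depth = sum(counts.values())
        if depth == 0:
            consensus.append(no_coverage)
            continue
        # Most frequent base; ties are broken alphabetically.
        best = max(sorted(counts), key=lambda base: counts[base])
        consensus.append(best.lower() if depth < mincov else best)
    return "".join(consensus)


print(haploid_consensus(["AAAC", "GG", "", "TTTT", "CT"]))  # prints AgNTc
```

The lowercase letters make it easy to spot, directly in the exported sequence, which positions rest on too few reads to be trusted.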
134. to the uniquely mapped information?
Answer: Yes. Set the parameter in the Collecting Results part as follows to output the top N best mapping results for each read. Set N very large if you want all mapping results for each read.
For each read, report: the unique best mapping position / top [1000] best mapping positions.
Question: In which cases can ZOOM achieve 100% sensitivity?
Answer: ZOOM provides a framework for constructing efficient spaced seed sets, which can achieve 100% sensitivity for a large range of read lengths and mismatch numbers. All cases in this release are listed in Chapter 8. For cases with more mismatches and cases with insertions/deletions, ZOOM also has good sensitivity. If you do need 100% sensitivity beyond the listed cases, please contact us.
Question: How can I get better sensitivity using the 100% sensitivity seeds without too much running time?
Answer: Run ZOOM without setting the achieve high sensitivity option first. Then extract the unmapped reads and run ZOOM on them with the achieve high sensitivity option selected.
Question: Can ZOOM find short indels?
Answer: Yes. However, ZOOM can only find one gap, of any length, on the read sequence. When indels are allowed, the speed is about five times slower than in the mode that allows only mismatches.
Question: The quality at the 3' end of the reads is not very good. What should I do?
Answer: You can set a threshold between high quality bases and low quality bases
135. ults nodes, the UNIQUE and the ALL node, because we set the parameters to collect the top two mapping results for each read. The uniquely mapped result is in the UNIQUE result node (Solexa single end test more UNIQUE), while the top two mapping results are in the ALL node (Solexa single end test more ALL).
  Click the toolbar icon. The job summary window will appear.
  [Screenshot: Mapping Results Summary window with the Unique Mapping Results and All Mapping Results tabs]
  There are 1983 reads in the read data files. The number of the reference sequences is 1. There are 11100 bases in the reference sequences. 1983 reads are unmapped in the Solexa single end test job. There are two summaries, for the uniquely mapped results and the top two mapped results respectively.
  7. Click the Unique Mapping Results tab. You can see that 1783 reads are mapped after increasing the mismatch number between the reads and the reference sequence from 2 to 4.
  Unique Mapping Results: 1783 reads are uniquely mapped to the reference sequences. The detailed mapping statistics are as follows. [Table columns: Insertion/deletion, Read Number, Ratio; 1068]
  8. Click the All Mapping Results tab. There are 1795 mapping positions in the top two mapping results. Note that this is the number of mapping positions rather than the number of mapped reads, because one read might be mapped to multiple positions.
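The difference between counting mapped reads and counting mapping positions, noted above, can be illustrated with a short Python sketch. The input layout (tuples of read name, reference offset, mismatch count) and the function name `summarize` are hypothetical, and treating a read as "uniquely mapped" when exactly one position attains its best mismatch count is an assumption for this example, not ZOOM's documented definition.

```python
from collections import defaultdict

def summarize(hits, top_n):
    """Summarize mapping hits given as (read_name, ref_offset, mismatches) tuples."""
    by_read = defaultdict(list)
    for read, offset, mismatches in hits:
        by_read[read].append((mismatches, offset))

    unique_reads = 0
    all_positions = 0
    for candidates in by_read.values():
        candidates.sort()  # fewest mismatches first
        best = candidates[0][0]
        # Assumed rule: unique means a single position attains the best score.
        if sum(1 for m, _ in candidates if m == best) == 1:
            unique_reads += 1
        # The ALL summary counts positions, at most top_n per read.
        all_positions += len(candidates[:top_n])
    return {
        "mapped_reads": len(by_read),
        "unique_reads": unique_reads,
        "all_positions": all_positions,
    }

if __name__ == "__main__":
    hits = [("r1", 10, 0), ("r1", 50, 2), ("r2", 7, 1), ("r2", 90, 1), ("r3", 3, 4)]
    print(summarize(hits, top_n=2))
```

In the demo, three reads are mapped but five positions are reported, which mirrors why the All Mapping Results figure can exceed the number of mapped reads.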
136. visions and without giving effect to the United Nations Convention on Contracts for the International Sale of Goods.
  12 Reference: ZOOM Paper. Please use the following reference when publishing a study that involved the usage of ZOOM:
  Hao Lin, Zefeng Zhang, Michael Q. Zhang, Bin Ma, and Ming Li. ZOOM! Zillions Of Oligos Mapped. Bioinformatics, 2008, 24(21): 2431-2437.
  For technical support issues not found in this manual, please contact either your Sales Representative or any of our support service resources.
137. y have an older version of ZOOM installed on your system, please uninstall it before proceeding.
  1. Close all programs that are currently running.
  2. Insert the ZOOM disc into the CD-ROM/DVD-ROM drive. If loading ZOOM via the download link, skip to step 4 after downloading and running the file.
  3. Auto-run should automatically load the installation software. If it does not, find the CD-ROM drive and open it to access the disc. Click on autorun.exe. On a Linux system, click ZOOMsetup.bin.
  [Screenshot: installer Introduction screen]
  4. A menu screen will appear. Select the top item, ZOOM Installation. The installation utility will begin the install. Wait while it does so. When the ZOOM Studio installation dialogue appears, click the Next button.
  5. Basic system requirements will be presented. Click Next.
  6. Read the license agreement. If you agree with it, change the radio button at the bottom to select "I accept the terms of the License Agreement" and click Next.
  7. Choose
138. you don't need a particular job anymore, you can choose to delete the job from the computer. Click the check box "also permanently delete their files from my disk" and press the OK button to delete all the information about the job.
  Extract unmapped reads to create a new job. If you want to survey the unmapped reads of a job, you can choose to map these reads to other reference sequences, or to map them to the same reference sequence with different parameters, for instance to allow more mismatches or a larger edit distance.
  1. Select a job.
  2. Press the "reprocess unmapped reads" toolbar icon. The following window will pop up.
  [Screenshot: Create new job dialog, Job Label "ABI mate pair test more"]
  The process is similar to creating a new job whose read data is the unmapped reads of the selected job. Assign a name to the new job for these unmapped reads. The default job name is the name of the selected job suffixed by "more". By pressing Next, you can assign proper reference sequences and parameters.
  There are several examples of why you might want to use this option. You want to find some novel transcripts from the read data you sequenced. First you can create a job to use the known transcriptome sequence as the reference
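Outside the ZOOM Studio interface, the idea of collecting unmapped reads for a second pass can be sketched as follows. This is a minimal illustration, not how ZOOM implements the toolbar action: it assumes the reads are in FASTA format and that the names of the mapped reads are already known as a set, and both function names are hypothetical.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def unmapped_fasta(lines, mapped_names):
    """Return FASTA text holding only the reads whose names are not in mapped_names."""
    return "".join(
        f">{name}\n{seq}\n"
        for name, seq in parse_fasta(lines)
        if name not in mapped_names
    )

if __name__ == "__main__":
    reads = [">r1", "ACGT", ">r2", "GG", "TT", ">r3", "CCA"]
    print(unmapped_fasta(reads, mapped_names={"r1", "r3"}), end="")
```

The resulting FASTA text could then serve as the read input of a new job that uses a different reference or looser parameters, as in the novel transcript example above.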
