
CLC Sequence Viewer

1. 9.7 Sequence lists
10 Data download
11 General sequence analyses
  11.1 Shuffle sequence
  11.2 Sequence statistics
  11.3 Join sequences
12 Nucleotide analyses
  12.1 Convert DNA to RNA
  12.2 Convert RNA to DNA
  12.3 Reverse complements of sequences
  12.4 Translation of DNA or RNA to protein
  12.5 Find open reading frames
13 Restriction site analyses
  13.1 Dynamic restriction sites
  13.2 Restriction site analysis from the Toolbox
  13.3 Restriction enzyme lists
14 Sequence alignment
  14.1 Create an alignment
  14.2 …
  14.3 …
  14.4 Bioinformatics explained: Multiple alignments
15 Phylogenetic trees
  15.1 Inferring phylogenetic trees
  15.2 Bioinformatics explained: phylogenetics
IV Appendix
  A More features
  B Graph preferences
  C Workin
2. The neighbor joining method builds a tree where the evolutionary rates are free to differ in different lineages. CLC Sequence Viewer always draws trees with roots for practical reasons, but with the neighbor joining method no particular biological hypothesis is postulated by the placement of the root. Figure 15.3 shows the difference between the two methods.
• To evaluate the reliability of the inferred trees, CLC Sequence Viewer allows the option of doing a bootstrap analysis. A bootstrap value will be attached to each branch, and this value is a measure of the confidence in this branch. The number of replicates in the bootstrap analysis can be adjusted in the wizard. The default value is 100. For a more detailed explanation, see "Bioinformatics explained" in section 15.2.
[Figure 15.3: two trees of the same taxa (Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Mus musculus, Bos taurus, Homo sapiens)]
Figure 15.3: Method choices for phylogenetic inference. The top shows a tree found by neighbor joining, while the bottom shows a tree found by UPGMA. The latter method assumes that evolution occurs at a constant rate in different lineages.
15.1.2 Tree View preferences
The Tree View preferences are these:
• Text format. Changes the t
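In the standard (Felsenstein) bootstrap, alignment columns are resampled with replacement to build pseudo-replicate alignments, a tree is inferred from each replicate, and a branch's bootstrap value is the percentage of replicate trees that contain it. The Python sketch below illustrates only the resampling and the final percentage; the names `bootstrap_replicates` and `support_percent` are hypothetical, tree inference is left out, and this is not CLC Sequence Viewer's own implementation.

```python
import random


def bootstrap_replicates(alignment, n_replicates=100, seed=None):
    """Yield pseudo-replicate alignments built by sampling columns with replacement.

    `alignment` is a list of equal-length aligned sequences (strings).
    The default of 100 replicates mirrors the wizard default described above.
    """
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    rng = random.Random(seed)
    for _ in range(n_replicates):
        # Draw n_cols column indices with replacement; the same column may repeat.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        yield ["".join(row[c] for c in cols) for row in alignment]


def support_percent(branch_found):
    """Percentage of replicate trees (given as booleans) that contain a branch."""
    return 100.0 * sum(branch_found) / len(branch_found)
```

Each replicate keeps whole columns together, so residues that were aligned in the input stay aligned in every pseudo-replicate.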
3. Open a selection in a new view
A selection can be opened in a new view and saved as a new sequence:
right-click the selection | Open selection in New View
This opens the selected part of the sequence in a new view. The new sequence can be saved by dragging the tab of the sequence view into the Navigation Area.
The process described above is also the way to manually translate coding parts of sequences (CDS) into protein: you simply translate the new sequence into protein. This is done by:
right-click the tab of the new sequence | Toolbox | Nucleotide Analysis | Translate to Protein
A selection can also be copied to the clipboard and pasted into another program:
make a selection | Ctrl + C (⌘ + C on Mac)
Note: The annotations covering the selection will not be copied.
A selection of a sequence can be edited as described in the following section.
9.1.4 Editing the sequence
When you make a selection, it can be edited by:
right-click the selection | Edit Selection
A dialog appears displaying the sequence. You can add, remove or change the text and click OK. The original selected part of the sequence is now replaced by the sequence entered in the dialog. This dialog also allows you to paste text into the sequence using Ctrl + V (⌘ + V on Mac).
If you delete the text in the dialog and press OK, the selected text on the sequence will also be deleted. Another way to de
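To illustrate what translating a coding sequence into protein involves, here is a minimal Python sketch using the standard genetic code (NCBI translation table 1). It is not CLC's implementation: the names `translate` and `CODON_TABLE` are illustrative, RNA input is handled by reading U as T, stop codons are written as `*`, codons containing unknown bases become `X`, and a trailing partial codon is ignored.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate a DNA or RNA coding sequence into a one-letter protein string."""
    dna = seq.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # drop an incomplete final codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

For example, `translate("ATGGCCTAA")` gives `"MA*"`.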
4. 12.3 Reverse complements of sequences
12.4 Translation of DNA or RNA to protein
12.5 Find open reading frames
  12.5.1 Open reading frame parameters
CLC Sequence Viewer offers different kinds of sequence analyses which only apply to DNA and RNA.
12.1 Convert DNA to RNA
CLC Sequence Viewer lets you convert a DNA sequence into RNA, replacing the T residues (thymine) with U residues (uracil):
select a DNA sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analysis | Convert DNA to RNA
or right-click a sequence in the Navigation Area | Toolbox | Nucleotide Analysis | Convert DNA to RNA
This opens the dialog displayed in figure 12.1. If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.
Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish.
Note: You can select multiple DNA sequences and sequence lists at a time. If a sequence list also contains RNA sequences, they will not be converted.
[Figure 12.1: the Convert DNA to RNA dialog, step 1: Select DNA sequences]
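The conversion itself is a character substitution. The sketch below assumes each input is a hypothetical `(name, type, residues)` tuple whose `type` is "DNA" or "RNA"; only DNA entries are converted, mirroring the note that RNA sequences in a list are left untouched. The function names are illustrative, not part of the Workbench.

```python
def dna_to_rna(residues):
    """Replace thymine (T) with uracil (U), preserving letter case."""
    return residues.replace("T", "U").replace("t", "u")


def convert_dna_to_rna(sequences):
    """Convert the DNA entries of a list of (name, type, residues) tuples.

    RNA entries are passed through unchanged.
    """
    result = []
    for name, seq_type, residues in sequences:
        if seq_type == "DNA":
            result.append((name, "RNA", dna_to_rna(residues)))
        else:
            result.append((name, seq_type, residues))
    return result
```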
5. • Set root above node. Defines the root of the tree to be just above the selected node.
• Set root at this node. Defines the root of the tree to be at the selected node.
• Toggle collapse. Collapses or expands the branches below the node.
• Change label. Allows you to label a node or to change its existing label.
• Change branch label. Allows you to change the existing label of a branch.
You can also relocate leaves and branches in a tree or change their length. It is possible to modify the text of the unit of measurement at the bottom of the tree view by right-clicking the text. In this way you can specify a unit, e.g. years.
Note: To drag branches of a tree, you must first click the node once, and then click the node again, this time holding the mouse button down.
In order to change the representation:
• Rearrange leaves and branches by:
select a leaf or branch | move it up and down (Hint: the mouse turns into an arrow pointing up and down)
• Change the length of a branch by:
select a leaf or branch | press Ctrl | move left and right (Hint: the mouse turns into an arrow pointing left and right)
Alter the preferences in the Side Panel to change the presentation of the tree.
Note: The preferences will not be saved. Viewing a tree in several viewers lets you apply different preferences in each of them. For example, if you select the Annotation Layout species for a node, then you will only see the change i
6. • Sort enzymes by number of restriction sites. This will divide the enzymes into four groups:
- Non-cutters
- Single cutters
- Double cutters
- Multiple cutters
There is a checkbox for each group which can be used to hide/show all the enzymes in a group.
• Sort enzymes by overhang. This will divide the enzymes into three groups:
- Blunt. Enzymes cutting both strands at the same position.
- 3'. Enzymes producing an overhang at the 3' end.
- 5'. Enzymes producing an overhang at the 5' end.
There is a checkbox for each group which can be used to hide/show all the enzymes in a group.
13.1.2 Manage enzymes
By default, the list of restriction enzymes contains 20 of the most popular enzymes, but you can easily modify this list and add more enzymes by clicking the Manage enzymes button. This will display the dialog shown in figure 13.6.
[Manage enzymes dialog, step 1 "Please choose enzymes": Use existing enzyme list (Popular enzymes); left panel "Enzymes in Popular enzymes" and right panel "Enzymes shown in Side Panel", each with a Filter field and Name, Overhang, Methylation and Popularity columns (e.g. BamHI, BglII, EcoRI, EcoRV, SalI, SmaI)]
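The two grouping schemes can be sketched in Python. The helper names and the input dictionaries are hypothetical: in the Workbench the site counts come from scanning the sequence, whereas here they are passed in directly, and overhang types are given as the strings "blunt", "3'" or "5'".

```python
def group_by_cut_count(site_counts):
    """Group enzyme names by how many restriction sites they have in the sequence."""
    groups = {"Non-cutters": [], "Single cutters": [],
              "Double cutters": [], "Multiple cutters": []}
    for name, count in site_counts.items():
        if count == 0:
            groups["Non-cutters"].append(name)
        elif count == 1:
            groups["Single cutters"].append(name)
        elif count == 2:
            groups["Double cutters"].append(name)
        else:
            groups["Multiple cutters"].append(name)
    return groups


def group_by_overhang(overhangs):
    """Group enzyme names by overhang type: 'blunt', "3'" or "5'"."""
    groups = {"Blunt": [], "3'": [], "5'": []}
    for name, overhang in overhangs.items():
        key = "Blunt" if overhang == "blunt" else overhang
        groups[key].append(name)
    return groups
```

For example, EcoRV cuts blunt, EcoRI leaves a 5' overhang and PstI a 3' overhang, so `group_by_overhang` puts them in three different groups.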
7. Note that if you move data between locations, the original data is kept. This means that you are essentially doing a copy instead of a move operation.
Move using drag and drop
Using drag and drop in the Navigation Area, as well as in general, is a four-step process:
click the element | click on the element again and hold the left mouse button | drag the element to the desired location | let go of the mouse button
This allows you to:
• Move elements between different folders in the Navigation Area.
• Drag from the Navigation Area to the View Area. A new view is opened in an existing View Area if the element is dragged from the Navigation Area and dropped next to the tab(s) in that View Area.
• Drag from the View Area to the Navigation Area. The element (e.g. a sequence, alignment or search report) is saved where it is dropped. If the element already exists, you are asked whether you want to save a copy. You drag from the View Area by dragging the tab of the desired element.
Use of drag and drop is supported throughout the program, also to open and re-arrange views (see section 3.2.6).
Copy using drag and drop
To copy instead of move using drag and drop, hold the Ctrl key (⌘ on Mac) while dragging:
click the element | click on the element again and hold left
8. This chapter provides an overview of the different areas in the user interface of CLC Sequence Viewer. As can be seen from figure 3.1, this includes a Navigation Area, View Area, Menu Bar, Toolbar, Status Bar and Toolbox.
[Figure 3.1: screenshot of the main window with the Menu Bar, Toolbar, Navigation Area, View Area, Toolbox and Status Bar labelled]
Figure 3.1: The user interface consists of the Menu Bar, Toolbar, Status Bar, Navigation Area, Toolbox and View Area.
3.1 Navigation Area
The Navigation Area is located on the left side of the screen, under the Toolbar
9. and Paste in the Toolbar
• Using drag and drop to move elements
• Using drag and drop while pressing Ctrl (⌘ on Mac) to copy elements
In the following, all of these possibilities for moving and copying elements are described in further detail.
Copy, cut and paste functions
Copies of elements and folders can be made with the copy/paste function, which can be applied in a number of ways:
select the files to copy | right-click one of the selected files | Copy | right-click the location to insert files into | Paste
or select the files to copy | Ctrl + C (⌘ + C on Mac) | select where to insert files | Ctrl + V (⌘ + V on Mac)
or select the files to copy | Edit in the Menu Bar | Copy | select where to insert files | Edit in the Menu Bar | Paste
If there is already an element of that name, the pasted element will be renamed by appending a number at the end of the name.
Elements can also be moved instead of copied. This is done with the cut/paste function:
select the files to cut | right-click one of the selected files | Cut | right-click the location to insert files into | Paste
or select the files to cut | Ctrl + X (⌘ + X on Mac) | select where to insert files | Ctrl + V (⌘ + V on Mac)
When you have cut the element, it is greyed out until you activate the paste function. If you change your mind, you can revert the cut command by copying another element.
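The renaming rule for pasted elements can be sketched as follows. The exact naming pattern the Workbench produces is not specified here, so the separator and the "smallest unused number starting at 1" scheme in this hypothetical `pasted_name` helper are assumptions.

```python
def pasted_name(name, existing, sep=" "):
    """Return `name` if it is unused, otherwise append the smallest unused number.

    The separator and numbering scheme are illustrative assumptions.
    """
    if name not in existing:
        return name
    n = 1
    while f"{name}{sep}{n}" in existing:
        n += 1
    return f"{name}{sep}{n}"
```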
10. • Double stranded. Shows both strands of a sequence (only applies to DNA sequences).
• Numbers on sequences. Shows residue positions along the sequence. The starting point can be changed by setting the number in the field below. If you set it to e.g. 101, the first residue will have the position -100. This can also be done by right-clicking an annotation and choosing Set Numbers Relative to This Annotation.
• Numbers on plus strand. Whether to set the numbers relative to the positive or the negative strand in a nucleotide sequence (only applies to DNA sequences).
• Follow selection. When viewing the same sequence in two separate views, Follow selection will automatically scroll the view in order to follow a selection made in the other view.
• Lock numbers. When you scroll vertically, the position numbers remain visible. (Only possible when the sequence is not wrapped.)
• Lock labels. When you scroll horizontally, the label of the sequence remains visible.
• Sequence label. Defines the label to the left of the sequence:
- Name (this is the default information to be shown)
- Accession (sequences downloaded from databases like GenBank have an accession number)
- Latin name
- Latin name (accession)
- Common name
- Common name (accession)
• Annotation Layout and Annotation Types. See section 9.3.1.
• Restriction sites. See section 9.1.2.
• Motifs. See section
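The relative numbering can be expressed as a small function. This is a sketch under one assumption: there is no position 0, so that numbering relative to residue 101 gives the first residue the position -100 and residue 101 the position 1. `displayed_position` is a hypothetical name.

```python
def displayed_position(residue, relative_to=1):
    """Map a 1-based residue index to the number shown when numbering is relative to `relative_to`.

    Assumes the numbering skips 0: the residue just before `relative_to` is -1.
    """
    if residue >= relative_to:
        return residue - relative_to + 1
    return residue - relative_to
```

With the default `relative_to=1`, every residue keeps its ordinary 1-based position.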
11. • Minimum Length. Specifies the minimum length for the ORFs to be found. The length is specified as number of codons.
Using open reading frames for gene finding is a fairly simple approach which is likely to predict genes which are not real. Setting a relatively high minimum length for the ORFs will reduce the number of false positive predictions, but at the same time short genes may be missed (see figure 12.8).
[Figure 12.8: annotated view of NC_000913 with gene and ORF annotations]
Figure 12.8: The first 12,000 positions of the E. coli sequence NC_000913 downloaded from GenBank. The blue (dark) annotations are the genes, while the yellow (brighter) annotations are the ORFs with a length of at least 100 amino acids. On the positive strand around position 11,000, a gene starts before the ORF. This is due to the use of the standard genetic code rather than the bacterial code. This particular gene starts with CTG, which is a start codon in bacteria. Two short genes are entirely missing, while a handful of open reading frames do not correspond to any of the annotated genes.
Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish.
Finding open reading frames is often a good first step in annotating sequences such as cloning vectors or bacterial genomes. For eukaryotic genes, ORF determination may not always be very helpful, since the int
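A minimal sketch of ORF detection with a minimum length in codons, scanning the three forward reading frames only. Assumptions: an ORF runs from the first start codon in a frame to the next in-frame stop codon (TAA, TAG, TGA), its length is counted in codons excluding the stop, and the `starts` parameter lets you add alternative start codons such as CTG to mimic the bacterial-code effect described for figure 12.8. The minus strand could be handled by running the same function on the reverse complement. `find_orfs` is hypothetical and not CLC's algorithm.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100, starts=("ATG",)):
    """Find ORFs on the forward strand.

    Returns (start, end, frame) tuples: 0-based start, end exclusive
    (the stop codon is included). Only ORFs with at least `min_codons`
    codons before the stop are kept; ORFs that run off the end are ignored.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon in starts:
                    start = i  # the first start codon opens an ORF in this frame
            elif codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next start codon
    return orfs
```

Raising `min_codons` trades sensitivity for fewer false positives, which is the same trade-off described above.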
12. [Figure 9.5: two views of pBR322 (4361 bp), one of them zoomed in to show the residues]
Figure 9.5: Two views showing the same sequence. The bottom view is zoomed in.
Note: If you make a selection in one of the views, the other view will also make the corresponding selection, providing an easy way for you to focus on the same region in both views.
9.2.2 Mark molecule as circular and specify starting point
You can mark a DNA molecule as circular by right-clicking its name in either the sequence view or the circular view. In the right-click menu you can also make a circular molecule linear. A circular molecule displayed in the normal sequence view will have the sequence ends marked with a symbol.
The starting point of a circular sequence can be changed by:
make a selection starting at the position that you want to be the new starting point | right-click the selection | Move Starting Point to Selection Start
Note: This can only be done for sequences that have been marked as circular.
9.3 Working with annotations
Annotations provide information about specific regions of a sequence. A typical example is the annotation of a gene on a genomic DNA sequence.
Annotations derive from different sources:
• Sequences downloaded from databases like GenBank are annotated.
• In some of the data formats th
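Moving the starting point of a circular molecule amounts to rotating the residue string. The sketch below (hypothetical `move_starting_point`; annotation coordinates, which would also need shifting, are ignored) shows the idea and enforces the rule that the sequence must be marked as circular.

```python
def move_starting_point(seq, new_start, circular):
    """Rotate a circular sequence so that 1-based position `new_start` becomes position 1."""
    if not circular:
        raise ValueError("only sequences marked as circular can be rotated")
    if not 1 <= new_start <= len(seq):
        raise ValueError("new_start must lie within the sequence")
    k = new_start - 1
    return seq[k:] + seq[:k]
```

For example, `move_starting_point("ACGTTG", 3, circular=True)` returns `"GTTGAC"`.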
13. CLC Sequence Viewer
User manual
Manual for CLC Sequence Viewer 6.7
Windows, Mac OS X and Linux
August 8, 2012
This software is for research purposes only.
CLC bio
Finlandsgade 10-12
DK-8200 Aarhus N
Denmark
Contents
I Introduction
1 Introduction to CLC Sequence Viewer
  1.1 …
  1.2 Download and installation
  1.3 System requirements
  1.4 About CLC Workbenches
  1.5 When the program is installed: Getting started
  1.6 Plug-ins
  1.7 Network configuration
  1.8 The format of the user manual
2 Tutorials
  2.1 Tutorial: Getting started
  2.2 Tutorial: View sequence
  2.3 Tutorial: Side Panel Settings
  2.4 Tutorial: GenBank search and download
  2.5 Tutorial: Align protein sequences
  2.6 Tutorial: Create and modify a phylogenetic tree
  2.7 Tutorial: Find restriction sites
II Core Functionalities
3 User interface
  3.1 Navigation Area
  3.2 View Area
  3.3 Zoom and selection in View Area
  3.4 Toolbox and Status Bar
  …
14. [Figure 2.19: Restriction Site Analysis wizard, step 3 "Number of cut sites": display enzymes with No restriction site (0), One restriction site (1), Two restriction sites (2), Three restriction sites (3), N restriction sites, or Any number of restriction sites (>0)]
Figure 2.19: Selecting output for restriction map analysis.
Click Finish to start the restriction map analysis.
View restriction site
The restriction sites are shown in two views: one view is in a tabular format and the other view displays the sites as annotations on the sequence.
The result is shown in figure 2.20.
[Figure 2.20: Restriction sites table (2 rows)
Pattern    Overhang    Number of cut sites    Cut position(s)
ggtacc     3'          1                      1208
ccgcgg     3'          1                      119]
Figure 2.20: The result of the restriction map analysis is displayed in a table at the bottom and as annotations on the sequence in the view at the top.
Part II Core Functionalities
Chapter 3 User interface
Contents
3.1 Navigation Area
  3.1.1 …
  3.1.2 Create new folders
  3.1.3 Sorting folders
  3.1.4 Multiselecting elements
  3.1.5 Moving and copying elements
  3.1.6
15. [Figure 13.17: the "Enzymes to be considered in calculation" step with Use existing enzyme list (Popular enzymes), a Filter field, and the panels "Enzymes in Popular enzymes" and "Enzymes to be used"]
Figure 13.17: Selecting enzymes.
If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 13.18), or use the view of enzyme lists (see 13.3).
Click Finish to open the enzyme list.
13.3.2 View and modify enzyme list
An enzyme list is shown in figure 13.19. The list can be sorted by clicking the columns.
The CLC Sequence Viewer comes with a standard set of enzymes based on http://www.rebase.neb.com. You can customize the enzyme database for your installation, see section
[Figure residue: the "All enzymes" list (Name, Overhang, Methylation and Popularity columns; PstI, KpnI, SacI, SphI, ApaI, SacII, NsiI, ...) with a tooltip for SacII showing the recognition site pattern CCGCGG]
16. Keyboard shortcuts (Windows/Linux): F4; Ctrl + E; Ctrl + G; Space or F1; Ctrl; Ctrl + M; Ctrl + arrow keys; arrow keys; Ctrl + Shift + N; Ctrl + N; Ctrl + O; Ctrl + V; Ctrl + P; Ctrl + Y; F2; Ctrl + R; Ctrl + S; Ctrl + F; Ctrl + Shift + F; Ctrl + B; Ctrl + Shift + U; Ctrl + A; Ctrl + 2; Ctrl + U; Ctrl + Shift + R; Ctrl + T; Ctrl + J; Ctrl + Shift + T; Ctrl + Z; Ctrl + K; Ctrl + plus, plus; Ctrl + minus, minus; press and hold Shift; Shift + arrow keys; Ctrl + Page Up/Down†. The Mac OS X column lists the corresponding shortcuts, mostly with ⌘ in place of Ctrl.
† On Linux, changing tabs is accomplished using Ctrl + Page Up/Page Down.
Combinations of keys and mouse movements are listed below.
• Maximize View: double-click the tab of the View.
• Restore View: double-click the View title.
• Reverse zoom function: Shift (Windows/Linux and Mac OS X) + click in view.
• Select multiple elements: Ctrl (⌘ on Mac OS X) + click elements.
• Select multiple elements: Shift (Windows/Linux and Mac OS X) + click elements.
Elements in this context refers to elements and folders in the Navigation Area se
17. No matter whether you have chosen to print the visible area or the whole view, you can adjust the page setup of the print. An example of this can be seen in figure 5.5.
[Figure 5.5: the Page Setup dialog with Orientation (Portrait, Landscape), Paper Size (A4) and Fit to pages (Horizontal pages, Vertical pages)]
Figure 5.5: Page Setup.
In this dialog you can adjust both the setup of the pages and specify a header and a footer by clicking the tab at the top of the dialog.
You can modify the layout of the page using the following options:
• Orientation.
- Portrait. Will print with the paper oriented vertically.
- Landscape. Will print with the paper oriented horizontally.
• Paper size. Adjust the size to match the paper in your printer.
• Fit to pages. Can be used to control how the graphics should be split across pages (see figure 5.6 for an example).
- Horizontal pages. If you set the value to e.g. 2, the printed content will be broken up horizontally and split across 2 pages. This is useful for sequences that are not wrapped.
- Vertical pages. If you set the value to e.g. 2, the printed content will be broken up vertically and split across 2 pages.
Note: It is a good idea to consider adjusting view settings (e.g. Wrap for sequences) in the Side Panel before printing. As explained in the beginning of this chapter, the printed material will look like the view on the screen, and therefore these settings should also be con
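The Horizontal pages and Vertical pages settings can be thought of as cutting the printout into a grid. The sketch below (hypothetical `page_tiles`, arbitrary units, equal-size tiles assumed) computes which region of the content lands on each page; it ignores margins, headers, footers and any scaling the Workbench applies.

```python
def page_tiles(width, height, horizontal_pages=1, vertical_pages=1):
    """Split a width x height printout into a grid of page rectangles.

    Returns (left, top, right, bottom) tuples in row-major order,
    i.e. the top row of pages from left to right first.
    """
    if horizontal_pages < 1 or vertical_pages < 1:
        raise ValueError("page counts must be at least 1")
    xs = [i * width // horizontal_pages for i in range(horizontal_pages + 1)]
    ys = [j * height // vertical_pages for j in range(vertical_pages + 1)]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(vertical_pages)
            for i in range(horizontal_pages)]
```

With `horizontal_pages=2`, an unwrapped sequence 1000 units wide is split into a left half and a right half, one per page.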
18. [Figure 13.10: the "Enzymes to be considered in calculation" step with Use existing enzyme list (Popular enzymes), listing PstI, KpnI, SacI, SphI, ApaI, FokI, HhaI, NsiI, SacII and other enzymes]
Figure 13.10: Selecting enzymes.
If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 13.18), or use the view of enzyme lists (see 13.3).
[Figure 13.11: the "All enzymes" list with a tooltip for SacII. Recognition site pattern: CCGCGG. Suppliers: GE Healthcare, Qbiogene, American Allied Biochemical Inc., Nippon Gene Co. Ltd., Takara Bio Inc., New England Biolabs, Toyobo Biochemicals, Molecular Biology Resources, Promega Corporation, EURx Ltd.]
Figure 13.11: Showing additional information about an enzyme like re
19. Figure 1.3: The plug-in manager with plug-ins installed.
If you do not wish to completely uninstall the plug-in, but you don't want it to be used next time you start the Workbench, click the Disable button.
When you close the dialog, you will be asked whether you wish to restart the Workbench. The plug-in will not be uninstalled before the Workbench is restarted.
1.6.3 Updating plug-ins
If a new version of a plug-in is available, you will get a notification during start-up as shown in figure 1.4.
In this list, select which plug-ins you wish to update, and click Install Updates. If you press Cancel, you will be able to install the plug-ins later by clicking Check for Updates in the Plug-in manager (see figure 1.3).
1.6.4 Resources
Resources are downloaded, installed, uninstalled and updated the same way as plug-ins. Click the Download Resources tab at the top of the plug-in manager, and you will see a list of available resources (see figure 1.5).
Currently the only resources available are PFAM databases (for use with CLC Genomics Workbench and CLC Main Workbench).
Because procedures for downloading, installation, uninstallation and updating are the same as for plug-ins, see section 1.6.1 and section 1.6.2 for more information.
[Figure 1.4: the "Updates available" dialog: "Updates are available for your plug-ins and/or resources. Use the list below to select whic"]
20. [Figure 2.15: the alignment with its Side Panel (Sequence layout with Spacing every 10 residues, No wrap / Auto wrap / Fixed wrap, Numbers on sequences relative to 1, Hide labels, Lock labels, Sequence label: Name, Show selection boxes, Identical residues as dots; Annotation layout; Annotation types; Residue coloring; Nonstandard residues)]
Figure 2.15: The resulting alignment.
algorithms for making trees. You also have the option of including a bootstrap analysis of the result. Leave the parameters at their default and click Finish to start the calculation, which can be seen in the Toolbox under the Processes tab. After a short while, a tree appears in the View Area (figure 2.16).
[Figure 2.16: a tree of P68873, P68225, P68228, P68945, P68063 and further sequences, with the Side Panel groups Tree Layout (Node symbol: Dot; Layout: Standard; Show internal node labels; Label color; Branch label color; Node color; Line color), Annotation Layout, Branches, Bootstrap and Text Format]
Figure 2.16: After choosing which algorithm should be used, the tree appears in the View Area. The Side Panel on the right side of the view allows you to adjust the way the tree is displayed.
2.6.1 Tree layout
Using the Side Panel on the right side of the view, you can change the way the tree is displayed. Click Tree Layout and open the Layout drop do
21. 6.2 …
6.3 Export graphics to files
  6.3.1 Which part of the view to export
  6.3.2 Save location and file formats
  6.3.3 Graphics export parameters
  6.3.4 Exporting protein reports
6.4 Export graph data points to a file
6.5 Copy/paste view output
CLC Sequence Viewer handles a large number of different data formats. In order to work with data in the Workbench, it has to be imported. Data types that are not recognized by the Workbench are imported as external files, which means that when you open these, they will open in the default application for that file type on your computer (e.g. Word documents will open in Word).
This chapter first deals with importing and exporting data in bioinformatic data formats and as external files. Next comes an explanation of how to export graph data points to a file, and how to export graphics.
6.1 Standard import
CLC Sequence Viewer has support for a wide range of bioinformatic data such as sequences, alignments etc. See a full list of the data formats in section D.1.
These data can be imported through the Import dialog, using drag/drop or copy/paste as explained below.
23. Constructing a multiple alignment corresponds to developing a hypothesis of how a number of sequences have evolved through the processes of character substitution, insertion and deletion. The input to multiple alignment algorithms is a number of homologous sequences, i.e. sequences that share a common ancestor and most often also share molecular function. The generated alignment is a table (see figure 14.6) where each row corresponds to an input sequence and each column corresponds to a position in the alignment. An individual column in this table represents residues that have all diverged from a common ancestral residue. Gaps in the table (commonly represented by a '-') represent positions where residues have been inserted or deleted and thus do not have ancestral counterparts in all sequences.
14.4.1 Use of multiple alignments
Once a multiple alignment is constructed, it can form the basis for a number of analyses:
• The phylogenetic relationship of the sequences can be investigated by tree-building methods based on the alignment.
• Annotation of functional domains, which may only be known for a subset of the sequences, can be transferred to aligned positions in other un-annotated sequences.
• Conserved regions in the alignment can be found, which are prime candidates for holding functionally important sites.
• Comparative bioinformatical analysis can be performed to identify functionally important regions.
14.4.2 Constructing multiple alignm
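The table view of an alignment described above maps naturally onto a list of equal-length strings: rows are sequences and columns are alignment positions. The sketch below (hypothetical helper names; '-' assumed as the gap character) reads off columns, flags fully conserved columns as a simple proxy for "conserved regions", and lists columns containing gaps.

```python
def alignment_columns(alignment):
    """Return the columns of an alignment (rows = sequences) as tuples."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return list(zip(*alignment))


def conserved_columns(alignment, gap="-"):
    """0-based indices of columns with the same residue in every row and no gap."""
    return [i for i, col in enumerate(alignment_columns(alignment))
            if gap not in col and len(set(col)) == 1]


def gapped_columns(alignment, gap="-"):
    """0-based indices of columns where at least one sequence has a gap."""
    return [i for i, col in enumerate(alignment_columns(alignment)) if gap in col]
```

Real conservation analyses usually score columns with a substitution matrix or entropy rather than requiring strict identity; the strict rule is used here only to keep the example short.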
24. [Figure 13.7: the enzyme selection dialog listing FokI, HhaI, NsiI, SacII and other enzymes with their overhangs and methylation]
Figure 13.7: Selecting enzymes.
If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 13.18), or use the view of enzyme lists (see 13.3).
[Figure 13.8: the "All enzymes" list with a tooltip for SacII showing its recognition site pattern (CCGCGG) and commercial suppliers]
Figure 13.8: Showing additional information about an enzyme like recognition sequence or a list of commercial vendors.
At the bottom of the dialog you can select to save this list of enzymes as a new file. In this way you can save the selection of enzymes for later use.
When you click Finish, the enzymes are added to the Side Panel and the cut sites are shown on the sequence.
If you have specified a set of enzymes which you always use, it will probably be a good idea to save the settings in the S
25. [Figure 13.6 residue: the "Enzymes shown in Side Panel" list (BamHI, BglII, ClaI, EcoRI, EcoRV, HindIII, PstI, SalI, SmaI, XbaI, XhoI, ...) with Save and Save as new enzyme list buttons]
Figure 13.6: Adding or removing enzymes from the Side Panel.
At the top you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 13.3 for more about creating and modifying enzyme lists.
Below there are two panels:
• To the left you see all the enzymes that are in the list selected above. If you have not chosen to use an existing enzyme list, this panel shows all the enzymes available.
• To the right there is a list of the enzymes that will be used.
Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button.
If you e.g. wish to use EcoRV and BamHI, select these two enzymes and add them to the right side panel.
If you wish to use all the enzymes in the list:
click in the panel to the left | press Ctrl + A (⌘ + A on Mac) | Add
The enzymes can be
26. Restriction Sites | Restriction Site Analysis

Click Next to set parameters for the restriction map analysis.

In this step, first select Use existing enzyme list and click the Browse for enzyme list button. Select the Popular enzymes in the Cloning folder under Enzyme lists.

Then write 3 into the filter below to the left. Select all the enzymes and click the Add button. The result should be like in figure 2.18.

Figure 2.18: Selecting enzymes.

Click Next. In this step you specify that you want to show enzymes that cut the sequence only once. This means that you should deselect the Two restriction sites checkbox.

Click Next and select that you want to Add restriction sites as annotations on sequence and Create restriction map. See figure 2.19.
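Keeping only enzymes that cut once amounts to counting recognition-site occurrences per enzyme and keeping those with exactly one hit. The sketch below is a simplified illustration, not the program's implementation: it assumes non-degenerate, palindromic recognition sites (so scanning one strand is enough) and ignores circular topology and methylation. The `ENZYMES` table and function names are hypothetical.

```python
# Hypothetical sketch: which enzymes cut a linear sequence exactly once?
# Assumes palindromic, non-degenerate recognition sites; ignores methylation
# sensitivity and circular sequences.

ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "KpnI": "GGTACC",
    "SacI": "GAGCTC",
}


def count_sites(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of a recognition site."""
    seq, site = seq.upper(), site.upper()
    count, pos = 0, seq.find(site)
    while pos != -1:
        count += 1
        pos = seq.find(site, pos + 1)
    return count


def single_cutters(seq: str, enzymes: dict) -> list:
    """Names of the enzymes whose site occurs exactly once in seq."""
    return sorted(name for name, site in enzymes.items()
                  if count_sites(seq, site) == 1)


print(single_cutters("GAATTCAAAGGATCCTTTGGATCC", ENZYMES))  # ['EcoRI']
```

Deselecting the Two restriction sites checkbox corresponds to the `== 1` test here; allowing two cuts would relax it to `in (1, 2)`.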
27. 4 User preferences and settings
4.1 General preferences
4.2 Default view preferences
4.3 Advanced preferences
4.4 Export/import of preferences
4.5 View settings for the Side Panel
5 Printing
5.1 Selecting which part of the view to print
5.2 …
5.3 Print preview
6 Import/export of data and graphics
6.1 Standard import
6.2 …
6.3 Export graphics to files
6.4 Export graph data points to a file
6.5 Copy/paste view output
7 History log
7.1 Element history
8 Batching and result handling
8.1 How to handle results of analyses
Bioinformatics
9 Viewing and editing sequences
9.1 View sequence
9.2 Circular DNA
9.3 Working with annotations
9.4 Element information
9.5 …
9.6 Creating a new Sequence
28. The final step concerns the handling of the results of the analysis. It is almost identical for all the analyses, so we explain it in this section in general.

Figure 8.1: The last step of the analyses, exemplified by Convert DNA to RNA.

In this step, shown in figure 8.1, you have two options:
• Open. This will open the result of the analysis in a view. This is the default setting.
• Save. This means that the result will not be opened but saved to a folder in the Navigation Area. If you select this option, click Next and you will see one more step where you can specify where to save the results (see figure 8.2). In this step you also have the option of creating a new folder or adding a location by clicking the buttons at the top of the dialog.

Figure 8.2: Specify a folder for the results of the analysis.

8.1.1 Table outputs
Some an
29. Transmembrane helix prediction (TMHMM)
Secondary protein structure prediction
PFAM domain search

Appendix A More features

Sequence alignment
• Multiple sequence alignments (two algorithms)
• Advanced re-alignment and fix-point alignment options
• Advanced alignment editing options
• Join multiple alignments into one
• Consensus sequence determination and management
• Conservation score along sequences
• Sequence logo graphs along alignments
• Gap fraction graphs
• Copy annotations between sequences in alignments
• Pairwise comparison

RNA secondary structure
• Advanced prediction of RNA secondary structure
• Integrated use of base pairing constraints
• Graphical view and editing of secondary structure
• Info about energy contributions of structure elements
• Prediction of multiple sub-optimal structures
• Evaluate structure hypothesis
• Structure scanning
• Partition function

Dot plots
• Dot plot based analyses

Phylogenetic trees
• Neighbor joining and UPGMA phylogenies
• Maximum likelihood phylogeny of nucleotides

Pattern discovery
• Search for sequence match
• Motif search for basic patterns
• Motif search with regular expressions
• Motif search with ProSite patterns
• Pattern discovery
30. Figure 6.16: Selected elements in a Folder Content view.

When the elements are selected, do the following to copy the selected elements:
right-click one of the selected elements | Edit | Copy
Then:
right-click in cell A1 | Paste

The outcome might appear unorganized, but with a few operations the structure of the view in CLC Sequence Viewer can be reproduced (except for the icons, which are replaced by file references in Excel). Note that all tables can also be exported directly in Excel format.

Chapter 7 History log
Contents
7.1 Element history, 84
7.1.1 Sharing data with history, 85

CLC Sequence Viewer keeps a log of all operations you make in the program. If e.g. you rename a sequence, align sequences, create a phylogenetic tree or translate a sequence, you can always go back and check
31. Figure 3.8: By right-clicking a tab, several close options are available.

3.2.4 Save changes in a view
When changes are made in a view, the text on the tab appears bold and italic (on Mac it is indicated by an * before the name of the tab). This indicates that the changes are not saved. The Save function may be activated in two ways:
Click the tab of the view you want to save | Save in the toolbar
or
Click the tab of the view you want to save | Ctrl + S (⌘ + S on Mac)
If you close a view containing an element that has been changed since you opened it, you are asked if you want to save.
When saving a new view that has not been opened from the Navigation Area (e.g. when opening a sequence from a list of search hits), a save dialog appears (figure 3.9).

Figure 3.9: Save dia
32. see figure 3.2. It is used for organizing and navigating data. Its behavior is similar to the way files and folders are usually displayed on your computer.

Figure 3.2: The Navigation Area.

CHAPTER 3. USER INTERFACE

3.1.1 Data structure
The data in the Navigation Area is organized into a number of Locations. When the CLC Sequence Viewer is started for the first time, there is one location called CLC Data (unless your computer administrator has configured the installation otherwise).
A location represents a folder on the computer. The data shown under a location in the Navigation Area is stored on the computer in the folder which the location points to. This is explained visually in figure 3.3.
33. Figure 1.2: The plug-ins that are available for download.

Clicking a plug-in will display additional information at the right side of the dialog. This will also display a button: Download and Install.
Click the plug-in and press Download and Install. A dialog displaying progress is now shown, and the plug-in is downloaded and installed.
If the plug-in is not shown on the server, and you have it on your computer (e.g. if you have downloaded it from our web site), you can install it by clicking the Install from
34. Figure 2.1: The user interface as it looks when you start the program for the first time (Windows version of CLC Sequence Viewer; the interface is similar for Mac and Linux).

At this stage, the important issues are the Navigation Area and the View Area.
The Navigation Area to the left is where you keep all your data for use in the program. Most analyses of CLC Sequence Viewer require that the data is saved in the Navigation Area. There are several ways to get data into the Navigation Area, and this tutorial describes how to import existing data.
The View Area is the main area to the right. This is where the data can be viewed. In general, a View is a display of a piece of data, and the View Area can include several Views. The Views are represented by tabs and can be organized, e.g. by using drag and drop.

2.1.1 Creating a f
35. 42
Terminated processes, 52
Text
  format, 94
  user manual, 20
  view sequence, 104
Text file format, 173
tif format, export, 79
Toolbar
  illustration, 37
  preferences, 59
Toolbox, 51, 52
  illustration, 37
  show/hide, 52
Topology layout, trees, 155
Trace colors, 93
Trace data, 161
Translate
  annotation to protein, 95
  DNA to RNA, 124
  nucleotide sequence, 12
  RNA to DNA, 125
  to DNA, 163
  to protein, 127, 163
Transmembrane helix prediction, 163
Trim, 161
TSV file format, 172
Tutorial
  Getting started, 21
txt file format, 173
IUPAC codes
  amino acids, 175
Undo limit, 57
Undo/Redo, 4
UniProt, search, 162
UPGMA algorithm, 157, 164
URLs, Navigation Area, 72
User defined view settings, 59
User interface, 3
Vector graphics, export, 9
VectorNTI file format, 172
View, 44
  alignment, 146
  GenBank format, 104
  preferences, 49
  save changes, 46
  sequence, 90
  sequence as text, 104
View Area, 44
  illustration, 37
View preferences, 58
  show automatically, 59
  style sheet, 62
View settings
  user defined, 59
Virtual gel, 165
vsf file format, for settings, 60
Web page, import sequence from, 2
Wildcard, append to search, 111
Windows installation, 9
Workspace, 53
  create, 53
  delete, 54
  save, 53
  select, 53
Wrap sequences, 91
xls file format, 173
xlsx file format, 173
xml file format, 173
Zip file format, 172, 173
Zoom, 50
  tutorial, 23
Zoom In, 50
Zoom Out, 50
Zoom to 100%, 51
36. Figure 4.4: Selecting the default view setting.

Figure 4.5: Number formatting of tables (Number of fraction digits). The examples below the text field are updated when you change the value, so that you can see the effect.

After you have changed the preference, you have to re-open your tables to see the effect.

4.2.2 Import and export Side Panel settings
If you have created a special set of settings in the Side Panel that you wish to share with other CLC users, you can export the settings in a file. The other user can then import the settings.
To export the Side Panel settings, first select the views that you wish to export settings for. Use Ctrl-click (⌘-click on Mac) or Shift-click to select multiple views. Next, click the Export button.
Note that there is also another export button at the very bottom of the dialog, but this will export the other settings of the Preferences dialog (see section 4.4).
A dialog will be shown (see figure 4.6) that allows you to select which of the settings you wish to export. When multiple views are selected for export, all
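The examples in figure 4.5 suggest that Number of fraction digits sets how many digits follow the decimal point, in both plain and scientific (E) notation. The sketch below illustrates that reading only; when the program switches between the two notations is not described here, so the two hypothetical helpers `fixed` and `scientific` are shown separately.

```python
# Sketch of what "number of fraction digits" is assumed to control:
# the number of digits after the decimal point, in either notation.

def fixed(x: float, digits: int) -> str:
    """Fixed-point notation with the given number of fraction digits."""
    return f"{x:.{digits}f}"


def scientific(x: float, digits: int) -> str:
    """Scientific notation in the style 1.23E-4 (no '+' sign and no
    leading zero in the exponent)."""
    mantissa, exponent = f"{x:.{digits}E}".split("E")
    return f"{mantissa}E{int(exponent)}"


print(fixed(1.2349, 2))         # 1.23
print(scientific(0.000123, 2))  # 1.23E-4
print(scientific(123000, 2))    # 1.23E5
```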
37. Biol. Evol., 4(4):406-425.
[Siepel and Haussler, 2004] Siepel, A. and Haussler, D. (2004). Combining phylogenetic and hidden Markov models in biosequence analysis. J Comput Biol, 11(2-3):413-428.
[Sneath and Sokal, 1973] Sneath, P. and Sokal, R. (1973). Numerical Taxonomy. Freeman, San Francisco.
[Tobias et al., 1991] Tobias, J. W., Shrader, T. E., Rocap, G., and Varshavsky, A. (1991). The N-end rule in bacteria. Science, 254(5036):1374-1377.
[Yang and Rannala, 1997] Yang, Z. and Rannala, B. (1997). Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo method. Mol Biol Evol, 14(7):717-24.

Part V Index

Index
454 sequencing data, 161
AB1 file format, 172
Abbreviations
  amino acids, 175
ABI file format, 172
About CLC Workbenches, 12
Accession number, display, 41
ace file format, 173
Add annotations, 162
Adjust selection, 95
Advanced preferences, 61
Algorithm
  alignment, 143
  neighbor joining, 15
  UPGMA, 157
Align
  protein sequences, tutorial, 29
  sequences, 163
Alignment, see Alignments
Alignments, 143, 163
  create, 143
  edit, 148
  fast algorithm, 145
  multiple, Bioinformatics explained, 150
  view, 146
  view annotations on, 99
Aliphatic index, 119
aln file format, 173
Alphabetical sorting of folders, 39
Amino acid composition, 121
Amino acids
  abbreviations, 175
  IUPAC codes, 175
Annotation, select, 95
Annotation Layout, in Side Panel, 99
Annotation Types, in Side Panel, 9
38. The tools in the toolbox can be accessed by double-clicking or by dragging elements from the Navigation Area to an item in the Toolbox.

3.4.3 Status Bar
As can be seen from figure 3.1, the Status Bar is located at the bottom of the window. In the left side of the bar is an indication of whether the computer is making calculations or whether it is idle. The right side of the Status Bar indicates the range of the selection of a sequence. (See section 3.3.6 for more about the Selection mode button.)

3.5 Workspace
If you are working on a project and have arranged the views for this project, you can save this arrangement using Workspaces. A Workspace remembers the way you have arranged the views, and you can switch between different workspaces.
The Navigation Area always contains the same data across Workspaces. It is, however, possible to open different folders in the different Workspaces. Consequently, the program allows you to display different clusters of the data in separate Workspaces.
All Workspaces are automatically saved when closing down CLC Sequence Viewer. The next time you run the program, the Workspaces are reopened exactly as you left them.
Note: It is not possible to run more than one instance of CLC Sequence Viewer at a time. Use two or more Workspaces instead.

3.5.1 Create Workspace
When working with large amounts of data, it might be a good idea to split the work into two or more Workspaces. A
39. Primer design
• Advanced primer design tools
• Detailed primer and probe parameters
• Graphical display of primers
• Generation of primer design output
• Support for Standard PCR
• Support for Nested PCR
• Support for TaqMan PCR
• Support for Sequencing primers
• Alignment based primer design
• Alignment based TaqMan probe design
• Match primer with sequence
• Ordering of primers
• Advanced analysis of primer properties

Molecular cloning
• Advanced molecular cloning
• Graphical display of in silico cloning
• Advanced sequence manipulation

Virtual gel view
• Fully integrated virtual 1D DNA gel simulator

For a more detailed comparison we refer to http://www.clcbio.com/compare.

Appendix B Graph preferences

This section explains the view settings of graphs. The Graph preferences at the top of the Side Panel include the following settings:
• Lock axes. This will always show the axes, even though the plot is zoomed to a detailed level.
• Frame. Shows a frame around the graph.
• Show legends. Shows the data legends.
• Tick type. Determines whether tick lines should be shown outside or inside the frame (Outside, Inside).
• Tick lines at. Choosing Major ticks will show a grid behind the graph (None, Major ticks).
• Horizontal axis range. Sets the range of the horizontal axis (x axis). Enter a value in Min and Max
40. Figure C.3: The advanced filter showing open reading frames larger than 400 that are placed on the negative strand.

Both for the simple and the advanced filter, there is a counter at the upper left corner which tells you the number of rows that pass the filter (91 in figure C.2 and 15 in figure C.3).

Appendix D Formats for import and export

D.1 List of bioinformatic data formats
Below is a list of bioinformatic data formats, i.e. formats for importing and exporting sequences, alignments and trees.

D.1.1 Sequence data formats

File type | Suffix | Description
FASTA | .fsa/.fasta | Simple format, name & description
AB1 | .ab1 | Including chromatograms
ABI | .abi | Including chromatograms
CLC | .clc | Rich format including all information
Clone Manager | .cm5 |
CSV export | .csv | Annotations in csv format
CSV import | .csv | One sequence per line: name, de
DNAstrider | .str/.strider |
DS Gene | .bsml |
Embl | .embl |
GCG sequence | .gcg |
GenBank | .gbk/.gb/.gp |
Gene Construction Kit | .gck |
Lasergene | .pro/.seq |
Nexus | .nxs/.nexus |
Phred | .phd |
PIR (NBRF) | .pir |
Raw sequence | any |
SCF2 | .scf |
SCF3 | .scf |
Staden | .sdn |
Swiss-Prot | .swp |
Tab delimited text | .txt |
Vector NTI archives | .ma4/.pa4/.oa4 |
Vector NTI Database | |
Zip export | .zip |
Zip import | .zip/.gzip/.tar |
41.   If one position is 100% conserved, the graph will be shown in full height. Learn how to export the data behind the graph in section 6.4.
      * Height. Specifies the height of the graph.
      * Type. The type of the graph.
        · Line plot. Displays the graph as a line plot.
        · Bar plot. Displays the graph as a bar plot.
        · Colors. Displays the graph as a color bar using a gradient like the foreground and background colors.
      * Color box. Specifies the color of the graph for line and bar plots, and specifies a gradient for colors.
• Gap fraction. Which fraction of the sequences in the alignment have gaps. The gap fraction is only relevant if there are gaps in the alignment.
   - Foreground color. Colors the letter using a gradient, where the left side color is used if there are relatively few gaps, and the right side color is used if there are relatively many gaps.
   - Background color. Sets a background color of the residues using a gradient in the same way as described above.
   - Graph. Displays the gap fraction as a graph at the bottom of the alignment. Learn how to export the data behind the graph in section 6.4.
      * Height. Specifies the height of the graph.
      * Type. The type of the graph.
        · Line plot. Displays the graph as a line plot.
        · Bar plot. Displays the graph as a bar plot.
        · Colors. Displays the graph as a color bar using a gradient like the foreground and background colors.
      * Color box. Spec
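The gap fraction described above is a per-column quantity: for each alignment position, the number of sequences with a gap there divided by the number of sequences. A minimal sketch, assuming gaps are written as `-` and all aligned sequences have equal length (the function name is hypothetical):

```python
# Sketch: per-column gap fraction of an alignment (fraction of sequences
# with a gap at each position). Assumes '-' marks a gap and that all
# aligned sequences have the same length.

def gap_fraction(alignment: list) -> list:
    n = len(alignment)
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [sum(s[i] == "-" for s in alignment) / n for i in range(length)]


print(gap_fraction(["AC-T", "A--T", "ACGT"]))
```

A value of 0 would map to the left color of the gradient and a value of 1 to the right color when used for foreground or background coloring.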
42. Figure 13.19: An enzyme list.

and you can use the filter at the top right corner to search for specific enzymes, recognition sequences etc.
If you wish to remove or add enzymes, click the Add/Remove Enzymes button at the bottom of the view. This will present the same dialog as shown in figure 13.16, with the enzyme list shown to the right.
If you wish to extract a subset of an enzyme list:
open the list | select the relevant enzymes | right-click | Create New Enzyme List from Selection
If you combine this method with the filter located at the top of the view, you can extract a very specific set of enzymes. E.g., if you wish to create a list of enzymes sold by a particular distributor, type the name of the distributor into the filter, and select and create a new enzyme list from the selection.

Chapter 14 Sequence alignment
Contents
14.1 Create an alignment, 143
14.1.1 …, 144
14.1.2 Fast or accurate alignment algorithm, 145
14.2 View alignments, 146
14.3 Edit alignments, 148
14.3.1 Move residues and gaps, 148
14.3.2 Insert gaps
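The filter-then-create workflow can be pictured as a text match over each enzyme's fields followed by copying the matching records into a new list. The sketch below is hypothetical: the `Enzyme` record, the case-insensitive match across name, site and suppliers, and the placeholder supplier names are assumptions for illustration, not the program's data model.

```python
# Hypothetical sketch of "type into the filter, then Create New Enzyme List
# from Selection". Supplier names are placeholders, not real vendor data.
from dataclasses import dataclass, field


@dataclass
class Enzyme:
    name: str
    site: str
    suppliers: list = field(default_factory=list)


def matches(enzyme: Enzyme, text: str) -> bool:
    """Assumed case-insensitive match against name, site or any supplier."""
    text = text.lower()
    haystacks = [enzyme.name, enzyme.site, *enzyme.suppliers]
    return any(text in h.lower() for h in haystacks)


def new_list_from_filter(enzymes: list, text: str) -> list:
    """Return a new list holding only the enzymes that pass the filter."""
    return [e for e in enzymes if matches(e, text)]


enzymes = [
    Enzyme("EcoRI", "GAATTC", ["Supplier A", "Supplier B"]),
    Enzyme("BamHI", "GGATCC", ["Supplier B"]),
    Enzyme("SacII", "CCGCGG", ["Supplier C"]),
]
print([e.name for e in new_list_from_filter(enzymes, "supplier b")])  # ['EcoRI', 'BamHI']
```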
43. Figure 11.2: Parameters for shuffling.

• Dinucleotide shuffling. Shuffle method generating a sequence of the exact same dinucleotide frequency.
• Mononucleotide sampling from zero order Markov chain. Resampling method generating a sequence of the same expected mononucleotide frequency.
• Dinucleotide sampling from first order Markov chain. Resampling method generating a sequence of the same expected dinucleotide frequency.

For proteins, the following parameters can be set:
• Single amino acid shuffling. Shuffle method generating a sequence of the exact same amino acid frequency.
• Single amino acid sampling from zero order Markov chain. Resampling method generating a sequence of the same expected single amino acid frequency.
• Dipeptide shuffling. Shuffle method generating a sequence of the exact same dipeptide frequency.
• Dipeptide sampling from first order Markov chain. Resampling method generating a sequence of the same expected dipeptide frequency.

For further details of these algorithms, see Clote et al. (2005).
In addition to the shuffle method, you can specify the number of randomized sequences to output.
Click Next if you wish
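The difference between shuffling (exact same frequencies) and Markov-chain sampling (same expected frequencies) can be sketched as follows. This is an illustration, not the program's implementation: exact dinucleotide shuffling, which needs an Euler-path construction (see Clote et al. 2005), is deliberately left out, and drawing the first letter of the first-order chain from the overall composition, plus the fallback for letters with no observed successor, are assumptions of this sketch.

```python
# Sketch of three randomization schemes for a DNA string.
import random
from collections import Counter, defaultdict


def mono_shuffle(seq: str, rng: random.Random) -> str:
    """Exact same mononucleotide frequency: permute the letters."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def zero_order_sample(seq: str, rng: random.Random) -> str:
    """Same *expected* mononucleotide frequency: draw letters independently."""
    if not seq:
        return ""
    counts = Counter(seq)
    return "".join(rng.choices(list(counts), weights=list(counts.values()), k=len(seq)))


def first_order_sample(seq: str, rng: random.Random) -> str:
    """Same *expected* dinucleotide frequency: walk a first-order Markov
    chain estimated from the transitions observed in seq."""
    if not seq:
        return ""
    transitions = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1
    counts = Counter(seq)
    out = [rng.choices(list(counts), weights=list(counts.values()))[0]]
    while len(out) < len(seq):
        # assumption: fall back to letter frequencies if no successor was seen
        nxt = transitions[out[-1]] or counts
        out.append(rng.choices(list(nxt), weights=list(nxt.values()))[0])
    return "".join(out)


rng = random.Random(1)
s = "ATGCGCGATATTACG"
print([f(s, rng) for f in (mono_shuffle, zero_order_sample, first_order_sample)])
```

Only the shuffle keeps the composition exactly; the two sampling methods match it on average over many randomized sequences.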
44. Figure 3.16: An empty Workspace.

3.5.3 Delete Workspace
Deleting a Workspace can be done in the following way:
Workspace in the Menu Bar | Delete Workspace | choose which Workspace to delete | OK
Note: Be careful to select the right Workspace when deleting. The delete action cannot be undone. However, no data is lost, because a workspace is only a representation of data.
It is not possible to delete the default workspace.

3.6 List of shortcuts
The keyboard shortcuts in CLC Sequence Viewer are listed below.

Action | Windows/Linux
Adjust selection | Shift + arrow keys
Change between tabs | Ctrl + Tab
Close | Ctrl + W
Close all views | Ctrl + Shift + W
Copy | Ctrl + C
Cut | Ctrl + X
Delete | Delete
Exit | Alt
Export |
Export graphics |
Find Next Conflict |
Find Previous Conflict |
Help |
Import |
Maximize/restore size of View |
Move gaps in alignment |
Navigate sequence views |
New Folder |
New Sequence |
View |
Paste |
Print |
Redo |
Rename |
Reverse Complement |
Save |
Search local data |
Search within a sequence |
Search NCBI |
Search UniProt |
Select All |
Selection Mode |
Show/hide Side Panel |
Sort folder |
Split Horizontally |
Split Vertically |
Translate to Protein |
Undo |
User Preferences |
Zoom In Mode |
Zoom In (without clicking) |
Zoom Out Mode |
Zoom Out (without clicking) |
Inverse zoom mode |
45. actions can be undone. Undo applies to all changes made on sequences, alignments or trees. See section 3.2.5 for more on this topic.

Figure 4.1: Preferences include General preferences, View preferences, Colors preferences, and Advanced settings.

• Audit Support. If this option is checked, all manual editing of sequences will be marked with an annotation on the sequence (see figure 4.2). Placing the mouse on the annotation will reveal additional details about the change made to the sequence (see figure 4.3). Note that no matter whether Audit Support is checked or not, all changes are also recorded in the History (see chapter 7).
• Number of hits. The number of hits shown in CLC Sequence Viewer when e.g. searching NCBI. (The sequences shown in the program are not downloaded until they are opened or dragged/saved into the Navigation Area.)
• Locale Setting. Specify which country you are located in. This determines how punctuation is used in nu
46. algorithm.
Finally, treating end gaps like any other gaps is the best option when you know that there are no biologically distinct effects at the ends of the sequences.
Figures 14.3 and 14.4 illustrate the differences between the different gap scores at the sequence ends.

Figure 14.3: The first 50 positions of two different alignments of seven calpastatin sequences. The top alignment is made with cheap end gaps, while the bottom alignment is made with end gaps having the same price as any other gaps. In this case it seems that the latter scoring scheme gives the best result.

14.1.2 Fast or accurate alignment algorithm
CLC Sequence Viewer has two a
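The effect of end-gap pricing can be seen on a toy pairwise score. The sketch below uses a simple linear-gap Needleman-Wunsch recurrence and, as an extreme case of "cheap", makes end gaps entirely free (semi-global alignment). The scores (match 1, mismatch -1, gap -2) are arbitrary assumptions; the program's actual scoring (gap open and extension costs, multiple alignment) is not modelled.

```python
# Sketch: global alignment score with end gaps either free or priced like
# any other gap. Linear gap cost; scoring values are arbitrary assumptions.

def align_score(a: str, b: str, free_end_gaps: bool,
                match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # leading end gaps: free, or charged like internal gaps
    for i in range(1, n + 1):
        F[i][0] = 0 if free_end_gaps else i * gap
    for j in range(1, m + 1):
        F[0][j] = 0 if free_end_gaps else j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    if free_end_gaps:
        # trailing end gaps are free too: best cell in last row or column
        return max(max(F[n]), max(row[m] for row in F))
    return F[n][m]


print(align_score("ACGT", "GT", free_end_gaps=True))   # 2
print(align_score("ACGT", "GT", free_end_gaps=False))  # -2
```

With free end gaps the short sequence can slide to its best-matching position at no cost; with ordinary end-gap prices the overhang is penalized like any internal gap.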
47. all open reading frames (ORF) in a sequence, or, by choosing particular start codons to use, it can be used as a rudimentary gene finder. ORFs identified will be shown as annotations on the sequence. You have the option of choosing a translation table, the start codons to use, minimum ORF length, as well as a few other parameters. These choices are explained in this section.
To find open reading frames:
select a nucleotide sequence | Toolbox in the Menu Bar | Nucleotide Analysis | Find Open Reading Frames
or right-click a nucleotide sequence | Toolbox | Nucleotide Analysis | Find Open Reading Frames
This opens the dialog displayed in figure 12.6.
If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.
If you want to adjust the parameters for finding open reading frames, click Next.

12.5.1 Open reading frame parameters
This opens the dialog displayed in figure 12.7.
The adjustable parameters for the search are:
• Start codon
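The core of an ORF search is a scan of all six reading frames for a start codon followed in frame by a stop codon. The sketch below is an illustration under stated assumptions, not the program's algorithm: it uses the standard genetic code's stops (TAA, TAG, TGA), ATG as the default start codon, requires a stop codon, measures length in codons including the stop, and takes the first start after each stop. The parameter names `starts` and `min_codons` are hypothetical.

```python
# Minimal six-frame ORF finder sketch (standard genetic code assumed).

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _orfs_one_strand(s: str, starts: set, min_codons: int) -> list:
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon in starts:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found


def find_orfs(seq: str, starts=("ATG",), min_codons: int = 2) -> list:
    """Return (strand, start, end) tuples: 0-based, end-exclusive, always
    in forward-strand coordinates."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    n = len(seq)
    hits = [("+", s, e) for s, e in _orfs_one_strand(seq, set(starts), min_codons)]
    hits += [("-", n - e, n - s) for s, e in _orfs_one_strand(rc, set(starts), min_codons)]
    return sorted(hits, key=lambda h: h[1])


print(find_orfs("CCATGAAATAGCC"))  # [('+', 2, 11)]
```

Passing extra codons in `starts` mimics choosing additional start codons; a different translation table would change `STOPS`.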
48. and press Enter. This will update the view. If you wait a few seconds without pressing Enter, the view will also be updated.
• Vertical axis range. Sets the range of the vertical axis (y axis). Enter a value in Min and Max, and press Enter. This will update the view. If you wait a few seconds without pressing Enter, the view will also be updated.
• X axis at zero. This will draw the x axis at y = 0. Note that the axis range will not be changed.
• Y axis at zero. This will draw the y axis at x = 0. Note that the axis range will not be changed.
• Show as histogram. For some data series it is possible to see the graph as a histogram rather than a line plot.

The Lines and plots group below contains the following settings:
• Dot type: None, Cross, Plus, Square, Diamond, Circle, Triangle, Reverse triangle, Dot.
• Dot color. Allows you to choose between many different colors. Click the color box to select a color.
• Line width: Thin, Medium, Wide.
• Line type: None, Line, Long dash, Short dash.
• Line color. Allows you to choose between many different colors. Click the color box to select a color.
For graphs with multiple data series, you can select which curve the dot and line preferences should apply to. This setting is at the top of the Side Panel group.
Note that the graph title and the axes titles can be edited simply by clicking with the mouse. Th
49. annotations that are attached to the sequence(s) in the view. For sequences with many annotations, it can be easier to get an overview if you deselect the annotation types that are not relevant.
Unchecking the checkboxes in the Annotation Types group will not remove these annotations from the sequence; it will just hide them from the view.
Besides selecting which types of annotations should be displayed, the Annotation Types group is also used to change the color of the annotations on the sequence. Click the colored square next to the relevant annotation type to change the color.
This will display a dialog with three tabs: Swatches, HSB, and RGB. They represent three different ways of specifying colors. Apply your settings and click OK. When you click OK, the color settings cannot be reset. The Reset function only works for changes made before pressing OK.
Furthermore, the Annotation Types can be used to easily browse the annotations by clicking the small button next to the type. This will display a list of the annotations of that type (see figure 9.8). Clicking an annotation in the list will select this region on the sequence. In this way you can quickly find a specific annotation on a long sequence.

View Annotations in a table
Annotations can also be viewed in a table:
select the sequence in the Navigation Area | Show | Annotation Table
or, if the sequence is already open:
Click Show Annotation Table at the l
50. are not directly observable.

Figure 15.4: A proposed phylogeny of the great apes (Hominidae). Different components of the tree are marked (root node, branches/edges, terminal nodes/leaves, internal node/vertex, most recent common ancestor, Operational Taxonomical Units, Hypothetical Taxonomical Unit); see text for description.

The ordering of the nodes determines the tree topology and describes how lineages have diverged over the course of evolution. The branches of the tree represent the amount of evolutionary divergence between two nodes in the tree and can be based on different measurements. A tree is completely specified by its topology and the set of all edge lengths.
The phylogenetic tree in figure 15.4 is rooted at the most recent common ancestor of all Hominidae species, and therefore represents a hypothesis of the direction of evolution, e.g. that the common ancestor of gorilla, chimpanzee and man existed before the common ancestor of chimpanzee and man. In contrast, an unrooted tree would represent relationships without assumptions about ancestry.

15.2.2 Modern usage of phylogenies
Besides evolutionary biology and systematics, the inference of phylogenies is central to other areas of research.
As more and more genetic diversity is being revealed through the completion of multiple genomes, an active area of research within bioinformatics is the development of comparat
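Since a tree is completely specified by its topology and its edge lengths, a small recursive structure is enough to hold one. The sketch below stores each edge length on the child node and serializes the tree to the widely used Newick text format; the `Node` class and the three-taxon tree with its lengths are arbitrary placeholders, not data from figure 15.4.

```python
# Sketch: a rooted tree as topology (nested children) plus edge lengths,
# serialized to Newick. The example taxa and lengths are placeholders.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None   # length of the edge to the parent
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Terminal nodes (leaves), left to right."""
        if not self.children:
            return [self.name]
        return [leaf for c in self.children for leaf in c.leaves()]

    def newick(self) -> str:
        text = self.name
        if self.children:
            text = "(" + ",".join(c.newick() for c in self.children) + ")" + self.name
        if self.length is not None:
            text += f":{self.length:g}"
        return text


root = Node(children=[
    Node(length=0.5, children=[Node("A", 1.0), Node("B", 2.0)]),
    Node("C", 3.0),
])
print(root.newick() + ";")  # ((A:1,B:2):0.5,C:3);
print(root.leaves())        # ['A', 'B', 'C']
```

The root has no incoming edge, so it carries no length; an unrooted tree is often written in the same format with an arbitrarily chosen root.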
51. filter criterion, you first have to select which column it should apply to. Next, you choose an operator. For numbers, you can choose between:
• = (equal to)
• < (smaller than)
• > (greater than)
• <> (not equal to)
• abs value < (absolute value smaller than). This is useful if it doesn't matter whether the number is negative or positive.
• abs value > (absolute value greater than). This is useful if it doesn't matter whether the number is negative or positive.
For text-based columns, you can choose between:
• contains (the text does not have to be in the beginning)
• doesn't contain
• = (the whole text in the table cell has to match, also lower/upper case)
Once you have chosen an operator, you can enter the text or numerical value to use.
If you wish to reset the filter, simply remove all the search criteria. Note that the last one will not disappear; it will be reset and allow you to start over.
Figure C.3 shows an example of an advanced filter which displays the open reading frames larger than 400 that are placed on the negative strand.
[Figure C.3: the Find Reading Frame output table (15 of 169 rows shown) with an advanced filter set to Match all, using criteria on Length and strand; columns include Start, End, Length, Found at strand and Start codon.]
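The operators and the Match all / Match any choice can be modeled as plain predicates. The sketch below is a hypothetical illustration of the filtering logic described above, not the program's implementation. It assumes that "contains" and "doesn't contain" ignore case (the text only states that "=" is case-sensitive), and the table rows are made up to resemble the example in figure C.3.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Numeric operators, named as in the filter dialog.
NUMERIC_OPS: Dict[str, Callable[[float, float], bool]] = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<>": operator.ne,
    "abs value <": lambda cell, value: abs(cell) < value,
    "abs value >": lambda cell, value: abs(cell) > value,
}

# Text operators. "=" requires the whole cell to match, including case.
# Assumption: "contains"/"doesn't contain" ignore case in this sketch.
TEXT_OPS: Dict[str, Callable[[str, str], bool]] = {
    "contains": lambda cell, value: value.lower() in cell.lower(),
    "doesn't contain": lambda cell, value: value.lower() not in cell.lower(),
    "=": lambda cell, value: cell == value,
}

Criterion = Tuple[str, str, Any]  # (column name, operator, value)


def matches(row: Dict[str, Any], criterion: Criterion) -> bool:
    """Test one row against one criterion, choosing operators by cell type."""
    column, op, value = criterion
    cell = row[column]
    ops = TEXT_OPS if isinstance(cell, str) else NUMERIC_OPS
    return ops[op](cell, value)


def apply_filter(rows: List[Dict[str, Any]], criteria: List[Criterion],
                 match_all: bool = True) -> List[Dict[str, Any]]:
    """Keep rows meeting all criteria (Match all) or at least one (Match any)."""
    combine = all if match_all else any
    return [row for row in rows if combine(matches(row, c) for c in criteria)]


# Hypothetical rows shaped like the open reading frame table in figure C.3.
orf_rows = [
    {"Length": 426, "Strand": "negative"},
    {"Length": 300, "Strand": "negative"},
    {"Length": 1851, "Strand": "positive"},
]
criteria = [("Length", ">", 400), ("Strand", "=", "negative")]
print(apply_filter(orf_rows, criteria))                        # [{'Length': 426, 'Strand': 'negative'}]
print(len(apply_filter(orf_rows, criteria, match_all=False)))  # 3
```

With Match all, only the long open reading frame on the negative strand survives; with Match any, every row that satisfies at least one criterion is kept.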
52. if the import goes wrong, the next option can be helpful:
• Force import as type. This option should be used if CLC Sequence Viewer cannot successfully determine the file format. By forcing the import as a specific type, the automatic determination of the file format is bypassed, and the file is imported as the type specified.
• Force import as external file. This option should be used if a file is imported as a bioinformatics file when it should just have been an external file. It could be an ordinary text file which is imported as a sequence.
Import using drag and drop
It is also possible to drag a file from e.g. the desktop into the Navigation Area of CLC Sequence Viewer. This is equivalent to importing the file using the Automatic import option described above. If the file type is not recognized, it will be imported as an external file.
Import using copy/paste of text
If you have e.g. a text file or a browser displaying a sequence in one of the formats that can be imported by CLC Sequence Viewer, there is a very easy way to get this sequence into the Navigation Area:
Copy the text from the text file or browser | Select a folder in the Navigation Area | Paste
This will create a new sequence based on the text copied. This operation is equivalent to saving the text in a text file and importing it into CLC Sequence Viewer.
If the sequence is not formatted, i.e. if you just have a text l
53. in csv format, 81
graphics
history, 76
list of formats, 171
multiple files, 5
preferences
Side Panel Settings, 60
tables, 173
Export visible area
Export whole view
Expression analysis, 162
Extensions, 15
External files, import and export, 72
Extinction coefficient, 119
Extract sequences, 107
FASTA file format, 172
Feature request, 13
Feature table, 121
Filtering restriction enzymes, 134, 136, 141
Find
  in GenBank file, 104
  in sequence, 93
  results from a finished process, 52
Find open reading frames, 128
Fit to pages, print
Fit Width, 51
Floating Side Panel, 64
Folder, create new
  tutorial, 22
Follow selection, 91
Footer, 69
Format of the manual, 20
Fragment, select, 95
Free end gaps, 145
fsa file format, 173
G/C content, 163
Gap
  delete, 149
  extension cost, 144
  fraction, 147, 163
  insert, 148
  open cost, 144
Gb Division, 103
gbk file format, 173
GCG Alignment file format, 173
GCG Sequence file format, 172
gck file format, 173
GCK, Gene Construction Kit file format, 172
Gel electrophoresis, 105
GenBank
  view sequence in, 104
  file format, 172
  search, 110, 162
  tutorial, 28
Gene Construction Kit file format, 172
Gene expression analysis, 162
Gene finding, 128
General preferences, 5
General Sequence Analyses, 114
Getting started tutorial, 21
gff file format, 173
Graph
  export data points in csv format, 81
Graph Side Panel, 166
Graphics
  data formats, 174
  export
gzip
54. [Screenshot: the list of saved settings offered in the Side Panel, including Restriction sites, Rasmol colors, Residue coloring, Show translation, Nucleotide info, CLC Standard Settings, Find and Text Format.]
Figure 4.12: Applying saved settings.
The settings are specific to the type of view. Hence, when you save settings of a circular view, they will not be available if you open the sequence in a linear view.
If you wish to export the settings that you have saved, this can be done in the Preferences dialog under the View tab (see section 4.2.2).
The remaining icons of figure 4.10 are used to Expand all groups, Collapse all groups, and Dock/Undock Side Panel. Dock/Undock Side Panel makes the Side Panel floating (see below).
4.5.1 Floating Side Panel
The Side Panel of the views can be placed in the right side of a view, or it can be floating (see figure 4.13).
By clicking the Dock icon, the floating Side Panel reappears in the right side of the view. The size of the floating Side Panel can be adjusted by dragging the hatched area in the bottom right corner.
[Screenshot: a sequence list shown as a table (columns Accession, Definition, Modification date, Length) next to a floating Side Panel.]
Figure 4.13: The floating Side Panel can be moved out of the way, e.g. to allow for a wider view of a table.
Chapter 5 Printing
Contents
5.1 Selecting which part of the view to print 67
5.2 Page setup 68
5.2.1 Header
55. new folder is added at the bottom of this folder. If an element is selected, the new folder is added right above that element.
You can move the folder manually by selecting it and dragging it to the desired destination.
3.1.3 Sorting folders
You can sort the elements in a folder alphabetically:
right-click the folder | Sort Folder
On Windows, subfolders will be placed at the top of the folder, and the rest of the elements will be listed below in alphabetical order. On Mac, both subfolders and other elements are listed together in alphabetical order.
3.1.4 Multiselecting elements
Multiselecting elements means that you select more than one element at the same time. This can be done in the following ways:
• Holding down the <Ctrl> key (⌘ on Mac) while clicking on multiple elements selects the elements that have been clicked.
• Selecting one element, and selecting another element while holding down the <Shift> key, selects all the elements listed between the two locations (the two end locations included).
• Selecting one element, and moving the cursor with the arrow keys while holding down the <Shift> key, enables you to increase the number of elements selected.
3.1.5 Moving and copying elements
Elements can be moved and copied in several ways:
• Using Copy, Cut and Paste from the Edit menu.
• Using Ctrl + C (⌘ + C on Mac), Ctrl + X (⌘ + X on Mac) and Ctrl + V (⌘ + V on Mac).
• Using Copy, Cut
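The two sorting behaviors described under 3.1.3 amount to two different sort keys. This is a hypothetical sketch, not the program's code: it assumes "alphabetical" means case-insensitive ordering by element name, and the element names are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:
    name: str
    is_folder: bool


def windows_key(e: Element) -> Tuple[bool, str]:
    # False sorts before True, so subfolders come first; each group is alphabetical.
    return (not e.is_folder, e.name.lower())


def mac_key(e: Element) -> str:
    # Subfolders and other elements share one alphabetical list.
    return e.name.lower()


def sort_folder(elements: List[Element], platform: str) -> List[Element]:
    key = windows_key if platform == "windows" else mac_key
    return sorted(elements, key=key)


contents = [
    Element("pBR322", False),
    Element("Cloning", True),
    Element("alignment", False),
    Element("Vectors", True),
]
print([e.name for e in sort_folder(contents, "windows")])  # ['Cloning', 'Vectors', 'alignment', 'pBR322']
print([e.name for e in sort_folder(contents, "mac")])      # ['alignment', 'Cloning', 'pBR322', 'Vectors']
```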
56. of plug-ins. As the range of plug-ins is continuously updated and expanded, they will not be listed here. Instead, we refer to http://www.clcbio.com/plug-ins for a full list of plug-ins with descriptions of their functionalities.
1.6.1 Installing plug-ins
Plug-ins are installed using the plug-in manager¹:
Help in the Menu Bar | Plug-ins and Resources
or Plug-ins in the Toolbar
The plug-in manager has four tabs at the top:
• Manage Plug-ins. This is an overview of plug-ins that are installed.
• Download Plug-ins. This is an overview of available plug-ins on CLC bio's server.
• Manage Resources. This is an overview of resources that are installed.
• Download Resources. This is an overview of available resources on CLC bio's server.
¹In order to install plug-ins on Windows Vista, the Workbench must be run in administrator mode: right-click the program shortcut and choose "Run as Administrator". Then follow the procedure described below.
To install a plug-in, click the Download Plug-ins tab. This will display an overview of the plug-ins that are available for download and installation (see figure 1.2).
[Screenshot: the Download Plug-ins tab of the Manage Plug-ins and Resources dialog, listing plug-ins such as Bookmark Navigator (version 1.03; "With this extension you can bookmark elements in the Navigation Area") and Additional Alignments.]
57. scription, optional sequence
Only nucleotide sequence
Rich information incl. annotations
Rich information incl. annotations
Including chromatograms
Simple format, name & description
Only sequence, no name
Including chromatograms
Including chromatograms
Rich information (only proteins)
Annotations in tab delimited text format
Archives in rich format
Special import, full database
Selected files in CLC format
Contained files/folder structure

D.1.2 Alignment formats
File type           Suffix            Import  Export  Description
Aligned fasta       .fa               X       X       Fasta-based format with "-" for gaps
CLC                 .clc              X       X       Rich format including all information
Clustal Alignment   .aln              X       X
GCG Alignment       .msf              X       X
Nexus               .nxs/.nexus       X       X
Phylip Alignment    .phy              X       X
Zip export          .zip                      X       Selected files in CLC format
Zip import          .zip/.gzip/.tar   X               Contained files/folder structure

D.1.3 Tree formats
File type           Suffix            Import  Export  Description
CLC                 .clc              X       X       Rich format including all information
Newick              .nwk              X       X
Nexus               .nxs/.nexus       X       X
Zip export          .zip                      X       Selected files in CLC format
Zip import          .zip/.gzip/.tar   X               Contained files/folder structure

D.1.4 Miscellaneous formats
File type           Suffix
BLAST Database      .phr/.nhr
CLC                 .clc
CSV                 .csv
Excel               .xls/.xlsx
GFF                 .gff
mmCIF               .cif
PDB                 .pdb
Tab delimited       .txt
Text                .txt
Zip export          .zip
Zip import          .zip/.gzip/.tar
58. to adjust how to handle the results (see section 8.1). If not, click Finish.
This will open a new view in the View Area displaying the shuffled sequence. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog.
11.2 Sequence statistics
CLC Sequence Viewer can produce an output with many relevant statistics for protein sequences. Some of the statistics are also relevant for DNA sequences. Therefore, this section deals with both types of statistics. The required steps for producing the statistics are the same.
To create a statistic for the sequence, do the following:
select sequence(s) | Toolbox in the Menu Bar | General Sequence Analyses | Create Sequence Statistics
This opens a dialog where you can alter your choice of sequences which you want to create statistics for. You can also add sequence lists.
Note! You cannot create statistics for DNA and protein sequences at the same time.
When the sequences are selected, click Next.
This opens the dialog displayed in figure 11.3.
[Screenshot: step 2 of the Create Sequence Statistics wizard, "Set parameters", with the Layout options Individual statistics layout and Comparative statistics layout, and the Background distribution option "For proteins: Include background distribution of amino acids".]
Figure
59. tree which assigns the highest probability to the data.
Bayesian inference
The objective of Bayesian phylogenetic inference is not to infer a single "correct" phylogeny, but rather to obtain the full posterior probability distribution of all possible phylogenies. This is obtained by combining the likelihood and the prior probability distribution of evolutionary parameters. The vast number of possible trees means that Bayesian phylogenetics must be performed by approximate, Monte Carlo-based methods [Larget and Simon, 1999, Yang and Rannala, 1997].
15.2.4 Interpreting phylogenies
Bootstrap values
A popular way of evaluating the reliability of an inferred phylogenetic tree is bootstrap analysis.
The first step in a bootstrap analysis is to re-sample the alignment columns with replacement, i.e. in the re-sampled alignment, a given column in the original alignment may occur two or more times, while some columns may not be represented in the new alignment at all. The re-sampled alignment represents an estimate of how a different set of sequences from the same genes and the same species may have evolved on the same tree.
If a new tree reconstruction on the re-sampled alignment results in a tree similar to the original one, this increases the confidence in the original tree. If, on the other hand, the new tree looks very different, it means that the inferred tree is unreliable. By re-sampling a number of times
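The resampling step described under Bootstrap values can be sketched in a few lines. This is a minimal illustration, not the program's implementation: tree reconstruction is left out, and the support calculation assumes the common convention that a branch's bootstrap value is the percentage of replicate trees containing the same clade. The alignment and the replicate clade sets are hypothetical.

```python
import random
from typing import FrozenSet, Iterable, List, Sequence, Set


def resample_columns(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """Build one bootstrap replicate by drawing columns with replacement.

    The same column indices are used for every row, so each drawn column is
    copied intact; some columns may appear several times, others not at all.
    """
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[i] for i in picks) for row in alignment]


def bootstrap_support(clade: FrozenSet[str],
                      replicate_clades: Iterable[Set[FrozenSet[str]]]) -> float:
    """Percentage of replicate trees whose clade set contains `clade`."""
    replicates = list(replicate_clades)
    hits = sum(1 for clades in replicates if clade in clades)
    return 100.0 * hits / len(replicates)


alignment = ["ACGTAC", "ACGTTC", "AGGTAC"]
replicate = resample_columns(alignment, random.Random(1))
print(replicate)  # same number of rows and columns as the input

# Hypothetical clade sets from four replicate trees (tree building not shown).
hc = frozenset({"Human", "Chimpanzee"})
replicates = [
    {hc},
    {frozenset({"Human", "Gorilla"})},
    {hc, frozenset({"Human", "Chimpanzee", "Gorilla"})},
    {hc},
]
print(bootstrap_support(hc, replicates))  # 75.0
```

Drawing column indices once and applying them to every row is what keeps each resampled column a real alignment column rather than a mix of residues from different positions.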
60. what you have done. In this way, you are able to document and reproduce previous operations.
This can be useful in several situations: It can be used for documentation purposes, where you can specify exactly how your data has been created and modified. It can also be useful if you return to a project after some time and want to refresh your memory on how the data was created. Also, if you have performed an analysis and you want to reproduce the analysis on another element, you can check the history of the analysis, which will give you all the parameters you set.
This chapter will describe how to use the History functionality of CLC Sequence Viewer.
7.1 Element history
You can view the history of all elements in the Navigation Area except files that are opened in other programs (e.g. Word and pdf files). The history starts when the element appears for the first time in CLC Sequence Viewer. To view the history of an element:
Select the element in the Navigation Area | Show in the Toolbar | History
or If the element is already open | History at the bottom left part of the view
This opens a view that looks like the one in figure 7.1.
When an element's history is opened, the newest change is shown at the top of the view. The following information is available:
• Title. The action that the user performed.
• Date and time. Date and time for the operation. The date and time are displayed according
61. which makes it possible for anybody with a knowledge of programming in Java to develop plug-ins. The plug-ins are fully integrated with the CLC Workbenches and the Viewer and provide an easy way to customize and extend their functionalities.
In April 2012, CLC Protein Workbench, CLC DNA Workbench and CLC RNA Workbench were discontinued, and all customers with a valid license were offered an upgrade to CLC Main Workbench.
All our software will be improved continuously. If you are interested in receiving news about updates, you should register your e-mail and contact data on http://www.clcbio.com, if you haven't already registered when you downloaded the program.
1.4.1 New program feature request
The CLC team is continuously improving the CLC Sequence Viewer with our users' interests in mind. Therefore, we welcome all requests and feedback from users, and hope you will suggest new features or more general improvements to the program on support@clcbio.com.
1.4.2 Report program errors
CLC bio is doing everything possible to eliminate program errors. Nevertheless, some errors might have escaped our attention. If you discover an error in the program, you can use the Report a Program Error function in the Help menu of the program to report it. In the Report a Program Error dialog, you are asked to write your e-mail address (optional). This is because we would like to be able to contact you for further information about the error or for helping you with the
62. you check this option, double-clicking a file with a "clc" extension will open the CLC Sequence Viewer.
• Wait for the installation process to complete, choose whether you would like to launch CLC Sequence Viewer right away, and click Finish.
When the installation is complete, the program can be launched from your Applications folder, or from the desktop shortcut you chose to create. If you like, you can drag the application icon to the dock for easy access.
1.2.4 Installation on Linux with an installer
Navigate to the directory containing the installer and execute it. This can be done by running a command similar to:
sh CLCSequenceViewer_6_JRE.sh
If you are installing from a CD, the installers are located in the "linux" directory.
Installing the program is done in the following steps:
• On the welcome screen, click Next.
• Read and accept the License agreement and click Next.
• Choose where you would like to install the application and click Next. For a system-wide installation, you can choose for example /opt or /usr/local. If you do not have root privileges, you can choose to install in your home directory.
• Choose where you would like to create symbolic links to the program. DO NOT create symbolic links in the same location as the application. Symbolic links should be installed in a location which is included in your environment PATH. For a system-wide installation, you can choose for example /usr/local/bin. If you do not have root pr
63. 11.3: Setting parameters for the sequence statistics.
The dialog offers to adjust the following parameters:
• Individual statistics layout. If more than one sequence was selected in Step 1, this function generates separate statistics for each sequence.
• Comparative statistics layout. If more than one sequence was selected in Step 1, this function generates statistics with comparisons between the sequences.
You can also choose to include Background distribution of amino acids. If this box is ticked, an extra column with the amino acid distribution of the chosen species is included in the table output. (The distributions are calculated from UniProt, www.uniprot.org, version 6.0, dated September 13, 2005.)
Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish.
An example of protein sequence statistics is shown in figure 11.4.
[Example output: "1 Protein statistics" with "1.1 Sequence information" (organism Mus musculus; name, modification date, and a weight of 16.412 kDa for a haemoglobin beta chain) and "1.2 Half-life" (columns N-terminal aa, Half-life mammals, Half-life yeast, Half-life E. coli).]
Figure 11.4: Comparative sequence statistics.
Nucleotide sequence statistics are generated using the same dialog as used for protein sequence statistics. However, the output of nucleotide sequence statistics is less extensive than that of the protein sequence statistics.
Note! The h
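The background distribution option adds a column that can be read against the sequence's own amino acid distribution. The sketch below is a hypothetical illustration of that comparison, not the program's calculation: it assumes "distribution" means the fraction of each residue in the sequence, the peptide is a toy example, and the background values are placeholders rather than the UniProt-derived figures the program uses.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def residue_distribution(sequence: str) -> Dict[str, float]:
    """Fraction of each residue in a protein sequence; gaps and '*' are ignored."""
    residues = [r for r in sequence.upper() if r.isalpha()]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in sorted(counts.items())}


def with_background(sequence: str,
                    background: Dict[str, float]) -> Dict[str, Tuple[float, Optional[float]]]:
    """Pair each observed fraction with a background fraction (None if absent)."""
    observed = residue_distribution(sequence)
    keys = sorted(set(observed) | set(background))
    return {aa: (observed.get(aa, 0.0), background.get(aa)) for aa in keys}


# Hypothetical inputs: a toy peptide and placeholder background values,
# not UniProt-derived frequencies.
print(residue_distribution("MKKAAG"))
print(with_background("MKKAAG", {"A": 0.08, "W": 0.01}))
```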
64. Annotations
  introduction to, 98
  overview of, 101
  show/hide, 99
  table of, 101
  types of, 99
  view on sequence, 99
  viewing, 99
Antigenicity, 163
Append wildcard, search, 111
Arrange
  layout of sequence, 23
  views in View Area, 47
Assembly, 161
Atomic composition, 120
Audit, 58
Backup, 6
Batch edit element properties, 42
Batch processing, log of, 88
Bibliography, 179
Bioinformatic data
  export, 5
  formats, 70, 171
BLAST, 162
  database file format, 173
Bootstrap values, 158
Browser, import sequence from, 2
Bug reporting, 13
CDS, translate to protein, 95
Cheap end gaps, 145
ChIP-Seq analysis, 161
cif file format, 173
Circular view of sequence, 96, 162
Clc file format, 76, 173
CLC Standard Settings, 64
CLC Workbenches, 12
CLC file format, 172, 173
  associating with CLC Sequence Viewer, 10
Clone Manager file format, 172
Cloning, 162, 165
Close view, 45
Clustal file format, 173
Coding sequence, translate to protein, 95
col file format, 173
Color residues, 148
Comments, 103
Common name, batch edit, 42
Compare workbenches, 161
Configure network, 18
Consensus sequence, 146, 163
  open, 147
Conservation, 147
  graphs, 163
Contig, 161
Copy, 83
  elements in Navigation Area, 39
  into sequence, 96
  search results, GenBank, 113
  sequence, 104, 105
  sequence selection, 126
  text selection, 104
Cpf file format, 61
chp file format, 173
Create
  alignment, 143
  enzyme li
65. 6.3.2 Save location and file formats
In this step, you can choose name and save location for the graphics file (see figure 6.11).
[Screenshot: step 2 of the Export Graphics dialog, "Save in file", with a file browser; Files of type is set to Portable Document Format (.pdf) and the file name is ATP8a1.pdf.]
Figure 6.11: Location and name for the graphics file.
CLC Sequence Viewer supports the following file formats for graphics export:
Format                      Suffix  Type
Portable Network Graphics   .png    bitmap
JPEG                        .jpg    bitmap
Tagged Image File           .tif    bitmap
PostScript                  .ps     vector graphics
Encapsulated PostScript     .eps    vector graphics
Portable Document Format    .pdf    vector graphics
Scalable Vector Graphics    .svg    vector graphics
These formats can be divided into bitmap and vector graphics. The difference between these two categories is described below.
Bitmap images
In a bitmap image, each dot in the image has a specified color. This implies that if you zoom in on the image, there will not be enough dots, and if you zoom out, there will be too many. In these cases, the image viewer has to interpolate the colors to fit what is actually looked at. A bitmap image needs to have a high resolution if you want to zoom in. This format is a good choice for storing images without large shapes (e.g. dot plots). It is a
66. Area.
Note! A sequence is not saved until the View displaying the sequence is closed. When that happens, a dialog opens: Save changes of sequence x? (Yes or No).
The sequence can also be saved by dragging it into the Navigation Area. It is possible to select several sequences and drag all of them into the Navigation Area at the same time.
Download GenBank search results using right-click menu
You may also select one or more sequences from the list and download them using the right-click menu (see figure 10.2). Choosing Download and Save lets you select a folder where the sequences are saved when they are downloaded. Choosing Download and Open opens a new view for each of the selected sequences.
[Screenshot: the right-click menu of a search result, with the options Download and Open, Download and Save, and Open at NCBI.]
Figure 10.2: By right-clicking a search result, it is possible to choose how to handle the relevant sequence.
Copy/paste from GenBank search results
When using copy/paste to bring the search results into the Navigation Area, the actual files are downloaded from GenBank. To copy/paste files into the Navigation Area:
select one or more of the search results | Ctrl + C (⌘ + C on Mac) | select a folder in the Navigation Area | Ctrl + V
Note! Search results are downloaded before they are saved. Downloading and saving several files may take some time. However, since the process run
67. [Screenshot: additional information for the enzyme BalI, including its list of suppliers: GE Healthcare, Qbiogene, American Allied Biochemical, Inc., Nippon Gene Co., Ltd., Takara Bio Inc., New England Biolabs, Toyobo Biochemicals, Molecular Biology Resources, Promega Corporation, EURx Ltd.]
Figure 13.18: Showing additional information about an enzyme, like recognition sequence or a list of commercial vendors.
[Screenshot: an enzyme list shown as a table of restriction enzymes, with the columns Name, Recognition sequence, Overhang, Suppliers, Methylation sensitivity and Star activity; rows include EcoRV, BglII, SalI, XhoI, HindIII, XbaI, EcoRI, PstI, BamHI, ClaI, NotI, NdeI and SacI. The Side Panel offers Column width and column visibility options, and a "Recognizes palindrome" option.]
68. Below these options you can see the number of sequences that will be extracted.
Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish.
Chapter 10 Data download
Contents
10.1 GenBank search 110
10.1.1 GenBank search options 110
10.1.2 Handling of GenBank search results 112
10.1.3 Save GenBank search parameters 113
CLC Sequence Viewer allows you to search for sequences on the Internet. You must be online when initiating and performing searches in NCBI.
10.1 GenBank search
This section describes searches for sequences in GenBank, the NCBI Entrez database. The NCBI search view is opened in this way (figure 10.1):
Download | Search for Sequences at NCBI
or Ctrl + B (⌘ + B on Mac)
This opens the following view:
10.1.1 GenBank search options
Conducting a search in the NCBI Database from CLC Sequence Viewer corresponds to conducting the search on NCBI's website. When conducting the search from CLC Sequence Viewer, the results are available and ready to work with straight away.
You can choose whether you want to search for nucleotide sequences or protein sequences.
As default, CLC Sequence Viewer offers one text field where the search parameters can be entered. Click Add search parameters to add more parameters to your search.
Note! The search is an "and" search
69. CLC Genomics Workbench
Data handling (Viewer, Main, Genomics)
• Add multiple locations to Navigation Area
• Share data on network drive
• Search all your data
Assembly of sequencing data (Viewer, Main, Genomics)
• Advanced contig assembly
• Importing and viewing trace data
• Trim sequences
• Assemble without use of reference sequence
• Map to reference sequence
• Assemble to existing contig
• Viewing and editing contigs
• Tabular view of an assembled contig (easy data overview)
• Secondary peak calling
• Multiplexing based on barcode or name
Next Generation Sequencing Data Analysis
• Import of 454, Illumina Genome Analyzer, SOLiD and Helicos data
• Reference assembly of human-size genomes
• De novo assembly
• SNP/DIP detection
• Graphical display of large contigs
• Support for mixed-data assembly
• Paired data support
• RNA-Seq analysis
• Expression profiling by tags
• ChIP-Seq analysis
Expression Analysis
• Import of Illumina BeadChip, Affymetrix and GEO data
• Import of Gene Ontology annotation files
• Import of Custom expression data table and Custom annotation files
• Multigroup comparisons
• Advanced plots: scatter plot, volcano plot, box plot and MA plot
• Hierarchical clustering
• Statistical analysis on count-based and Gaussian data
• Annotation tests
• Principal component analysis (PCA)
• Hierarchical clustering and heat maps
• Analysis of RNA-Seq/Tag profiling samples
Mol
70. 3.1.6 Change element names 41
3.1.7 Delete elements 42
3.1.8 Show folder elements in a table 42
3.2 View Area 44
3.2.1 Open view 44
3.2.2 Show element in another view 45
3.2.3 Close views 45
3.2.4 Save changes in a view 46
3.2.5 Undo/Redo 47
3.2.6 Arrange views in View Area 47
3.2.7 Side Panel 49
3.3 Zoom and selection in View Area 50
3.3.1 Zoom in 50
3.3.2 Zoom out 50
3.3.3 Fit Width 51
3.3.4 Zoom to 100% 51
3.3.5 Move 51
3.3.6 Selection 51
3.3.7 Changing compactness 51
3.4 Toolbox and Status Bar 51
3.4.1 Processes 52
3.4.2 Toolbox 52
3.4.3 Status Bar 53
3.5 Workspace 53
3.5.1 Create Workspace 53
3.5.2 Select Workspace 53
3.5.3 Delete Workspace 54
3.6 List of shortcuts 54
71. • Choose if you would like to associate .clc files with CLC Sequence Viewer. If you check this option, double-clicking a file with a "clc" extension will open the CLC Sequence Viewer.
• Wait for the installation process to complete, choose whether you would like to launch CLC Sequence Viewer right away, and click Finish.
When the installation is complete, the program can be launched from the Start Menu or from one of the shortcuts you chose to create.
1.2.3 Installation on Mac OS X
Starting the installation process is done in one of the following ways:
If you have downloaded an installer:
Locate the downloaded installer and double-click the icon.
The default location for downloaded files is your desktop.
If you are installing from a CD:
Insert the CD into your CD-ROM drive and open it by double-clicking on the CD icon on your desktop.
Launch the installer by double-clicking on the "CLC Sequence Viewer" icon.
Installing the program is done in the following steps:
• On the welcome screen, click Next.
• Read and accept the License agreement and click Next.
• Choose where you would like to install the application and click Next.
• Choose if CLC Sequence Viewer should be used to open CLC files and click Next.
• Choose whether you would like to create a desktop icon for launching CLC Sequence Viewer and click Next.
• Choose if you would like to associate .clc files with CLC Sequence Viewer. If
72. 1.6.1 Installing plug-ins 15
1.6.2 Uninstalling plug-ins 16
1.6.3 Updating plug-ins 17
1.6.4 Resources 17
1.7 Network configuration 18
1.8 The format of the user manual 19
1.8.1 Text formats 20
Welcome to CLC Sequence Viewer, a software package supporting your daily bioinformatics work.
We strongly encourage you to read this user manual in order to get the best possible basis for working with the software package.
This software is for research purposes only.
1.1 Contact information
The CLC Sequence Viewer is developed by:
CLC bio A/S
Science Park Aarhus
Finlandsgade 10-12
8200 Aarhus N
Denmark
http://www.clcbio.com
VAT no.: DK 28 30 50 87
Telephone: +45 70 22 55 09
Fax: +45 70 22 55 19
E-mail: info@clcbio.com
If you have questions or comments regarding the program, you are welcome to contact our Support function:
E-mail: support@clcbio.com
1.2 Download and installation
The CLC Sequence Viewer is developed for Windows, Mac OS X and Linux. The software for either platform can be downloaded from http://www.clcbio.com/download.
1.2.1 Program download
The program is available for download on http://www.clcbio.com/download.
Before you download the program yo
73. [Screenshot: the HUMDINUC sequence open in the View Area, with the Navigation Area (CLC Data, My Folder, Recycle bin), the Toolbox (General Sequence Analyses, Nucleotide Analyses, Restriction Sites, Database Search) and the Side Panel (Sequence layout: spacing and wrap options, Double stranded, Numbers on sequences, Follow selection).]
Figure 2.2: The HUMDINUC file is imported and opened.
2.2 Tutorial: View sequence
This brief tutorial will take you through some different ways to display a sequence in the program. The tutorial introduces zooming on a sequence, dragging tabs, and opening a selection in a new view.
We will be working with the sequence called pcDNA3_atp8a1, located in the "Cloning" folder in the Example data. Double-click the sequence in the Navigation Area to open it. The sequence is displayed with annotations above it (see figure 2.3).
As default, CLC Sequence Viewer displays a sequence with annotations (colored arrows on the sequence, like the green promoter region annotation i
74. File button at the bottom of the dialog. This will open a dialog where you can browse for the plug-in. The plug-in file should be a file of the type ".cpa".
When you close the dialog, you will be asked whether you wish to restart the CLC Sequence Viewer. The plug-in will not be ready for use before you have restarted.
1.6.2 Uninstalling plug-ins
Plug-ins are uninstalled using the plug-in manager:
Help in the Menu Bar | Plug-ins and Resources
or Plug-ins in the Toolbar
This will open the dialog shown in figure 1.3.
The installed plug-ins are shown in this dialog. To uninstall:
Click the plug-in | Uninstall
[Screenshot: the Manage Plug-ins tab of the Manage Plug-ins and Resources dialog, listing installed plug-ins such as Additional Alignments (version 1.02: perform alignments with ClustalW, Muscle, T-Coffee, MAFFT and Kalign from within the workbench), Annotate with GFF file (version 1.03: annotate a sequence from a list of annotations found in a GFF file) and Extract Annotations (version 1.02: extracts annotations from one or more sequences into a sequence list containing the sequences covered by the specified annotations).]
75. Figure 3.4: Viewing the elements in a folder (the table shows columns such as Name, Modified, Modified by, Description and Length).
In figure 3.5, you can see an example where the common name of five sequences is changed in one go. In this example, a dialog with a text field will be shown, letting you enter a new common name for these five sequences.
76. Figure 7.1: An element's history.
to your locale settings (see section 4.1).
• User: The user who performed the operation. If you import data created by another person in a CLC Workbench, that person's name will be shown.
• Parameters: Details about the action performed. This could be the parameters that were chosen for an analysis.
• Origins from: This information is usually shown at the bottom of an element's history. Here you can see which elements the current element originates from. If you have, e.g., created an alignment of three sequences, the three sequences are shown here. Clicking the element selects it in the Navigation Area, and clicking the 'history' link opens the element's own history.
• Comments: By clicking Edit, you can enter your own comments regarding this entry in the history. These comments are saved.

7.1.1 Sharing data with history
The history of an element is attached to that element, which means t
77. Modules; Molecular weight; Motif search; Mouse modes; Move: content of a view, elements in Navigation Area, sequences in alignment; msf file format; Multiple alignments; Multiselecting; Name; Navigation Area: illustration; NCBI: search tutorial; Negatively charged residues; Neighbor Joining algorithm; Neighbor joining; Nested PCR primers; Network configuration; Never show this dialog again; New: feature request, folder, folder tutorial, sequence; New sequence: create from a selection; Newick file format; Next Generation Sequencing; nexus file format; Nexus file format; NGS; nhr file format; NHR file format; Non-standard residues; Nucleotides: IUPAC codes; Numbers on sequence; nwk file format; nxs file format; oa4 file format; Open: consensus sequence, from clipboard; Open reading frame determination; Open-ended sequence; Order primers; ORF; Organism; Origins from; Overhang: find restriction enzymes based on; pa4 file format; Page heading; Page number; Page setup; Parameters: search; Partition function; Paste text to create a new sequence; Paste/copy; Pattern discovery; PCR primers; pdb file format; seg file format; PDB file for
78. Figure 9.11: Creating a sequence. (In the example, the dialog is filled in for P70704, common name house mouse, Latin name Mus musculus, description 'Probable phospholipid-transporting ATPase IA'.)
• Latin name: The Latin name for the species.
• Type: Select between DNA, RNA and protein.
• Circular: Specifies whether the sequence is circular. This will open the sequence in a circular view by default (applies only to nucleotide sequences).
• Description: A description of the sequence.
• Keywords: A set of keywords separated by semicolons.
• Comments: Your own comments on the sequence.
• Sequence: Depending on the type chosen, this field accepts nucleotides or amino acids. Spaces and numbers can be entered, but they are ignored when the sequence is created. This allows you to paste (Ctrl + V on Windows and ⌘ + V on Mac) a sequence directly from a different source, even if the residue numbers are included. Characters that are not part of the IUPAC codes cannot be entered. At the top right corner of the field, the number of residues is counted. The counter does not count spaces or numbers.
Clicking Finish o
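The Sequence field's behavior (spaces and numbers ignored, non-IUPAC characters rejected, residues counted) can be sketched in Python. This is a minimal illustration, not the program's code: the function name `clean_sequence`, its `seq_type` values and the exact alphabets are assumptions, and one nucleotide alphabet is shared by DNA and RNA for simplicity.

```python
import re

# Assumed alphabets: IUPAC nucleotide codes (U included for RNA) and the
# 20 standard amino acids plus B, Z and X. The exact sets accepted by the
# Create Sequence dialog are not listed in the text.
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")
PROTEIN_CODES = set("ACDEFGHIKLMNPQRSTVWYBZX")


def clean_sequence(text: str, seq_type: str) -> str:
    """Drop spaces and numbers from pasted text and validate the residues.

    seq_type is "dna", "rna" or "protein" (hypothetical parameter values).
    Returns the residues in upper case; raises ValueError on other characters.
    """
    residues = re.sub(r"[\s\d]", "", text).upper()
    alphabet = PROTEIN_CODES if seq_type == "protein" else NUCLEOTIDE_CODES
    invalid = sorted(set(residues) - alphabet)
    if invalid:
        raise ValueError(f"Characters not allowed: {''.join(invalid)}")
    return residues


pasted = "1 mptmrrtvse irsraegyek\n61 vitflprfly"
protein = clean_sequence(pasted, "protein")
print(protein, len(protein))  # prints: MPTMRRTVSEIRSRAEGYEKVITFLPRFLY 30
```

As in the dialog, the residue numbers copied along with the sequence do not affect the count.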
79. Figure 5.1: The Print dialog.

5.1 Selecting which part of the view to print
In the print dialog you can choose to:
• Print visible area, or
• Print whole view
These options are available for all views that can be zoomed in and out. Figure 5.2 shows a view of a circular sequence which is zoomed in so that you can only see a part of it.
Figure 5.2: A circular sequence as it looks on the screen.
When selecting Print visible area, your print will reflect the part of the sequence that is visible in the view. The result from printing the view from figure 5.2 and choosing Print visible area can be seen in figure 5.3.
Figure 5.3: A print of the sequence, selecting Print visible area.
On the other hand, if you select Print whole view, you will get a result that looks like figure 5.4. This means that you also print the part of the sequence which is not visible when you have zoomed in.
Figure 5.4: A print of the sequence, selecting Print whole view. The whole sequence is shown, even though the view is zoomed in on a part of the sequence.

5.2 Page setup
80. Figure 2.5: The protein alignment as it looks when you open it, with background color according to the Rasmol color scheme and automatically wrapped.
Figure 2.6: The Annotation Layout and the Annotation Types in the Side Panel.
This means that you would have to perform the changes again next time you open the alignment. To save the changes to the Side Panel, click the Save/Restore Settings button at the top of the Side Panel and click Save Settings (see figure 2.8). This will open the dialog s
81. • New sequence: Opens a dialog which allows you to enter your own sequence.
• Read tutorials: Opens the tutorials menu with a number of tutorials. These are also available from the Help menu in the Menu Bar.
Below these three quick start shortcuts, you will see the text "Looking for more features?". Clicking this text will take you to a page on http://www.clcbio.com where you can read more about how to add functionality to CLC Sequence Viewer.

1.5.2 Import of example data
It might be easier to understand the logic of the program by trying simple operations on existing data. Therefore, CLC Sequence Viewer includes an example data set.
When downloading CLC Sequence Viewer, you are asked if you would like to import the example data set. If you accept, the data is downloaded automatically and saved in the program. If you didn't download the data, or for some other reason need to download the data again, you have two options:
You can click Install Example Data in the Help menu of the program. This installs the data automatically. You can also go to http://www.clcbio.com/download and download the example data from there.
If you download the file from the website, you need to import it into the program. See chapter 6 for more about importing data.

1.6 Plug-ins
When you install CLC Sequence Viewer, it has a standard set of features. However, you can upgrade and customize the program using a variety
82. Only the one at front is visible.
  - Little offset: The annotations are piled on top of each other, but they have been offset a little.
  - More offset: Same as above, but with more spreading.
  - Most offset: The annotations are placed above each other with a little space between. This can take up a lot of space on the screen.
• Label: The name of the annotation can be shown as a label. Additional information about the sequence is shown if you place the mouse cursor on the annotation and keep it still.
  - No labels: No labels are displayed.
  - On annotation: The labels are displayed in the annotation's box.
  - Over annotation: The labels are displayed above the annotations.
  - Before annotation: The labels are placed just to the left of the annotation.
  - Flag: The labels are displayed as flags at the beginning of the annotation.
  - Stacked: The labels are offset so that the text of all labels is visible. This means that there is varying distance between each sequence line to make room for the labels.
• Show arrows: Displays the end of the annotation as an arrow. This can be useful to see the orientation of the annotation (for DNA sequences). Annotations on the negative strand will have an arrow pointing to the left.
• Use gradients: Fills the boxes with gradient color.
In the Annotation Types group, you can choose which kinds of annotations should be displayed. This group lists all the types of
83. Figure 8.4: An example of a batch log when finding open reading frames.
The log will either be saved with the results of the analysis or opened in a view with the results, depending on how you chose to handle the results.

Part II: Bioinformatics

Chapter 9 Viewing and editing sequences
Contents:
9.1 View sequence
9.1.1 Sequence settings in Side Panel
9.1.2 Restriction sites in the Side Panel
9.1.3 Selecting parts of the sequence
9.1.4 Editing the sequence
9.1.5 Sequence region types
9.2 Circular DNA
9.2.1 Using split views to see details of the circular molecule
9.2.2 Mark molecule as circular and specify starting point
9.3 Working with annotations
9.3.1 Viewing annotations
9.3.2 Removing annotations
9.4 Element information
9.5 View as text
9.6 Creating a new sequence
9.7 Sequence Lists
9.7.1 Graphical view of sequence lists
9.7.2 Sequence list table
9.7.3 Extract sequences
CLC Sequence Viewer offers five different ways of viewing and editing single sequences, as described in the first five sections of this chapter. Furthermore, this
84. Figure 11.5: Selecting two sequences to be joined.
If you have selected some sequences before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences from the selected elements. Clicking Next opens the dialog shown in figure 11.6.
Figure 11.6: Setting the order in which sequences are joined.
In step 2, you can change the order in which the sequences will be joined. Select a sequence and use the arrows to move the selected sequence up or down.
Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish.
The result is shown in figure 11.7.
Figure 11.7: The result of joining sequences is a new sequence containing the annotations of the joined sequences (they each had a HBB annotation).

Chapter 12 Nucleotide analyses
Contents:
12.1 Convert DNA to RNA
12.2 Convert RNA to DNA
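Joining sequences as described above amounts to concatenating the residues in the chosen order (top first) and shifting each annotation by the combined length of the sequences placed before it. The sketch below assumes simple 0-based, end-exclusive annotation coordinates; the `Sequence` and `Annotation` classes and `join_sequences` are hypothetical stand-ins, not CLC data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass
class Sequence:
    name: str
    residues: str
    annotations: List[Annotation] = field(default_factory=list)


def join_sequences(sequences: List[Sequence], name: str = "Joined Sequence") -> Sequence:
    """Concatenate sequences in list order, moving each annotation along
    by the length of everything joined before it."""
    parts: List[str] = []
    shifted: List[Annotation] = []
    offset = 0
    for seq in sequences:
        parts.append(seq.residues)
        for ann in seq.annotations:
            shifted.append(Annotation(ann.name, ann.start + offset, ann.end + offset))
        offset += len(seq.residues)
    return Sequence(name, "".join(parts), shifted)


first = Sequence("s1", "ATGGTG", [Annotation("HBB", 0, 6)])
second = Sequence("s2", "ATGCAC", [Annotation("HBB", 0, 6)])
joined = join_sequences([first, second])
print(joined.residues)  # ATGGTGATGCAC
print([(a.name, a.start, a.end) for a in joined.annotations])  # [('HBB', 0, 6), ('HBB', 6, 12)]
```

Changing the order in step 2 of the dialog corresponds to reordering the list passed to the function.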
85. Import using the import dialog
To start the import using the import dialog, click Import in the Toolbar. This will show a dialog similar to figure 6.1. You can change which kinds of file types are shown by selecting a file format in the Files of type box.
Figure 6.1: The import dialog (options: Automatic import, Force import as type, Force import as external file(s)).
Next, select one or more files or folders to import and click Next. This allows you to select a place for saving the result files.
If you import one or more folders, the contents of the folder are automatically imported and placed in that folder in the Navigation Area. If the folder contains subfolders, the whole folder structure is imported.
In the import dialog (figure 6.1), there are three import options:
Automatic import: This will import the file, and CLC Sequence Viewer will try to determine the format of the file. The format is determined based on the file extension (e.g., SwissProt files have .swp at the end of the file name) in combination with a detection of elements in the file that are specific to the individual file formats. If the file type is not recognized, it will be imported as an external file. In most cases, automatic import will yield a successful result, but
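Automatic import combines the file extension with a check for format-specific elements in the file. A minimal sketch of that idea follows; the extension table, the `detect_format` function and the content checks (a leading `>` for FASTA, `LOCUS` for GenBank, an `ID` line for EMBL and Swiss-Prot) are illustrative assumptions, not CLC Sequence Viewer's actual rules.

```python
import os
from typing import Set

# Hypothetical extension table; the program supports many more formats.
EXTENSION_FORMATS = {
    ".swp": "Swiss-Prot",
    ".gb": "GenBank",
    ".gbk": "GenBank",
    ".embl": "EMBL",
    ".fa": "FASTA",
    ".fasta": "FASTA",
}


def sniff_formats(text: str) -> Set[str]:
    """Guess candidate formats from format-specific elements at the start of the file."""
    head = text.lstrip()
    if head.startswith(">"):
        return {"FASTA"}
    if head.startswith("LOCUS"):
        return {"GenBank"}
    if head.startswith("ID   "):
        # EMBL and Swiss-Prot entries both begin with an ID line.
        return {"EMBL", "Swiss-Prot"}
    return set()


def detect_format(filename: str, text: str) -> str:
    ext_guess = EXTENSION_FORMATS.get(os.path.splitext(filename)[1].lower())
    candidates = sniff_formats(text)
    if ext_guess in candidates:
        return ext_guess                # extension and content agree
    if len(candidates) == 1:
        return next(iter(candidates))   # content alone is unambiguous
    return "external file"              # not recognized


print(detect_format("P12345.swp", "ID   ABC_HUMAN  Reviewed;"))  # Swiss-Prot
print(detect_format("notes.txt", ">seq1\nACGT"))                # FASTA
print(detect_format("report.pdf", "%PDF-1.4"))                  # external file
```

The extension alone is used only to break ties between formats whose content looks alike; anything still ambiguous falls back to an external file, mirroring the behavior described above.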
86. Note: This information is directly saved and you cannot undo.
Figure 3.5: Changing the common name of five sequences.

3.2 View Area
The View Area is the right-hand part of the screen, displaying your current work. The View Area may consist of one or more Views, represented by tabs at the top of the View Area. This is illustrated in figure 3.6.
Figure 3.6: A View Area can enclose several views, each view indicated with a tab (see the right view, which shows protein P68225). Furthermore, several views can be shown at the same time (in this example, four views are displayed).
The tab concept is cen
87. The extinction coefficient values of the three important amino acids at different wavelengths are found in Gill and von Hippel (1989). Knowing the extinction coefficient, the absorbance (optical density) can be calculated using the following formula:
$$\text{Absorbance(Protein)} = \frac{\text{Ext(Protein)}}{\text{Molecular weight}}$$
Two values are reported. The first value is computed assuming that all cysteine residues appear as half-cystines, meaning they form disulfide bridges to other cysteines. The second number assumes that no disulfide bonds are formed.
Atomic composition
Amino acids are indeed very simple compounds. All 20 amino acids consist of combinations of only five different atoms. The atoms which can be found in these simple structures are carbon, nitrogen, hydrogen, sulfur and oxygen. The atomic composition of a protein can, for example, be used to calculate the precise molecular weight of the entire protein.
Total number of negatively charged residues (Asp + Glu)
At neutral pH, the fraction of negatively charged residues provides information about the location of the protein. Intracellular proteins tend to have a higher fraction of negatively charged residues than extracellular proteins.
Total number of positively charged residues (Arg + Lys)
At neutral pH, nuclear proteins have a high relative percentage of positively charged amino acids. Nuclear proteins often bind to the negatively charged DNA, which may regu
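The two extinction coefficients, the absorbance formula and the charged-residue totals can be computed as below. This sketch assumes the additive model (each Trp, Tyr and cystine contributes a fixed molar coefficient) with the 280 nm values commonly quoted from Gill and von Hippel; the function names are hypothetical, and the molecular weight is passed in rather than computed.

```python
from typing import Tuple

# Molar extinction coefficients at 280 nm (M^-1 cm^-1). These are the values
# commonly quoted from Gill and von Hippel (1989); treat them as an assumption.
EXT_TRP = 5690
EXT_TYR = 1280
EXT_CYSTINE = 120


def extinction_coefficients(protein: str) -> Tuple[int, int]:
    """Return (all Cys paired as cystines, no disulfide bonds formed)."""
    seq = protein.upper()
    base = seq.count("W") * EXT_TRP + seq.count("Y") * EXT_TYR
    cystines = seq.count("C") // 2  # two half-cystines form one disulfide bridge
    return base + cystines * EXT_CYSTINE, base


def absorbance(ext: float, molecular_weight: float) -> float:
    """Absorbance = Ext / molecular weight, i.e. for a 1 mg/ml solution in a 1 cm cell."""
    return ext / molecular_weight


def charged_residue_counts(protein: str) -> Tuple[int, int]:
    """Return (negatively charged Asp + Glu, positively charged Arg + Lys)."""
    seq = protein.upper()
    return seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K")


print(extinction_coefficients("WYCC"))   # (7090, 6970)
print(charged_residue_counts("DEKRRA"))  # (2, 3)
```

The gap between the two reported values is just the cystine term, so it only matters for proteins with cysteines.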
88. The molecular weight is the mass of a protein or molecule. It is calculated as the sum of the atomic masses of all the atoms in the molecule. The weight of a protein is usually expressed in Daltons (Da).
A calculation of the molecular weight of a protein does not usually include additional post-translational modifications. For native and unknown proteins, it tends to be difficult to assess whether post-translational modifications such as glycosylations are present on the protein, making a calculation based solely on the amino acid sequence inaccurate. The molecular weight can be determined very accurately by mass spectrometry in a laboratory.
Isoelectric point
The isoelectric point (pI) of a protein is the pH at which the protein has no net charge. The pI is calculated from the pKa values for 20 different amino acids. At a pH below the pI, the protein carries a positive charge, whereas if the pH is above the pI, the protein carries a negative charge. In other words, the pI is high for basic proteins and low for acidic proteins. This information can be used in the laboratory when running electrophoretic gels, where proteins can be separated based on their isoelectric point.
Aliphatic index
The aliphatic index of a protein is a measure of the relative volume occupied by the aliphatic side chains of the following amino acids: alanine, valine, leucine and isoleucine. An increase in the aliphatic
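The molecular weight and isoelectric point described above can be sketched as follows. Molecular weight is summed from standard average residue masses plus one water for the unmodified chain, so post-translational modifications are ignored, as the text notes. The pI is found by bisection on the Henderson-Hasselbalch net charge. The pKa table is an illustrative assumption (the values CLC Sequence Viewer uses are not given here), and different tables give somewhat different pI values.

```python
# Average residue masses in Da (amino acid minus one water), from standard
# reference tables; assumed here, not taken from CLC Sequence Viewer.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528

# Illustrative pKa values; published tables differ, which shifts the pI.
PKA_BASIC = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(protein: str) -> float:
    """Unmodified chain: sum of residue masses plus one water for the termini."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def net_charge(protein: str, ph: float) -> float:
    seq = protein.upper()
    # Basic groups contribute the fraction that is protonated (+1 each) ...
    charge = 1 / (1 + 10 ** (ph - PKA_BASIC["N_TERM"]))
    for aa in "KRH":
        charge += seq.count(aa) / (1 + 10 ** (ph - PKA_BASIC[aa]))
    # ... acidic groups the fraction that is deprotonated (-1 each).
    charge -= 1 / (1 + 10 ** (PKA_ACIDIC["C_TERM"] - ph))
    for aa in "DECY":
        charge -= seq.count(aa) / (1 + 10 ** (PKA_ACIDIC[aa] - ph))
    return charge


def isoelectric_point(protein: str, tol: float = 1e-4) -> float:
    """Bisection: the net charge falls as pH rises, so search for its zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(protein, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(molecular_weight("G"), 3))  # 75.067
```

A lysine-rich peptide comes out with a high pI and an aspartate-rich one with a low pI, matching the basic/acidic rule in the text.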
89. 14.3.3 Delete residues and gaps
14.3.4 Move sequences up and down
14.3.5 Delete and rename sequences
14.4 Bioinformatics explained: Multiple alignments
14.4.1 Use of multiple alignments
14.4.2 Constructing multiple alignments
CLC Sequence Viewer can align nucleotides and proteins using a progressive alignment algorithm (see section 14.4, or read the White paper on alignments in the Science section of http://www.clcbio.com).
This chapter describes how to use the program to align sequences. The chapter also describes alignment algorithms in more general terms.

14.1 Create an alignment
To create an alignment in CLC Sequence Viewer:
select sequences to align | Toolbox in the Menu Bar | Alignments and Trees | Create Alignment
or select sequences to align | right-click any selected sequence | Toolbox | Alignments and Trees | Create Alignment
This opens the dialog shown in figure 14.1.
[Screenshot: the Create Alignment dialog, step 1 (Select sequences of same type), with the chosen sequences listed under Selected Elements.]
90. agues can reproduce your findings with adjusted parameters if desired. To export with dependent files:
select the element in Navigation Area | File in Menu Bar | Export with Dependent Elements | enter name of the new file | choose where to export to | Save
The result is a folder containing the exported file with dependent elements, stored automatically in a folder at the desired location on your disk.
Export history
To export an element's history:
select the element in Navigation Area | Export | select History PDF (.pdf) | choose where to export to | Save
The entire history of the element is then exported in PDF format.
The CLC format
CLC Sequence Viewer keeps all bioinformatic data in the CLC format. Compared to other formats, the CLC format contains more information about the object, such as its history and comments. The CLC format is also able to hold several elements of different types (e.g., an alignment, a graph and a phylogenetic tree). This means that if you are exporting your data to another CLC Workbench, you can use the CLC format to export several elements in one file, and you will preserve all the information.
Note: CLC files can be exported from and imported into all the different CLC Workbenches.
Backup
If you wish to secure your data from computer breakdowns, it is advisable to perform regular backups of your data. Backing up data in CLC Sequence Viewer is done in two ways:
• Making a backup of each of the folde
91. alyses also generate a table with results, and for these analyses the last step looks like figure 8.3.
Figure 8.3: Analyses which also generate tables (output options: Add annotation to sequence, Create table; result handling: Open, Save; log handling: Make log).
In addition to the Open and Save options, you can also choose whether the result of the analysis should be added as annotations on the sequence or shown in a table. If both options are selected, you will be able to click the results in the table, and the corresponding region on the sequence will be selected.
If you choose to add annotations to the sequence, they can be removed afterwards by clicking Undo in the Toolbar.

8.1.2 Batch log
For some analyses, there is an extra option in the final step to create a log of the batch process (see, e.g., figure 8.3). This log is created at the beginning of the process and continually updated with information about the results. See an example of a log in figure 8.4. In this example, the log displays information about how many open reading frames were found.
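The batch log described above can be modeled as a list that exists before the first element is processed and gains one entry (name, description, time) per element. In this sketch, `run_batch`, `LogEntry` and the start-codon counter standing in for a real analysis such as finding open reading frames are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Tuple


@dataclass
class LogEntry:
    name: str
    description: str
    time: datetime


def run_batch(elements: List[Tuple[str, str]],
              analysis: Callable[[str], str]) -> List[LogEntry]:
    log: List[LogEntry] = []  # created before the first element is processed
    for name, data in elements:
        description = analysis(data)
        log.append(LogEntry(name, description, datetime.now()))  # updated per element
    return log


def count_start_codons(seq: str) -> str:
    # Hypothetical stand-in for a real analysis such as ORF finding.
    return f"Found {seq.count('ATG')} start codons"


for entry in run_batch([("seq1", "ATGAAATGA"), ("seq2", "CCCGGG")], count_start_codons):
    print(entry.name, entry.description)
```

Because each entry is appended as soon as its element finishes, a partially completed batch still leaves a usable log.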
92. and footer
5.3 Print preview
CLC Sequence Viewer offers different choices of printing the result of your work. This chapter deals with printing directly from CLC Sequence Viewer. Another option for using the graphical output of your work is to export graphics in a graphic format (see section 6.3) and then import it into a document or a presentation.
All the kinds of data that you can view in the View Area can be printed. CLC Sequence Viewer uses a WYSIWYG principle: What You See Is What You Get. This means that you should use the options in the Side Panel to change how your data (e.g., a sequence) looks on the screen. When you print it, it will look exactly the same on print as on the screen.
For some of the views, the layout will be slightly changed in order to be printer-friendly.
It is not possible to print elements directly from the Navigation Area. They must first be opened in a view in order to be printed. To print the contents of a view:
select relevant view | Print in the Toolbar
This will show a print dialog (see figure 5.1). In this dialog you can:
• Select which part of the view you want to print
• Adjust Page Setup
• See a print Preview window
These three options are described in the three following sections.
93. Figure 6.7: Selecting to export the whole view or to export only the visible area.

6.3.1 Which part of the view to export
In this dialog you can choose to:
• Export visible area, or
• Export whole view
These options are available for all views that can be zoomed in and out. Figure 6.8 shows a view of a circular sequence which is zoomed in so that you can only see a part of it.
Figure 6.8: A circular sequence as it looks on the screen.
When selecting Export visible area, the exported file will only contain the part of the sequence that is visible in the view. The result from exporting the view from figure 6.8 and choosing Export visible area can be seen in figure 6.9.
Figure 6.9: The exported graphics file when selecting Export visible area.
On the other hand, if you select Export whole view, you will get a result that looks like figure 6.10. This means that the graphics file will also include the part of the sequence which is not visible when you have zoomed in.
Figure 6.10: The exported graphics file when selecting Export whole view. The whole sequence is shown, even though the view is zoomed in on a part of the sequence.
Click Next when you have chosen which part of the view to export.
94. as become common practice to construct phylogenies based on molecular data, known as molecular phylogeny. The data is most commonly represented in the form of DNA or protein sequences, but can also be in the form of, e.g., restriction fragment length polymorphism (RFLP).
Methods for constructing molecular phylogenies can be distance-based or character-based.
Distance-based methods
Two common algorithms, both based on pairwise distances, are the UPGMA and the Neighbor Joining algorithms. Thus, the first step in these analyses is to compute a matrix of pairwise distances between OTUs from their sequence differences. To correct for multiple substitutions, it is common to use distances corrected by a model of molecular evolution such as the Jukes-Cantor model (Jukes and Cantor, 1969).
UPGMA
A simple but popular clustering algorithm for distance data is Unweighted Pair Group Method using Arithmetic averages (UPGMA) (Michener and Sokal, 1957; Sneath and Sokal, 1973). This method works by initially placing every sequence in its own cluster and repeatedly joining clusters. The tree is constructed by considering all initial clusters as leaf nodes in the tree, and each time two clusters are joined, a node is added to the tree as the parent of the two chosen nodes. The clusters to be joined are chosen as those with minimal pairwise distance. The branch lengths are set corresponding to the distance between clusters, which is calculated as the average distance b
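The Jukes-Cantor correction and UPGMA clustering described above can be sketched as follows. The node-height rule used here (a merged node sits at half the distance between the two clusters, and cluster distances are size-weighted averages) is the common textbook formulation; the text is cut off before stating the exact branch-length rule, so treat it as an assumption. The function names are illustrative.

```python
import math
from typing import Dict, List, Tuple


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance d = -3/4 ln(1 - 4p/3) for two aligned sequences."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    p = sum(a != b for a, b in pairs) / len(pairs)  # observed proportion of differences
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Build a UPGMA tree and return it in Newick format."""
    n = len(names)
    # cluster id -> (Newick subtree, number of leaves, height of its node)
    clusters: Dict[int, Tuple[str, int, float]] = {i: (names[i], 1, 0.0) for i in range(n)}
    # pairwise cluster distances keyed by (smaller id, larger id)
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)
    }
    next_id = n
    while len(clusters) > 1:
        i, j = min(d, key=d.get)          # closest pair of clusters
        height = d.pop((i, j)) / 2        # new node sits halfway between them
        tree_i, size_i, h_i = clusters.pop(i)
        tree_j, size_j, h_j = clusters.pop(j)
        for k in clusters:                # average distance to the merged cluster
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        subtree = f"({tree_i}:{height - h_i:g},{tree_j}:{height - h_j:g})"
        clusters[next_id] = (subtree, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


print(upgma(["A", "B", "C"], [[0, 2, 4], [2, 0, 4], [4, 4, 0]]))  # (C:2,(A:1,B:1):1);
```

Every leaf ends up at the same distance from the root, which reflects the constant-rate assumption behind UPGMA.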
95. at can be imported into CLC Sequence Viewer, sequences can have annotations (GenBank, EMBL and Swiss-Prot format).
• A number of analyses in CLC Sequence Viewer produce annotations on the sequence (e.g., finding open reading frames and restriction map analysis).
Note: Annotations are included if you export the sequence in GenBank, Swiss-Prot, EMBL or CLC format. When exporting in other formats, annotations are not preserved in the exported file.

9.3.1 Viewing annotations
Annotations can be viewed in a number of different ways:
• As arrows or boxes in the sequence views:
  - Linear and circular view of sequences
  - Alignments
  - Graphical view of sequence lists
• In the table of annotations
• In the text view of sequences
In the following sections, these view options will be described in more detail. In all the views except the text view, annotations can be deleted. This is described in the following sections.
View Annotations in sequence views
Figure 9.6 shows an annotation displayed on a sequence.
Figure 9.6: An annotation showing a coding region on a genomic DNA sequence.
The various sequence views listed in section 9.3.1 have different default settings for sho
96. ation, join, layout, lists, logo, new, region types, search, select, shuffle, statistics, view, view as text, view circular, view format; Sequencing data; Sequencing primers; Share data; Share Side Panel Settings; Shortcuts; Show results from a finished process; Show dialogs; Show/hide Toolbox; Shuffle sequence; Side Panel: tutorial; Side Panel Settings: export, import, share with others; Side Panel, location of; Signal peptide; Single base editing in sequences; Single cutters; SNP detection; Solexa, see Illumina Genome Analyzer; SOLiD data; Sort sequences alphabetically; Sort folders; Source element; Species, display name; Staden file format; Standard layout, trees; Standard Settings, CLC; Star activity; Start codon; Start-up problems; Statistics: about sequence, protein, sequence; Status Bar: illustration; str file format; Structure scanning; Style sheet, preferences; Support mail; svg format, export; Swiss-Prot file format; Swiss-Prot/TrEMBL; swp file format; System requirements; Tab delimited file format; Tab file format; Tabs, use of; Tag-based expression profiling; TaqMan primers; tar file format; Tar file format; Taxonomy, batch edit
97. ay of sequence info for the HUMHBB DNA sequence from the Example data. All the lines in the view are headings, and the corresponding text can be shown by clicking the text.
• Name: The name of the sequence, which is also shown in sequence views and in the Navigation Area.
• Description: A description of the sequence.
• Comments: The author's comments about the sequence.
• Keywords: Keywords describing the sequence.
• Db source: Accession numbers in other databases concerning the same sequence.
• Gb Division: Abbreviation of GenBank divisions. See section 3.3 in the GenBank release notes for a full list of GenBank divisions.
• Length: The length of the sequence.
• Modification date: Modification date from the database. This means that this date does not reflect your own changes to the sequence. See the history (chapter 7) for information about the latest changes to the sequence after it was downloaded from the database.
• Organism: Scientific name of the organism (first line) and taxonomic classification levels (second and subsequent lines).
The information available depends on the origin of the sequence. Sequences downloaded from databases like NCBI and UniProt (see chapter 10) have this information. On the other hand, some sequence formats, like the FASTA format, do not contain this information.
Some of the information can be edited by clicking the blue Edit text. This means that you can add yo
98. Figure 3.3: In this example, the location called CLC_Data points to the folder at C:\Documents and settings\clcuser\CLC_Data.
Opening data
The elements in the Navigation Area are opened by:
Double-click the element
or Click the element | Show in the Toolbar | Select the desired way to view the element
This will open a view in the View Area, which is described in section 3.2.
Adding data
Data can be added to the Navigation Area in a number of ways. Files can be imported from the file system (see chapter 6). Furthermore, an element can be added by dragging it into the Navigation Area. This could be views that are open, elements on lists (e.g., search hits or sequence lists), and files located on your computer.
If a file or another element is dropped on a folder, it is placed at the bottom of the folder. If it is dropped on another element, it will be placed just below that element.
If the element already exists in the Navigation Area, you will be asked whether you wish to create a copy.

3.1.2 Create new folders
In order to organize your files, they can be placed in folders. Creating a new folder can be done in two ways:
right-click an element in the Navigation Area | New | Folder
or File | New | Folder
If a folder is selected in the Navigation Area when adding a new folder, the
99. ce using the Toolbox tools. Usually you use the +1 reading frame, which means that the translation starts from the first nucleotide. Stop codons result in an asterisk being inserted in the protein sequence at the corresponding position. It is possible to translate in any combination of the six reading frames in one analysis. To translate:
select a nucleotide sequence | Toolbox in the Menu Bar | Nucleotide Analysis | Translate to Protein
or right-click a nucleotide sequence | Toolbox | Nucleotide Analysis | Translate to Protein
This opens the dialog displayed in figure 12.4.
Figure 12.4: Choosing sequences for translation.
If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.
Clicking Next generates the dialog seen in figure 12.5. Here you have the following options:
Reading frames. If you wish to translate the whole sequence, you must specify the reading frame for the translation. If you select e.g. two reading frames, two pr
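The translation rule described above (codons read in one of six reading frames, with stop codons shown as an asterisk) can be sketched in a few lines of Python. This is a minimal illustration, not the Viewer's own code. It assumes the standard genetic code (NCBI translation table 1), names the frames +1 to +3 and -1 to -3 with the negative frames read from the reverse complement, and writes X for codons containing ambiguous bases; these conventions are assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the first base varying slowest, matching the NCBI string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA/RNA sequence (returned as DNA)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(seq: str, frame: int) -> str:
    """Translate in frame +1, +2, +3 or -1, -2, -3; stop codons become '*'."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +/-1, +/-2, +/-3")
    dna = seq.upper().replace("U", "T")  # accept RNA input as well
    if frame < 0:
        dna = reverse_complement(dna)  # negative frames use the minus strand
    start = abs(frame) - 1
    protein = []
    for i in range(start, len(dna) - 2, 3):  # only complete codons
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))  # X: ambiguous codon
    return "".join(protein)


print(translate_frame("ATGAAATAG", 1))   # MK*
print(translate_frame("ATGAAATAG", -1))  # LFH
```

Selecting several reading frames in the dialog corresponds to calling such a function once per frame and collecting one protein per frame.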
100. ces, but the differences will be explained for each group of settings.
Note: When you make changes to the settings in the Side Panel, they are not automatically saved when you save the sequence. Click Save/restore Settings to save the settings (see section 4.5 for more information).
Sequence Layout
These preferences determine the overall layout of the sequence:
• Spacing. Inserts a space at a specified interval:
  - No spacing. The sequence is shown with no spaces.
  - Every 10 residues. There is a space every 10 residues, starting from the beginning of the sequence.
  - Every 3 residues, frame 1. There is a space every 3 residues, corresponding to the reading frame starting at the first residue.
  - Every 3 residues, frame 2. There is a space every 3 residues, corresponding to the reading frame starting at the second residue.
  - Every 3 residues, frame 3. There is a space every 3 residues, corresponding to the reading frame starting at the third residue.
• Wrap sequences. Shows the sequence on more than one line.
  - No wrap. The sequence is displayed on one line.
  - Auto wrap. Wraps the sequence to fit the width of the view, no matter whether it is zoomed in or out (displays a minimum of 10 nucleotides on each line).
  - Fixed wrap. Makes it possible to specify when the sequence should be wrapped. In the text field below, you can choose the number of residues to display on each line.
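The Spacing and Fixed wrap options can be illustrated with a short Python sketch. It assumes that "frame 2" and "frame 3" mean that codon groups start at the second or third residue, so the residues before that point form a short leading group; this reading of the option, and the function names, are assumptions made for illustration only.

```python
def space_sequence(seq: str, interval: int = 10, frame: int = 1) -> str:
    """Insert a space every `interval` residues, aligning the groups so that
    a new group starts at residue number `frame` (1-based)."""
    if interval < 1 or not 1 <= frame <= interval:
        raise ValueError("need interval >= 1 and 1 <= frame <= interval")
    first = frame - 1  # length of the partial leading group, if any
    groups = [seq[:first]] if first else []
    groups += [seq[i:i + interval] for i in range(first, len(seq), interval)]
    return " ".join(groups)


def wrap_sequence(seq: str, width: int) -> list[str]:
    """Fixed wrap: split the sequence into lines of `width` residues."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


print(space_sequence("ATGCATG", interval=3, frame=2))  # A TGC ATG
```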
101. chapter also explains how to create a new sequence and how to gather several sequences in a sequence list.
9.1 View sequence
When you double-click a sequence in the Navigation Area, the sequence will open automatically and you will see the nucleotides or amino acids. The zoom options described in section 3.3 allow you to, for example, zoom out in order to see more of the sequence in one view. There are a number of options for viewing and editing the sequence, which are all described in this section. All the options described in this section also apply to alignments (further described in section 14.2).
9.1.1 Sequence settings in Side Panel
Each view of a sequence has a Side Panel located at the right side of the view (see figure 9.1).
Figure 9.1: Overview of the Side Panel, which is always shown to the right of a view.
When you make changes in the Side Panel, the view of the sequence is instantly updated. To show or hide the Side Panel:
select the View | Ctrl + U
or Click the icon at the top right corner of the Side Panel to hide | Click the gray Side Panel button to the right to show
Below, each group of settings will be explained. Some of the preferences are not the same for nucleotide and protein sequen
102. cking Next will display the dialog shown in figure 6.12.
Figure 6.12: Parameters for bitmap formats: size of the graphics file.
You can adjust the size (the resolution) of the file to four standard sizes:
• Screen resolution
• Low resolution
• Medium resolution
• High resolution
The actual size in pixels is displayed in parentheses. An estimate of the memory usage for exporting the file is also shown. If the image is to be used on computer screens only, a low resolution is sufficient. If the image is going to be used on printed material, a higher resolution is necessary to produce a good result.
Parameters for vector formats
For pdf format, clicking Next will display the dialog shown in figure 6.13 (this is only the case if the graphics is using more than one page).
103. cognition sequence or a list of commercial vendors.
13.2.2 Number of cut sites
Clicking Next confirms the list of enzymes which will be included in the analysis and takes you to the dialog shown in figure 13.12.
If you wish the output of the restriction map analysis only to include restriction enzymes which cut the sequence a specific number of times, use the checkboxes in this dialog:
• No restriction site (0)
• One restriction site (1)
• Two restriction sites (2)
• Three restriction sites (3)
• N restriction sites
  - Minimum
  - Maximum
• Any number of restriction sites > 0
Figure 13.12: Selecting number of cut sites.
The default setting is to include the enzymes which cut the sequence one or two times. You can use the checkboxes to perform very specific searches for restriction sites, e.g. if you wish to find enzymes which do not cut the sequence, or enzymes cutting exa
You can customize the enzyme database for your installation; see section
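The cut-site filter described above (keep only enzymes that cut a given number of times, by default once or twice) can be sketched as follows. This is a simplified illustration, not the program's algorithm: it assumes a linear sequence, exact A/C/G/T recognition sequences, and counts each site once per position even when it matches on both strands. The four-enzyme table is a hypothetical stand-in for the enzyme database; the recognition sequences are the standard ones for these enzymes.

```python
import re

# Small illustrative enzyme table: name -> recognition sequence (5'->3').
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "EcoRV": "GATATC",
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(seq: str, site: str) -> int:
    """Count sites on both strands; palindromic sites are counted once."""
    seq = seq.upper()
    starts = set()
    for pattern in {site, reverse_complement(site)}:
        # A zero-width lookahead also finds overlapping occurrences.
        starts.update(m.start() for m in re.finditer(f"(?={pattern})", seq))
    return len(starts)


def enzymes_cutting(seq: str, allowed_counts=(1, 2)) -> dict[str, int]:
    """Keep enzymes whose number of cut sites is in `allowed_counts`."""
    counts = {name: count_sites(seq, site) for name, site in ENZYMES.items()}
    return {name: n for name, n in counts.items() if n in allowed_counts}


seq = "GAATTCAAAGGATCCTTTGAATTC"
print(enzymes_cutting(seq))                       # {'EcoRI': 2, 'BamHI': 1}
print(enzymes_cutting(seq, allowed_counts=(0,)))  # enzymes that do not cut
```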
104. croll wheel to zoom.
If you want to get a quick overview of a sequence or a tree, use the Fit Width function instead of the Zoom Out function.
If you press Shift while clicking in a View, the zoom function is reversed. Hence, clicking on a sequence in this way while the Zoom Out mode toolbar item is selected zooms in instead of zooming out.
3.3.3 Fit Width
The Fit Width function adjusts the content of the View so that both ends of the sequence, alignment, or tree are visible in the View in question. This function does not change the mode of the mouse pointer.
3.3.4 Zoom to 100%
The Zoom to 100% function zooms the content of the View so that it is displayed with the highest degree of detail. This function does not change the mode of the mouse pointer.
3.3.5 Move
The Move mode allows you to drag the content of a View. For example, if you are studying a sequence, you can click anywhere in the sequence and hold the mouse button. By moving the mouse, you move the sequence in the View.
3.3.6 Selection
The Selection mode is used for selecting in a View (selecting a part of a sequence, selecting nodes in a tree, etc.). It is also used for moving, e.g., branches in a tree or sequences in an alignment.
When you make a selection on a sequence or in an alignment, the location is shown in the bottom right corner of the screen. For example, 23^24 means that the selection is between two residues 23
105. ctly twice.
13.2.3 Output of restriction map analysis
Clicking Next shows the dialog in figure 13.13.
Figure 13.13: Choosing to add restriction sites as annotations or creating a restriction map.
This dialog lets you specify how the result of the restriction map analysis should be presented:
• Add restriction sites as annotations to sequence(s). This option makes it possible to see the restriction sites on the sequence (see figure 13.14) and save the annotations for later use.
• Create restriction map. The restriction map is a table of restriction sites, as shown in figure 13.15. If more than one sequence was selected, the table will include the restriction sites of all the sequences. This makes it easy to compare the result of the restriction map analysis for two or more sequences.
The following sections will describe these output formats in more detail.
In order to complete the analysis, click Finish (see section 8.1 for information about the Save and Open options).
13.2.4 Restriction sites as annotation on the seq
106. Figure 12.6: Create Reading Frame dialog.
Figure 12.7: Create Reading Frame dialog.
• Start codon:
  - AUG. Most commonly used start codon.
  - Any. Find all open reading frames.
  - All start codons in genetic code.
  - Other. Here you can specify a number of start codons separated by commas.
• Both strands. Finds reading frames on both strands.
• Open ended Sequence. Allows the ORF to start or end outside the sequence. If the sequence studied is a part of a larger sequence, it may be advantageous to allow the ORF to start or end outside the sequence.
• Genetic code translation table.
• Include stop codon in result. The ORFs will be shown as annotations, which can include the stop codon if this option is checked.
The translation tables are occasionally updated from NCBI. The tables are not available in this printable version of the user manual. Instead, the tables are included in the Help menu in the Menu Bar (in the appendix).
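The start-codon, strand and stop-codon options listed above can be illustrated with a compact ORF scanner. This is a sketch under stated assumptions, not the Viewer's implementation: it uses only the stop codons of the standard genetic code, does not implement the Any or Open ended Sequence options, and its `min_codons` parameter (a hypothetical name, defaulting to an assumed 100 codons) stands in for a minimum ORF length counted without the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, starts=("AUG",), both_strands: bool = True,
              min_codons: int = 100, include_stop: bool = True):
    """Return (strand, start, end) tuples with 0-based, end-exclusive
    coordinates on the strand that was scanned (minus-strand hits are
    reported on the reverse complement)."""
    seq = seq.upper().replace("U", "T")  # work in DNA letters
    start_codons = {c.upper().replace("U", "T") for c in starts}
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    orfs = []
    for strand, s in strands:
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] in start_codons:
                    j = i
                    while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                        j += 3
                    if j + 3 > len(s):
                        break  # no stop codon left in this frame
                    if (j - i) // 3 >= min_codons:  # length without the stop
                        orfs.append((strand, i, j + 3 if include_stop else j))
                    i = j  # resume scanning after the stop codon
                i += 3
    return orfs


print(find_orfs("ATGAAATAG", min_codons=1))  # [('+', 0, 9)]
```

Start codons are given in RNA letters, as in the dialog, and converted internally; the "Other" option would correspond to passing several codons, e.g. `starts=("AUG", "CUG", "UUG")`.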
107. d in section 9.1.1.
When you have adjusted a view of, e.g., a sequence, your settings in the Side Panel can be saved. When you open other sequences which you want to display in a similar way, the saved settings can be applied. The options for saving and applying are available at the top of the Side Panel (see figure 4.10).
To save and apply the saved settings, click the Save/restore Settings button seen in figure 4.10. This opens a menu where the following options are available:
Figure 4.8: The Side Panel of a sequence contains several groups: Sequence layout, Annotation types, Annotation layout, etc. Several of these groups are present in more views. For example, Sequence layout is also in the Side Panel of alignment views.
Figure 4.9: The Sequence layout is expanded.
Figure 4.10: At the top of the Side Panel you can: Expand all groups, Collapse all preferenc
108. Figure 6.13: Page setup parameters for vector formats.
The settings for the page setup are shown, and clicking the Page Setup button will display a dialog where these settings can be adjusted. This dialog is described in section 5.2.
The page setup is only available if you have selected to export the whole view. If you have chosen to export the visible area only, the graphics file will be on one page with no headers or footers.
6.3.4 Exporting protein reports
It is possible to export a protein report using the normal Export function, which will generate a pdf file with a table of contents:
Click the report in the Navigation Area | Export in the Toolbar | select pdf
You can also choose to export a protein report using the Export graphics function, but in this way you will not get the table of contents.
6.4 Export graph data points to a file
Data points for graphs displayed along the sequence or along an alignment, mapping, or BLAST result can be exported to a semicolon-separated text file (csv format). An example of such a graph is shown in figure 6.14. This graph shows the coverage of reads of a read mapping produced with CLC Genomics Workbench.
To export the data points for the graph, right-click the graph and choose Export Graph to Comma-separated File. Depending on what kind of graph you have selected, different op
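Because the exported graph file is semicolon-separated rather than comma-separated, it needs an explicit delimiter when read by other tools. The Python sketch below uses the standard `csv` module with `delimiter=";"`. It assumes the file has a header row; the column names `Position` and `Coverage` in the example are hypothetical, since the actual columns depend on the kind of graph exported.

```python
import csv
import io


def read_graph_points(text: str) -> list[dict[str, str]]:
    """Parse a semicolon-separated export that starts with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)


# Hypothetical file content; real column names depend on the exported graph.
example = "Position;Coverage\n1;12\n2;15\n3;15\n"
rows = read_graph_points(example)
coverage = [int(r["Coverage"]) for r in rows]
print(coverage)  # [12, 15, 15]
```

For a file on disk, the same reader works on `open(path, newline="")` in place of the `StringIO` wrapper.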
109. • Region 10: A region on the negative strand that covers the range from 210 to 220, inclusive.
• Region 11: A region on the negative strand that covers the ranges from 230 to 240, inclusive, and from 250 to 260, inclusive.
Figure 9.4: A molecule shown in a circular view.
• Differences. In the Sequence Layout preferences, only the following options are available in the circular view: Numbers on plus strand, Numbers on sequence, and Sequence label.
You cannot zoom in to see the residues in the circular molecule. If you wish to see these details, split the view with a linear view of the sequence.
In the Annotation Layout, you also have the option of showing the labels as Stacked. This means that there are no overlapping labels and that all labels of both annotations and restriction sites are adjusted along the left and right edges of the view.
9.2.1 Using split views to see details of the circular molecule
In order to see the nucleotides of a circular molecule, you can open a linear view of the molecule alongside the circular view:
Press and hold the Ctrl button (⌘ on Mac) | click Show Sequence at the bottom of the view
This will open a linear view of the sequence below the circular view. When you zoom in on the linear view, you can see the residues as shown in figure 9.5.
110. e one or more sequences into the Navigation Area. This allows you to extract specific sequences from the entire list. Another option is to extract all sequences found in the list. This can also be done for:
• Alignments
• Contigs and read mappings
• Read mapping tables
• BLAST results
• BLAST overview tables
• RNA-Seq samples
• and, of course, sequence lists
For mappings and BLAST results, the main sequences (i.e. reference/consensus and query sequence) will not be extracted.
To extract the sequences:
Toolbox | General Sequence Analysis | Extract Sequences
This will allow you to select the elements that you want to extract sequences from (see the list above). Clicking Next displays the dialog shown in figure 9.14.
Figure 9.14: Choosing whether the extracted sequences should be placed in a new list or as single sequences.
Here you can choose whether the extracted sequences should be placed in a new list or extracted as single sequences. For sequence lists, only the last option makes sense, but for alignments, mappings, and BLAST results, it would make sense to place the sequences in a list.
111. e View or double-click the title of the view.
3.2.7 Side Panel
The Side Panel allows you to change the way the contents of a view are displayed. The options in the Side Panel depend on the kind of data in the view, and they are described in the relevant sections about sequences, alignments, trees, etc. The Side Panel is activated in this way:
select the view | Ctrl + U (⌘ + U on Mac)
or right-click the tab of the view | View | Show/Hide Side Panel
Note: Changes made to the Side Panel will not be saved when you save the view. See how to save the changes in the Side Panel in chapter 4.
The Side Panel consists of a number of groups of preferences (depending on the kind of data being viewed), which can be expanded and collapsed by clicking the header of the group. You can also expand or collapse all the groups by clicking the icons at the top.
3.3 Zoom and selection in View Area
The mode toolbar items in the right side of the Toolbar apply to the function of the mouse pointer. When, e.g., Zoom Out is selected, you zoom out each time you click in a view where zooming is relevant (texts, tables, and lists cannot be zoomed). The chosen mode is active until another mode toolbar item is selected. (Fit Width and Zoom to 100% do not apply to the mouse pointer.)
Figure 3.14: The mode toolbar items.
3.3.1 Zoom in
There are four ways of Zoom
112. e no text next to the sequence icon.
Rename element
Renaming a folder or an element in the Navigation Area can be done in three different ways:
select the element | Edit in the Menu Bar | Rename
or select the element | F2
or click the element once | wait one second | click the element again
When you can rename the element, you can see that the text is selected and you can move the cursor back and forth in the text. When the editing of the name has finished, press Enter or select another element in the Navigation Area. If you want to discard the changes instead, press the Esc key.
3.1.7 Delete elements
Deleting a folder or an element can be done in two ways:
right-click the element | Delete
or select the element | press Delete key
This will cause the element to be moved to the Recycle Bin, where it is kept until the recycle bin is emptied. This means that you can recover deleted elements later on.
For deleting annotations instead of folders or elements, see section 9.3.2.
Restore Deleted Elements
The elements in the Recycle Bin can be restored by dragging the elements with the mouse into the folder where they used to be.
If you have deleted large amounts of data taking up very much disk space, you can free this disk space by emptying the Recycle Bin:
Edit in the Menu Bar | Empty Recycle Bin
Note: This cannot be undone, and you will therefore not be able to recover the data prese
113. e settings is the default setting for each view is exported. If you wish to export the Side Panel Settings themselves, see section 4.2.2.
The process of importing preferences is similar to exporting:
Press Ctrl + K (⌘ on Mac) to open Preferences | Import | Browse to and select the .cpf file | Import and apply preferences
4.4.1 The different options for export and importing
To avoid confusion between the different import and export options, here is an overview:
• Import and export of bioinformatics data such as sequences, alignments, etc. (described in section 6.1).
• Graphics export of the views, which creates image files in various formats (described in section 6.3).
• Import and export of Side Panel Settings, as described in the next section.
• Import and export of all the Preferences except the Side Panel settings. This is described above.
4.5 View settings for the Side Panel
The Side Panel is shown to the right of all views that are opened in CLC Sequence Viewer. By using the settings in the Side Panel, you can specify the layout and contents of the view. Figure 4.8 is an example of the Side Panel of a sequence view.
By clicking the black triangles or the corresponding headings, the groups can be expanded or collapsed. An example is shown in figure 4.9, where the Sequence layout is expanded.
The content of the groups is described in the sections where the functionality is explained. For example, Sequence Layout for sequences is describe
114. Below there are two panels:
• To the left you see all the enzymes that are in the list selected above. If you have not chosen to use an existing enzyme list, this panel shows all the enzymes available.
• To the right there is a list of the enzymes that will be used.
Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button. If you, e.g., wish to use EcoRV and BamHI, select these two enzymes and add them to the right side panel.
If you wish to use all the enzymes in the list:
Click in the panel to the left | press Ctrl + A (⌘ + A on Mac) | Add
The enzymes can be sorted by clicking the column headings, i.e. Name, Overhang, Methylation, or Popularity. This is particularly useful if you wish to use enzymes which produce, e.g., a 3' overhang. In this case, you can sort the list by clicking the Overhang column heading, and all the enzymes producing 3' overhangs will be listed together for easy selection.
When looking for a specific enzyme, it is easier to use the Filter. If you wish to find, e.g., HindIII sites, simply type HindIII into the filter, and the list of enzymes will shrink automatically to only include the HindIII enzyme. This can also be used to only show enzymes producing, e.g., a 3' overhang, as shown in figure 13.17.
115. eadings of the tables change depending on whether you calculate individual or comparative sequence statistics.
The output of comparative protein sequence statistics includes:
• Sequence information:
  - Sequence type
  - Length
  - Organism
  - Name
  - Description
  - Modification Date
  - Weight. This is calculated like this:
    $$\text{weight} = \sum_{\text{units in sequence}} \text{weight}(\text{unit}) - \text{links} \times \text{weight}(\mathrm{H_2O})$$
    where links is the sequence length minus one and units are amino acids. The atomic composition is defined the same way.
  - Isoelectric point
  - Aliphatic index
• Sequence Information:
  - Sequence type
  - Length
  - Organism
  - Name
  - Description
  - Modification Date
  - Weight
  - Isoelectric point
  - Aliphatic index
• Amino acid distribution
• Annotation table
The output of nucleotide sequence statistics includes:
• General statistics:
  - Sequence type
  - Length
  - Organism
  - Name
  - Description
  - Modification Date
  - Weight (calculated as single-stranded DNA)
• Nucleotide distribution table
• Annotation table
11.2.1 Bioinformatics explained: Protein statistics
Every protein holds specific and individual features which are unique to that particular protein. Features such as isoelectric point or amino acid composition can reveal important information about a novel protein. Many of the features described below are calculated in a simple way.
Molecular weight
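The weight formula for protein sequences (sum of the unit weights minus one water molecule per peptide link, with links = length - 1) can be written out in Python. The residue masses below are approximate average masses of the free amino acids in g/mol, supplied here as an assumption for illustration; the values used internally by CLC Sequence Viewer may differ slightly.

```python
# Approximate average masses (g/mol) of the free amino acids (assumed values).
AA_WEIGHT = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "E": 147.13, "Q": 146.15, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}
WATER = 18.02  # approximate mass of H2O (g/mol)


def protein_weight(seq: str) -> float:
    """weight = sum of unit weights - links * weight(H2O), links = len - 1."""
    seq = seq.upper()
    if not seq:
        return 0.0
    units = sum(AA_WEIGHT[aa] for aa in seq)  # raises KeyError for B, Z, X, ...
    links = len(seq) - 1  # one water is released per peptide bond
    return units - links * WATER


print(round(protein_weight("GG"), 2))  # 132.12 (glycylglycine)
```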
116. Figure 12.1: Translating DNA to RNA.
12.2 Convert RNA to DNA
CLC Sequence Viewer lets you convert an RNA sequence into DNA, substituting the U residues (Uracil) for T residues (Thymine):
select an RNA sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analysis | Convert RNA to DNA
or right-click a sequence in Navigation Area | Toolbox | Nucleotide Analysis | Convert RNA to DNA
This opens the dialog displayed in figure 12.2.
Figure 12.2: Translating RNA to DNA.
If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence li
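The conversion itself is a simple letter substitution, which a short Python sketch makes explicit. Preserving lowercase letters is an assumption of this sketch rather than a documented behavior of the Viewer.

```python
def rna_to_dna(seq: str) -> str:
    """Replace every U (uracil) with T (thymine), preserving letter case."""
    return seq.translate(str.maketrans("Uu", "Tt"))


def dna_to_rna(seq: str) -> str:
    """The reverse operation: replace every T with U."""
    return seq.translate(str.maketrans("Tt", "Uu"))


print(rna_to_dna("AUGGCUUAA"))  # ATGGCTTAA
```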
117. ecular cloning (Viewer / Main / Genomics)
• Advanced molecular cloning
• Graphical display of in silico cloning
• Advanced sequence manipulation
Database searches (Viewer / Main / Genomics)
• GenBank Entrez searches
• UniProt searches (Swiss-Prot/TrEMBL)
• Web-based sequence search using BLAST
• BLAST on local database
• Creation of local BLAST database
• PubMed lookup
• Web-based lookup of sequence data
• Search for structures at NCBI
General sequence analyses (Viewer / Main / Genomics)
• Linear sequence view
• Circular sequence view
• Text-based sequence view
• Editing sequences
• Adding and editing sequence annotations
• Advanced annotation table
• Join multiple sequences into one
• Sequence statistics
• Shuffle sequence
• Local complexity region analyses
• Advanced protein statistics
• Comprehensive protein characteristics report
Nucleotide analyses (Viewer / Main / Genomics)
• Basic gene finding
• Reverse complement without loss of annotation
• Restriction site analysis
• Advanced interactive restriction site analysis
• Translation of sequences from DNA to proteins
• Interactive translations of sequences and alignments
• G/C content analyses and graphs
Protein analyses (Viewer / Main / Genomics)
• 3D molecule view
• Hydrophobicity analyses
• Antigenicity analysis
• Protein charge analysis
• Reverse translation from protein to DNA
• Proteolytic cleavage detection
• Prediction of signal peptides (SignalP)
118. ee section 4.5 for more about floating side panels.
The New view setting allows you to choose whether the View preferences are to be shown automatically when opening a new view. If this option is not chosen, you can press Ctrl + U (⌘ + U on Mac) to see the preferences panels of an open view.
The View Format allows you to change the way the elements appear in the Navigation Area. The following text can be used to describe the element:
• Name (this is the default information to be shown).
• Accession (sequences downloaded from databases like GenBank have an accession number).
• Latin name.
• Latin name (accession).
• Common name.
• Common name (accession).
The User Defined View Settings gives you an overview of the different Side Panel settings that are saved for each view. See section 4.5 for more about how to create and save style sheets.
If there are other settings besides CLC Standard Settings, you can use this overview to choose which of the settings should be used by default when you open a view (see an example in figure 4.4). In this example, the CLC Standard Settings is chosen as default.
4.2.1 Number formatting in tables
In the preferences, you can specify how the numbers should be formatted in tables (see figure 4.5).
119. ents. Whereas the optimal solution to the pairwise alignment problem can be found in reasonable time, the problem of constructing a multiple alignment is much harder.
The first major challenge in the multiple alignment procedure is how to rank different alignments, i.e. which scoring function to use. Since the sequences have a shared history, they are correlated through their phylogeny, and the scoring function should ideally take this into account. Doing so
[Figure: a multiple alignment of the protein sequences Q6WN27, Q6WN20, Q6WN29, Q6WN25, Q6WN22, P68225, P68053, P68046, P68231, P68228, NP_058652, NP_032246, and Q6H1U7, alignment positions 20 to 80.]
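To make "ranking alignments with a scoring function" concrete, the sketch below computes the sum-of-pairs score, a simple and widely used multiple-alignment score that adds a pairwise score for every pair of rows in every column. It is swapped in here purely as an illustration: it treats all sequence pairs equally and therefore ignores the phylogenetic correlation mentioned above. The match, mismatch and gap values are arbitrary assumptions, and this is not presented as the scoring used by CLC Sequence Viewer.

```python
from itertools import combinations


def sum_of_pairs(alignment: list[str], match: int = 1,
                 mismatch: int = -1, gap: int = -2) -> int:
    """Sum-of-pairs score over all columns; gap-gap pairs score 0."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a common convention: gap against gap is free
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


print(sum_of_pairs(["AC-", "AG-", "ACT"]))  # -2
```

Comparing two candidate alignments of the same sequences with such a function is what "ranking" means here; phylogeny-aware schemes weight the pairs instead of counting them equally.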
120. equence can be moved.
The sequences can also be sorted automatically to save you time moving the sequences around. To sort the sequences alphabetically:
Right-click the name of a sequence | Sort Sequences Alphabetically
If you change the Sequence name (in the Sequence Layout view preferences), you will have to ask the program to sort the sequences again.
14.3.5 Delete and rename sequences
Sequences can be removed from the alignment by right-clicking the label of a sequence:
right-click label | Delete Sequence
This can be undone by clicking Undo in the Toolbar.
If you wish to delete several sequences, you can check all the sequences, right-click, and choose Delete Marked Sequences. To show the checkboxes, you first have to click Show Selection Boxes in the Side Panel.
A sequence can also be renamed:
right-click label | Rename Sequence
This will show a dialog letting you rename the sequence. This will not affect the sequence that the alignment is based on.
14.4 Bioinformatics explained: Multiple alignments
Multiple alignments are at the core of bioinformatics analysis. Often the first step in a chain of bioinformatics analyses is to construct a multiple alignment of a number of homologs (DNA or protein sequences). However, despite their frequent use, the development of multiple alignment algorithms remains one of the algorithmically most challenging areas in bioinformatics research.
121. es, Dock/Undock preferences, Help, and Save/Restore preferences.
• Save Settings. This brings up a dialog as shown in figure 4.11, where you can enter a name for your settings. Furthermore, by clicking the checkbox Always apply these settings, you can choose to use these settings every time you open a new view of this type. If you wish to change which settings should be used by default, open the Preferences dialog (see section 4.2).
• Delete Settings. Opens a dialog to select which of the saved settings to delete.
• Apply Saved Settings. This is a submenu containing the settings that you have previously saved. By clicking one of the settings, they will be applied to the current view. You will also see a number of pre-defined view settings in this submenu. They are meant to be examples of how to use the Side Panel and provide quick ways of adjusting the view to common usages. At the bottom of the list of settings, you will see CLC Standard Settings, which represent the way the program was set up when you first launched it.
Figure 4.11: The save settings dialog.
122. ese changes will be saved when you save the graph, whereas the changes in the Side Panel need to be saved explicitly (see section 4.5).
For more information about the graph view, please see Appendix B.
Appendix C Working with tables
Tables are used in a lot of places in the CLC Sequence Viewer. The contents of the tables are of course different depending on the context, but there are some general features for all tables that will be explained in the following.
Figure C.1 shows an example of a typical table. This is the table result of Find Open Reading Frames. We will use this table as an example in the following to illustrate the concepts that are relevant for all kinds of tables.
Figure C.1: A table showing open reading frames.
First of all, the columns of the table are listed in the Side Panel to the right of the table. By clicking the checkboxes, you can hide or show the columns in the table.
Furthermore, you can sort the table by clicking on the column headers. Pressing Ctrl (⌘ on Mac) while you clic
123. Residue coloring. These preferences make it possible to color the residue letter and to set a background color for the residue.
• Non-standard residues. For nucleotide sequences, this colors the residues that are not C, G, A, T or U. For amino acids, only B, Z and X are colored as non-standard residues.
  - Foreground color. Sets the color of the letter. Click the color box to change the color.
  - Background color. Sets the background color of the residues. Click the color box to change the color.
• Rasmol colors. Colors the residues according to the Rasmol color scheme. See http://www.openrasmol.org/doc/rasmol.html
  - Foreground color. Sets the color of the letter. Click the color box to change the color.
  - Background color. Sets the background color of the residues. Click the color box to change the color.
• Polarity colors (only protein). Colors the residues according to the polarity of amino acids.
  - Foreground color. Sets the color of the letter. Click the color box to change the color.
  - Background color. Sets the background color of the residues. Click the color box to change the color.
• Trace colors (only DNA). Colors the residues according to the color conventions of chromatogram traces: A green, C blue, G black and T red.
  - Foreground color. Sets the color of the letter.
  - Background color. Sets the background color of the residues.
Find. The Find function can also be invoked by pressing Ctrl+Shift+F (⌘+Shi
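The two rule-based schemes above (non-standard residues and trace colors) are simple enough to sketch. The Python helper below is hypothetical and not part of CLC Sequence Viewer; it assumes case-insensitive matching and does not treat gap characters specially, so a "-" in a nucleotide alignment row would also be flagged.

```python
# Hypothetical sketch of the coloring rules described above; not CLC code.
STANDARD_NUCLEOTIDES = frozenset("CGATU")
NONSTANDARD_AMINO_ACIDS = frozenset("BZX")

# Chromatogram trace color convention used by the Trace colors scheme.
TRACE_COLORS = {"A": "green", "C": "blue", "G": "black", "T": "red"}


def nonstandard_positions(sequence: str, is_protein: bool) -> list[int]:
    """Return the 0-based positions the 'Non-standard residues' rule would color."""
    seq = sequence.upper()  # assumption: the rule is case-insensitive
    if is_protein:
        # For amino acids, only B, Z and X count as non-standard.
        return [i for i, r in enumerate(seq) if r in NONSTANDARD_AMINO_ACIDS]
    # For nucleotides, anything other than C, G, A, T or U is non-standard.
    return [i for i, r in enumerate(seq) if r not in STANDARD_NUCLEOTIDES]


print(nonstandard_positions("ACGTNRU", is_protein=False))  # [4, 5]
print(nonstandard_positions("MKBZXL", is_protein=True))    # [2, 3, 4]
```

Keeping the two residue sets as module-level constants makes the asymmetry explicit: nucleotides use an allow-list, amino acids a short deny-list.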
124. Settings button again and select Apply Saved Settings, you will see My settings in the menu together with some predefined settings that CLC Sequence Viewer has created for you (see figure 2.10).
Figure 2.10: Menu for applying saved settings.
Whenever you open an alignment, you will be able to apply these settings. Each kind of view has its own list of settings that can be applied. At the bottom of the list you will see CLC Standard Settings, which are the default settings for the view.
2.4 Tutorial: GenBank search and download
CLC Sequence Viewer allows you to search the NCBI GenBank database directly from the program, so you can open, view, analyze and save the search results without using any other applications. To search NCBI GenBank from CLC Sequence Viewer, you must be connected to the Internet.
This tutorial shows how to find a complete human hemoglobin DNA sequence in a situation where you do not know the accession number of the sequence. To start the search:
Download | Search for Sequences at NCBI
This opens the search view. We are searching for a DNA sequence, hence: Nucleotide. Now we are going to adjust the parameters for the search. Click Add search para
125. between pairs of sequences in each cluster. The algorithm assumes that the distance data has the so-called molecular clock property, i.e. the divergence of sequences occurs at the same constant rate at all parts of the tree. This means that the leaves of UPGMA trees all line up at the extant sequences and that a root is estimated as part of the procedure.
Neighbor Joining
The neighbor joining algorithm (Saitou and Nei, 1987), on the other hand, builds a tree where the evolutionary rates are free to differ in different lineages, i.e. the tree does not have a particular root. Some programs always draw trees with roots for practical reasons, but for neighbor joining trees no particular biological hypothesis is postulated by the placement of the root. The method works very much like UPGMA. The main difference is that instead of using the pairwise distance directly, this method subtracts the distances to all other nodes from the pairwise distance. This is done to handle situations where the two closest nodes are not neighbors in the real tree. The neighbor joining algorithm is generally considered to be fairly good and is widely used. Algorithms that improve on its cubic time performance exist, but the improvement is only significant for quite large datasets.
Character based methods
Whereas the distance based methods compress all sequence information into a single number, the character based methods attempt to infer the phylogeny A
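The step in which neighbor joining "subtracts the distances to all other nodes from the pairwise distance" is usually written as the Q criterion, Q(i, j) = (n - 2)·d(i, j) - Σk d(i, k) - Σk d(j, k), and the pair with the smallest Q is joined next. The Python sketch below performs only that first joining step, plus the standard branch-length formula for the two joined taxa. The function name and the distance matrix are made up for illustration; a full implementation would repeat the step on the reduced matrix until three nodes remain.

```python
from itertools import combinations


def nj_first_join(d: list[list[float]]) -> tuple[int, int, float, float]:
    """One neighbor-joining step: return (i, j, branch_i, branch_j).

    d is a symmetric distance matrix with zeros on the diagonal.
    """
    n = len(d)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    row_sums = [sum(row) for row in d]

    # Q criterion: each pairwise distance is corrected by the two nodes' total
    # distance to all other nodes, so the pair joined is not simply the closest.
    best_q, best_i, best_j = None, -1, -1
    for i, j in combinations(range(n), 2):
        q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
        if best_q is None or q < best_q:
            best_q, best_i, best_j = q, i, j

    i, j = best_i, best_j
    # Branch lengths from i and j to the new internal node that joins them.
    branch_i = 0.5 * d[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    branch_j = d[i][j] - branch_i
    return i, j, branch_i, branch_j


D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(D))  # (0, 1, 2.0, 3.0)
```

In this matrix taxa 3 and 4 are the closest pair (distance 3), yet the Q criterion joins taxa 0 and 1 (Q = -50 versus -48 for the pair 3 and 4). This is exactly the correction for "closest nodes that are not neighbors" described above.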
126. view (see section 9.3.1). To completely remove the annotation: right-click the annotation | Delete | Delete Annotation.
If you want to remove all annotations of one type: right-click an annotation of the type you want to remove | Delete | Delete Annotations of Type "type".
If you want to remove all annotations from a sequence: right-click an annotation | Delete | Delete All Annotations.
The removal of annotations can be undone using Ctrl+Z or Undo in the Toolbar. If you have more sequences (e.g. in a sequence list, alignment or contig), you have two additional options:
right-click an annotation | Delete | Delete All Annotations from All Sequences
right-click an annotation | Delete | Delete Annotations of Type "type" from All Sequences
9.4 Element information
The normal view of a sequence (opened by double-clicking) shows the annotations as boxes along the sequence, but often more information is available about a sequence. This information is available through the Element info view. To view the sequence information: select a sequence in the Navigation Area | Show (in the Toolbar) | Element info. This will display a view similar to figure 9.10.
Figure 9.10: The initial displ
127. expect few but large gaps, the Gap open cost should be set significantly higher than the Gap extension cost. However, for most alignments it is a good idea to make the Gap open cost quite a bit higher than the Gap extension cost. The default values are 10.0 and 1.0 for the two parameters, respectively.
• End gap cost. The price of gaps at the beginning or the end of the alignment. One of the advantages of the CLC Sequence Viewer alignment method is that it provides flexibility in the treatment of gaps at the ends of the sequences. There are three possibilities:
  - Free end gaps. Any number of gaps can be inserted in the ends of the sequences without any cost.
  - Cheap end gaps. All end gaps are treated as gap extensions, and any gaps past 10 are free.
  - End gaps as any other. Gaps at the ends of sequences are treated like gaps in any other place in the sequences.
When aligning a long sequence with a short partial sequence, it is ideal to use free end gaps, since this is the best approximation to the situation: the many gaps inserted at the ends are not due to evolutionary events but rather to partial data. Many homologous proteins have quite different ends, often with large insertions or deletions. This confuses alignment algorithms, but with the Cheap end gaps option, large gaps will generally be tolerated at the sequence ends, improving the overall alignment. This is the default setting of the
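To make the three end-gap options concrete, here is a hypothetical Python cost function for a single contiguous gap. It assumes an affine model in which the first gap position costs the Gap open cost and every further position the Gap extension cost (tools differ on whether the opening position also pays an extension), and it reads "any gaps past 10 are free" as meaning that only the first 10 positions of an end gap are charged. Neither detail is specified in this manual, so treat the numbers as illustrative.

```python
from enum import Enum


class EndGapMode(Enum):
    FREE = "free"
    CHEAP = "cheap"
    AS_ANY_OTHER = "as_any_other"


def gap_cost(length: int, at_end: bool, mode: EndGapMode,
             gap_open: float = 10.0, gap_ext: float = 1.0) -> float:
    """Cost of one contiguous gap of `length` positions (hypothetical model)."""
    if length <= 0:
        return 0.0
    # Assumed affine model: open cost for the first position,
    # extension cost for each further position.
    affine = gap_open + (length - 1) * gap_ext
    if not at_end or mode is EndGapMode.AS_ANY_OTHER:
        return affine
    if mode is EndGapMode.FREE:
        return 0.0
    # CHEAP: each end-gap position costs one extension; positions past 10 are free.
    return min(length, 10) * gap_ext


for mode in EndGapMode:
    print(mode.name, gap_cost(25, at_end=True, mode=mode))
```

With the default costs, a 25-position end gap costs 34.0 when treated like any other gap, 10.0 as a cheap end gap and 0.0 as a free end gap. That difference is why the cheap and free options let partial or ragged sequence ends align without distorting the rest of the alignment.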
128. text format for all of the nodes the tree contains:
  - Text size. The size of the text representing the nodes can be set to tiny, small, medium, large or huge.
  - Font. Sets the font of the text of all nodes.
  - Bold. Sets the text bold if enabled.
• Tree Layout. Different layouts for the tree.
  - Node symbol. Changes the symbol of nodes into box, dot, circle or none if you don't want a node symbol.
  - Layout. Displays the tree layout as standard or topology.
  - Show internal node labels. This allows you to see labels for the internal nodes. Initially there are no labels, but right-clicking a node allows you to type a label.
  - Label color. Changes the color of the labels on the tree nodes.
  - Branch label color. Modifies the color of the labels on the branches.
  - Node color. Sets the color of all nodes.
  - Line color. Alters the color of all lines in the tree.
• Annotation Layout. Specifies the annotation in the tree.
  - Nodes. Sets the annotation of all nodes either to name or to species.
  - Branches. Changes the annotation of the branches to bootstrap, length or none if you don't want annotation on branches.
Note: Dragging in a tree will change it. You are therefore asked if you want to save this tree when the Tree Viewer is closed.
You may select part of a tree by clicking on the nodes that you want to select. Right-clicking a selected node opens a menu with the following options
129. of four parts:
• The first part includes the introduction and some tutorials showing how to apply the most significant functionalities of CLC Sequence Viewer.
• The second part describes in detail how to operate all the program's basic functionalities.
• The third part digs deeper into some of the bioinformatic features of the program. In this part you will also find our "Bioinformatics explained" sections. These sections elaborate on the algorithms and analyses of CLC Sequence Viewer and provide more general knowledge of bioinformatic concepts.
• The fourth part is the Appendix and Index.
Each chapter includes a short table of contents.
1.8.1 Text formats
To give this manual a clear layout, different formats are applied:
• A feature in the program is in bold, starting with capital letters. Example: Navigation Area
• An explanation of how a particular function is activated is illustrated by "|" and bold. E.g.: select the element | Edit | Rename
Chapter 2 Tutorials
Contents
2.1 Tutorial: Getting started
2.2 Tutorial: View sequence
2.3 Tutorial: Side Panel Settings
2.3.1 Saving the settings in the Side Panel
2.3.2 Applying Saved sett
130. file format, 1 3
Gzip file format, 1 3
Half-life, 119
Handling of results, 86
Header, 69
Heat map, 162
Help, 14
Hide/show Toolbox, 52
High-throughput sequencing, 161
History, 84
  export, 6
  preserve when exporting, 85
  source elements, 85
Hydrophobicity, 163
Illumina Genome Analyzer, 161
Import
  bioinformatic data, 71, 72
  existing data, 23
  FASTA data, 23
  from a web page, 2
  list of formats, 171
  preferences, 61
  raw sequence, 2
  Side Panel Settings, 60
  using copy/paste, 2
Infer Phylogenetic Tree, 152
Insert gaps, 148
Installation, 9
Isoelectric point, 119
IUPAC codes
  nucleotides, 1
Join sequences, 122
Jpg format
  export, 79
Keywords, 103
Label of sequence, 91
Landscape, Print orientation, OS
Lasergene sequence file format, 172
Latin name
  batch edit, 42
Length, 103
Linux
  installation, 11
  installation with RPM package, 12
List of restriction enzymes, 140
List of sequences, 105
Load enzyme list, 134
Local complexity plot, 162
Locale setting, 58
Location
  of selection on sequence, 51
  Side Panel, 59
Locations, multiple, 161
Log of batch processing, 88
Logo, sequence, 163
ma4 file format, 1 3
Mac OS X installation, 10
Manipulate sequences, 162, 165
Manual editing
  auditing, 58
Manual format, 19
Maximize size of view, 48
Maximum likelihood, 164
Menu Bar, illustration, 37
MFold, 164
mmCIF file format, 1 3
Mode toolbar, 50
Modification date, 103
Modify enzyme list, 141
131. Shift+F on Mac). The Find function can be used for searching the sequence. Clicking the find button will search for the first occurrence of the search term. Clicking the find button again will find the next occurrence, and so on. If the search string is found, the corresponding part of the sequence will be selected.
• Search term. Enter the text to search for. The search function does not distinguish between lower- and upper-case characters.
• Sequence search. Search the nucleotides or amino acids. For amino acids, the single-letter abbreviations should be used for searching. The sequence search also has a set of advanced search parameters:
  - Include negative strand. This will search on the negative strand as well.
  - Treat ambiguous characters as wildcards in search term. If you search for e.g. ATN, you will find both ATG and ATC. If you wish to find literally exact matches for ATN (i.e. only find ATN, not ATG), this option should not be selected.
  - Treat ambiguous characters as wildcards in sequence. If you search for e.g. ATG, you will find both ATG and ATN. If you have large regions of Ns, this option should not be selected.
Note that if you enter a position instead of a sequence, the search will automatically switch to position search.
• Annotation search. Searches the annotations on the sequence. The search is performed both on the labels of the annotations and on the text appearing in t
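The two wildcard options can be sketched as a per-residue comparison against the IUPAC nucleotide codes (see Appendix F). The Python below is a hypothetical illustration, not the program's implementation: it searches the given strand only, ignores case, and treats T and U as distinct letters.

```python
# IUPAC nucleotide codes expanded to the bases they stand for (Appendix F).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CTU", "M": "AC", "K": "GTU", "W": "ATU", "S": "CG",
    "B": "CGTU", "D": "AGTU", "H": "ACTU", "V": "ACG", "N": "ACGTU",
}


def residue_matches(q: str, s: str, wild_term: bool, wild_seq: bool) -> bool:
    """Compare one search-term letter with one sequence letter."""
    q, s = q.upper(), s.upper()
    if q == s:
        return True
    # Only expand ambiguity codes on the side(s) where wildcards are enabled.
    q_bases = set(IUPAC.get(q, q)) if wild_term else {q}
    s_bases = set(IUPAC.get(s, s)) if wild_seq else {s}
    return bool(q_bases & s_bases)


def find_all(sequence: str, term: str, wild_term: bool = False,
             wild_seq: bool = False) -> list[int]:
    """Return 0-based start positions of every match of term in sequence."""
    hits = []
    for start in range(len(sequence) - len(term) + 1):
        window = sequence[start:start + len(term)]
        if all(residue_matches(q, s, wild_term, wild_seq)
               for q, s in zip(term, window)):
            hits.append(start)
    return hits


print(find_all("ccATGgATC", "ATN", wild_term=True))  # [2, 6]
print(find_all("ccATGgATC", "ATN"))                  # []
print(find_all("ATNATG", "ATG", wild_seq=True))      # [0, 3]
```

The last call shows why the manual advises against wildcards in the sequence when it contains long runs of N: every window overlapping an N becomes a potential match.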
132. Working with tables
  C.1 Filtering tables
D Formats for import and export
  D.1 List of bioinformatic data formats
  D.2 List of graphics data formats
E IUPAC codes for amino acids
F IUPAC codes for nucleotides
Bibliography
V Index
Part I Introduction
Chapter 1 Introduction to CLC Sequence Viewer
Contents
1.1 Contact information
1.2 Download and installation
1.2.1 Program download
1.2.2 Installation on Microsoft Windows
1.2.3 Installation on Mac OS X
1.2.4 Installation on Linux with an installer
1.2.5 Installation on Linux with an RPM package
1.3 System requirements
1.4 About CLC Workbenches
1.4.1 New program feature request
1.4.2 Report program errors
1.4.3 CLC Sequence Viewer vs. Workbenches
1.5 When the program is installed: Getting started
1.5.2 Import of example data
1.6 Plug-ins
1.6.1 Installing plug-ins
133. One-letter   Three-letter   Description
A            Ala            Alanine
R            Arg            Arginine
N            Asn            Asparagine
D            Asp            Aspartic acid
C            Cys            Cysteine
Q            Gln            Glutamine
E            Glu            Glutamic acid
G            Gly            Glycine
H            His            Histidine
J            Xle            Leucine or Isoleucine
L            Leu            Leucine
I            Ile            Isoleucine
K            Lys            Lysine
M            Met            Methionine
F            Phe            Phenylalanine
P            Pro            Proline
O            Pyl            Pyrrolysine
U            Sec            Selenocysteine
S            Ser            Serine
T            Thr            Threonine
W            Trp            Tryptophan
Y            Tyr            Tyrosine
V            Val            Valine
B            Asx            Aspartic acid or Asparagine
Z            Glx            Glutamic acid or Glutamine
X            Xaa            Any amino acid
Appendix F IUPAC codes for nucleotides
Single-letter codes based on the International Union of Pure and Applied Chemistry (IUPAC). The information is gathered from http://www.iupac.org and http://www.ebi.ac.uk.
Code   Description
A      Adenine
C      Cytosine
G      Guanine
T      Thymine
U      Uracil
R      Purine (A or G)
Y      Pyrimidine (C, T or U)
M      C or A
K      T, U or G
W      T, U or A
S      C or G
B      C, T, U or G (not A)
D      A, T, U or G (not C)
H      A, T, U or C (not G)
V      A, C or G (not T, not U)
N      Any base (A, C, G, T or U)
Bibliography
[Andrade et al., 1998] Andrade, M. A., O'Donoghue, S. I., and Rost, B. (1998). Adaptation of protein surfaces to subcellular location. J Mol Biol, 276(2):517–525.
[Bachmair et al., 1986] Bachmair, A., Finley, D., and Varshavsky, A. (1986). In vivo half-life of a protein is a function of
134. with a .gb extension. This file can easily be imported into the CLC Workbench: Import | select the file | Select
You don't have to import one file at a time. You can simply select a bunch of files or an entire folder, and the CLC Workbench will take care of the rest, even if the files are in different formats. You can also simply drag and drop the files into the Navigation Area of the CLC Workbench.
Figure 6.6: Saving a sequence as a file in Vector NTI.
The Vector NTI import is a plug-in which is pre-installed in the Workbench. It can be uninstalled and updated using the plug-in manager (see section 1.6).
6.2 Data export
CLC Sequence Viewer can export bioinformatic data in most of the formats that can be imported. There are a few exceptions; see section 6.1. To export a file: select the element to export | Export | choose where to export to | select File of type | enter name of file | Save
When exporting to CSV and tab-delimited files, decimal numbers are formatted according to the Locale setting of the Workbench (see section 4.1). If you open the CSV or tab-delimited file with spreadsheet software like Excel, you should make sure that both the Workbench and
135. which updates you would like to install. If you prefer, you can install the updates manually through the plug-in and resource manager.
Figure 1.4: Plug-in updates.
Figure 1.5: Resources available for download.
1.7 Network configuration
If you use a proxy server to access the Internet, you must configure CLC Sequence Viewer to use it. Otherwise you will not be able to perform any online activities (e.g. searching GenBank). CLC Sequence Viewer supports the use of an HTTP proxy and an anonymous SOCKS proxy. To configure your proxy settings, open CLC Sequence Viewer, go to the Advanced tab of the Preferences dialog (figure 1.6) and enter the appropriate information. The Preferences dialog is opened from the Edit menu.
136. that exporting an element in CLC format (.clc) will export the history too. In this way you can share folders and files with others while preserving the history. If an element's history includes source elements (i.e. if there are elements listed in "Origins from"), they must also be exported in order to see the full history. Otherwise, the history will have entries named "Element deleted". An easy way to export an element with all its source elements is to use the Export Dependent Elements function described in section 6.2.
The history view can be printed. To do so, click the Print icon. The history can also be exported as a PDF file: select the element in the Navigation Area | Export | in File of type choose History PDF | Save
Chapter 8 Batching and result handling
Contents
8.1 How to handle results of analyses
8.1 How to handle results of analyses
This section explains how results generated by tools in the Toolbox are handled by CLC Sequence Viewer. Note that this also applies to tools not running in batch mode (see above). All the analyses in the Toolbox are performed in a step-by-step procedure. First, you select elements for analyses, and then there are a number of steps where you can specify parameters (some of the analyses have no parameters, e.g. when translating DNA to RNA
137. Figure 1.6: Adjusting proxy preferences.
You have the choice between an HTTP proxy and a SOCKS proxy. CLC Sequence Viewer only supports the use of a SOCKS proxy that does not require authorization.
Exclude hosts can be used if there are some hosts that should be contacted directly and not through the proxy server. The value can be a list of hosts, each separated by a |, and in addition a wildcard character * can be used for matching. For example: *.foo.com|localhost
If you have any problems with these settings, contact your systems administrator.
1.8 The format of the user manual
This user manual is intended for Windows, Mac OS X and Linux users. The software is very similar on these operating systems; in areas where differences exist, these are described separately. The term "right-click" is used throughout the manual, but some Mac users with a single-button mouse may have to use Ctrl-click in order to perform a right-click.
The most recent version of the user manuals can be downloaded from http://www.clcbio.com/usermanuals
The user manual consists o
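The Exclude hosts rule (patterns separated by |, with * as a wildcard) has the same shape as Java's http.nonProxyHosts property. The Python sketch below is hypothetical and shows one way such a list could be evaluated; it assumes case-insensitive host names and uses fnmatch, which additionally gives ? and [...] a special meaning that the manual does not mention.

```python
from fnmatch import fnmatchcase


def bypasses_proxy(host: str, exclude_hosts: str) -> bool:
    """Return True if host matches one of the '|'-separated exclude patterns."""
    patterns = [p.strip().lower() for p in exclude_hosts.split("|") if p.strip()]
    host = host.strip().lower()  # assumption: host names are case-insensitive
    # '*' matches any run of characters, so '*.foo.com' covers all subdomains.
    return any(fnmatchcase(host, pattern) for pattern in patterns)


exclude = "*.foo.com|localhost"
for h in ["www.foo.com", "localhost", "foo.com", "www.bar.com"]:
    print(h, bypasses_proxy(h, exclude))
```

In this sketch *.foo.com does not match the bare domain foo.com, because the pattern requires the leading dot; the bare domain would need its own entry in the list.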
138. the tooltip that you see when you keep the mouse cursor fixed. If the search term is found, the part of the sequence corresponding to the matching annotation is selected. Below this option you can choose to search for translations as well. Sequences annotated with coding regions often have the translation specified, which can lead to undesired results.
• Position search. Finds a specific position on the sequence. In order to find an interval, e.g. from position 500 to 570, enter "500..570" in the search field. This will make a selection from position 500 to 570 (both included). Notice the two periods (..) between the start and end number (see section). If you enter positions including thousands separators, like 123,345, the comma is simply ignored, so this is equivalent to entering 123345.
• Include negative strand. When searching the sequence for nucleotides or amino acids, you can search on both strands.
• Name search. Searches for sequence names. This is useful for searching sequence lists, mapping results and BLAST results.
This concludes the description of the View Preferences. Next, the options for selecting and editing sequences are described.
Text format. These preferences allow you to adjust the format of all the text in the view (residue letters, sequence name and translations, if they are shown).
• Text size. Five different sizes.
• Font. Shows a list of fonts available on your computer.
• Bold residues. Makes the resid
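The position-search syntax can be sketched as a small parser. This hypothetical Python helper assumes 1-based positions with both ends included, as described above, and simply strips commas so that 123,345 is read as 123345; how the program handles other separators or malformed input is not covered here.

```python
def parse_position_query(query: str) -> tuple[int, int]:
    """Parse '500..570' or a single position into a 1-based (start, end) pair."""
    cleaned = query.replace(",", "").strip()  # thousands separators are ignored
    if ".." in cleaned:
        start_text, end_text = cleaned.split("..", 1)
        start, end = int(start_text), int(end_text)
    else:
        start = end = int(cleaned)
    if start < 1 or end < start:
        raise ValueError(f"invalid position query: {query!r}")
    return start, end


def select(sequence: str, query: str) -> str:
    """Return the residues covered by the query, both ends included."""
    start, end = parse_position_query(query)
    return sequence[start - 1:end]


print(parse_position_query("500..570"))  # (500, 570)
print(parse_position_query("123,345"))   # (123345, 123345)
print(select("ACGTACGTAC", "3..6"))      # GTAC
```

The conversion sequence[start - 1:end] is where the 1-based, inclusive convention of the manual meets Python's 0-based, end-exclusive slicing.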
139. shown in figure 2.9. In this way you can save the current state of the settings in the Side Panel so that you can apply them to alignments later on. If you check Always apply these settings, these settings will be applied every time you open a view of the alignment. Type "My settings" in the dialog and click Save.
Figure 2.8: Saving the settings of the Side Panel.
Figure 2.9: Dialog for saving the settings of the Side Panel.
2.3.2 Applying saved settings
When you click the Save/Restore S
140. Figure 9.13: A sequence list containing multiple sequences can be viewed either in a table or in a graphical sequence list. The graphical view is useful for viewing annotations and the sequence itself, while the table view provides other information, like sequence lengths and the number of sequences in the list (number of Rows reported).
• Name
• Accession
• Description
• Modification date
• Length
The number of sequences in the list is reported as the number of Rows at the top of the table view. Learn more about tables in Appendix C.
Adding sequences to and removing sequences from the list is easy: to add a sequence, drag it from another list or from the Navigation Area and drop it in the table. To delete sequences, simply select them and press Delete.
You can also create a subset of the sequence list: select the relevant sequences | right-click | Create New Sequence List. This creates a new sequence list which only includes the selected sequences.
9.7.3 Extract sequences
It is possible to extract individual sequences from a sequence list in two ways. If the sequence list is opened in the tabular view, it is possible to drag with the mous
141. Side Panel (see section 3.2.7) for future use.
13.2 Restriction site analysis from the Toolbox
Besides the dynamic restriction sites, you can do a more elaborate restriction map analysis with more output formats using the Toolbox: Toolbox | Restriction Sites | Restriction Site Analysis
This will display the dialog shown in figure 13.9.
Figure 13.9: Choosing sequence ATP8a1 mRNA for restriction map analysis.
If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.
13.2.1 Selecting, sorting and filtering enzymes
Clicking Next lets you define which enzymes to use as the basis for finding restriction sites on the sequence. At the top you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 13.3 for more about creating and modifying enzyme lists. Below there are two panels:
• To the left you see all the enzymes that are in the li
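Conceptually, finding restriction sites starts with locating each enzyme's recognition sequence in the DNA. The Python sketch below does only that step for a hypothetical two-enzyme list (EcoRI, GAATTC; BamHI, GGATCC). It reports 1-based start positions of the recognition sequence rather than cut positions, and because both sites are palindromic it does not scan the reverse strand separately; it is not how the Toolbox analysis is implemented.

```python
import re

# Hypothetical enzyme list; both recognition sequences are palindromic,
# so scanning the forward strand also finds the reverse-strand sites.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def find_recognition_sites(sequence: str,
                           enzymes: dict[str, str]) -> dict[str, list[int]]:
    """Map each enzyme name to the 1-based start positions of its site."""
    seq = sequence.upper()
    sites = {}
    for name, site in enzymes.items():
        # A zero-width lookahead also reports overlapping occurrences.
        pattern = re.compile(f"(?={re.escape(site)})")
        sites[name] = [m.start() + 1 for m in pattern.finditer(seq)]
    return sites


print(find_recognition_sites("ttGAATTCaaGGATCCgaattc", ENZYMES))
# {'EcoRI': [3, 17], 'BamHI': [11]}
```

Non-palindromic recognition sequences would additionally require scanning the reverse complement of the sequence.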
142. Figure 6.3: Import the whole Vector NTI Database.
This will bring up a dialog letting you choose to import from the default location of the database, or you can specify another location. If the database is installed in the default folder, e.g. C:\VNTI Database, press Yes. If not, click No and specify the database folder manually. When the import has finished, the data will be listed in the Navigation Area of the Workbench, as shown in figure 6.4.
If something goes wrong during the import process, please report the problem to support@clcbio.com. To circumvent the problem, see the following section on how to import parts of the database. It takes a few more steps, but you will most likely be able to import this way.
Figure 6.4: The Vector NTI Data folder containing all i
143. ifies the color of the graph for line and bar plots, and specifies a gradient for colors.
• Color different residues. Indicates differences in aligned residues.
  - Foreground color. Colors the letter.
  - Background color. Sets a background color of the residues.
14.3 Edit alignments
14.3.1 Move residues and gaps
The placement of gaps in the alignment can be changed by modifying the parameters when creating the alignment (see section 14.1). However, gaps and residues can also be moved after the alignment is created: select one or more gaps or residues in the alignment | drag the selection to move
This can be done for a single sequence, but also for multiple sequences by making a selection covering more than one sequence. When you have made the selection, the mouse pointer turns into a horizontal arrow, indicating that the selection can be moved (see figure 14.5).
Note: Residues can only be moved when they are next to a gap.
Figure 14.5: Moving a part of an alignment. Notice the change of mouse pointer to a horizontal arrow.
14.3.2 Insert gaps
The placement of gaps in the alignment can be changed by modifying the parameters when creating the alignment. However, gaps can also be added manually after the alignment is created. To in
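The rule that residues can only be moved next to a gap can be illustrated on a single alignment row. This Python function is a hypothetical sketch, not the program's editing code: it moves a selected block one column left or right, and only if the neighboring column in that direction holds a gap ("-"), so the row length never changes.

```python
def shift_block(row: str, start: int, end: int, direction: int) -> str:
    """Move row[start:end] one column left (-1) or right (+1) into an adjacent gap."""
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 or +1")
    target = start - 1 if direction == -1 else end
    if not (0 <= target < len(row)) or row[target] != "-":
        raise ValueError("a block can only move into an adjacent gap")
    chars = list(row)
    block = chars[start:end]
    if direction == -1:
        # The gap on the left ends up on the right of the block.
        chars[start - 1:end] = block + ["-"]
    else:
        # The gap on the right ends up on the left of the block.
        chars[start:end + 1] = ["-"] + block
    return "".join(chars)


print(shift_block("AGG-GAGTCAT", 4, 11, -1))  # AGGGAGTCAT-
print(shift_block("AGG-GAGTCAT", 0, 3, +1))   # -AGGGAGTCAT
```

Moving a block simply swaps it with the neighboring gap, which is why a residue with no adjacent gap cannot be dragged.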
144. like this: ATGACGAATAGGAGTTCTAGCTA, you can also paste it into the Navigation Area.
Note: Make sure you copy all the relevant text; otherwise CLC Sequence Viewer might not be able to interpret it.
6.1.1 External files
To help you organize your research projects, CLC Sequence Viewer lets you import all kinds of files. E.g. if you have Word, Excel or PDF files related to your project, you can import them into the Navigation Area of CLC Sequence Viewer. Importing an external file creates a copy of the file, which is stored at the location you have chosen for import. The file can then be opened by double-clicking it in the Navigation Area. The file is opened using the default application for this file type (e.g. Microsoft Word for .doc files and Adobe Reader for .pdf files).
External files are imported and exported in the same way as bioinformatics files (see section 6.1). Bioinformatics files not recognized by CLC Sequence Viewer are also treated as external files.
6.1.2 Import Vector NTI data
There are several ways of importing your Vector NTI data into the CLC Workbench. The best way to go depends on how your data is currently stored in Vector NTI:
• Your data is stored in the Vector NTI Local Database, which can be accessed through Vector NTI Explorer. This is described in the first section below.
• Your data is stored as single files on your computer (just like Word documents etc.). This is described in the second section be
145. privileges on your system to install it.
2.6 Tutorial: Create and modify a phylogenetic tree
You can make a phylogenetic tree from an existing alignment. See how to create an alignment in the tutorial "Align protein sequences". We use the "ATPase protein alignment" located in Protein orthologs in the Example data. To create a phylogenetic tree:
click the "ATPase protein alignment" in the Navigation Area | Toolbox | Alignments and Trees | Create Tree
A dialog opens where you can confirm your selection of the alignment. Click Next to move to the next step in the dialog, where you can choose between the neighbor joining and the UPGMA
146. will just display the cut site with no information about the name of the enzyme. Placing the mouse cursor on the cut site will reveal this information as a tooltip.
• Flag. This will place a flag just above the sequence with the enzyme name (see an example in figure 13.2). Note that this option makes it hard to see when several cut sites are located close to each other. In the circular view, this option is replaced by the Radial option.
• Radial. This option is only available in the circular view. It places the restriction site labels as close to the cut site as possible (see an example in figure 13.4).
• Stacked. This is similar to the flag option for linear sequence views, but it stacks the labels so that all enzymes are shown. For circular views, it aligns all the labels on each side of the circle. This can be useful for clearly seeing the order of the cut sites when they are located closely together (see an example in figure 13.3).
Note that in a circular view, the Stacked and Radial options also affect the layout of annotations.
Figure 13.4: Restriction site labels in radial layout.
13.1.1 Sort enzymes
Just above the list of enzymes there are three buttons for sorting the list (see figure 13.5).
Figure 13.5: Buttons to sort restriction enzymes.
• Sort enzymes alphabetically. Clicking this button will sort the list of enzymes alphabetically
index increases the thermostability of globular proteins. The index is calculated by the following formula:

$$\text{Aliphatic index} = X(\mathrm{Ala}) + a \cdot X(\mathrm{Val}) + b \cdot X(\mathrm{Leu}) + b \cdot X(\mathrm{Ile})$$

X(Ala), X(Val), X(Ile) and X(Leu) are the amino acid compositional fractions. The constants a and b are the relative volume of valine (a = 2.9) and leucine/isoleucine (b = 3.9) side chains compared to the side chain of alanine [Ikai, 1980].

Estimated half-life. The half-life of a protein is the time it takes for the protein pool of that particular protein to be reduced to half. The half-life of proteins is highly dependent on the identity of the N-terminal amino acid, and thus on overall protein stability [Bachmair et al., 1986, Gonda et al., 1989, Tobias et al., 1991]. The importance of the N-terminal residue is generally known as the "N-end rule": the N-terminal amino acid determines the half-life of a protein. The estimated half-life of proteins has been investigated in mammals, yeast and E. coli (see Table 11.1). If leucine is found N-terminally in mammalian proteins, the estimated half-life is 5.5 hours.

Extinction coefficient. This measure indicates how much light is absorbed by a protein at a particular wavelength. The extinction coefficient is measured by UV spectrophotometry, but can also be calculated. The amino acid composition is important when calculating the extinction coefficient. The extinction coefficient is calculated from the ab
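As a worked illustration of the aliphatic index formula, the following Python sketch computes it for a protein sequence. It assumes the compositional fractions are expressed as mole percent (100 × residue count / sequence length), the convention used for example by the ExPASy ProtParam tool; the function name is hypothetical and this is not the program's own code.

```python
def aliphatic_index(sequence: str, a: float = 2.9, b: float = 3.9) -> float:
    """Aliphatic index: X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu)).

    X(...) are taken as mole percent (count / length * 100),
    an assumption of this sketch.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + a * mole_percent("V")
        + b * (mole_percent("I") + mole_percent("L"))
    )


print(round(aliphatic_index("ALVI"), 2))  # 292.5
```

For "ALVI" each residue is 25 mole percent, so the index is 25 + 2.9 × 25 + 3.9 × (25 + 25) = 292.5.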
ing In
Click Zoom In in the toolbar | click the location in the view that you want to zoom in on
or Click Zoom In in the toolbar | click-and-drag a box around a part of the view | the view now zooms in on the part you selected
or Press on your keyboard
The last option for zooming in is only available if you have a mouse with a scroll wheel:
or Press and hold Ctrl (⌘ on Mac) | Move the scroll wheel on your mouse forward
When you choose the Zoom In mode, the mouse pointer changes to a magnifying glass to reflect the mouse mode.
Note: You might have to click in the view before you can use the keyboard or the scroll wheel to zoom.
If you press the Shift key on your keyboard while clicking in a view, the zoom function is reversed. Hence, clicking on a sequence in this way while the Zoom In mode toolbar item is selected zooms out instead of zooming in.

3.3.2 Zoom Out

It is possible to zoom out step by step on a sequence:
Click Zoom Out in the toolbar | click in the view until you reach a satisfying zoom level
or Press on your keyboard
The last option for zooming out is only available if you have a mouse with a scroll wheel:
or Press and hold Ctrl (⌘ on Mac) | Move the scroll wheel on your mouse backwards
When you choose the Zoom Out mode, the mouse pointer changes to a magnifying glass to reflect the mouse mode.
Note: You might have to click in the view before you can use the keyboard or the s
ings 21
2.4 Tutorial: GenBank search and download 28
2.4.1 Searching for matching objects 28
2.4.2 Saving the sequence 29
2.5 Tutorial: Align protein sequences 29
2.5.1 The alignment dialog 29
2.6 Tutorial: Create and modify a phylogenetic tree 30
2.6.1 Tree layout 31
2.7 Tutorial: Find restriction sites 32
2.7.1 The Side Panel way of finding restriction sites 32
2.7.2 The Toolbox way of finding restriction sites 33

This chapter contains tutorials representing some of the features of CLC Sequence Viewer. The first tutorials are meant as a short introduction to operating the program. The last tutorials give examples of how to use some of the main features of CLC Sequence Viewer.

Watch video tutorials at http://www.clcbio.com/tutorials

2.1 Tutorial: Getting started

This brief tutorial will take you through the most basic steps of working with CLC Sequence Viewer. The tutorial introduces the user interface, shows how to create a folder, and demonstrates how to import your own existing data into the program.

When you open CLC Sequence Viewer for the first time, the user interface looks like figure 2.1.
ions

Figure 6.15: Choosing to include data points with gaps.

reference sequence for read mappings and the query sequence for BLAST results has gaps. If you are exporting, e.g., coverage information from a read mapping, you would probably want to exclude gaps if you want the positions in the exported file to match the reference (i.e. chromosome coordinates). If you export including gaps, the data points in the file no longer correspond to the reference coordinates, because each gap will shift the coordinates.

Clicking Next will present a file dialog letting you specify name and location for the file.

The output format of the file is like this:

Position Value

CHAPTER 6 IMPORT EXPORT OF DATA AND GRAPHICS 83

6.5 Copy/paste view output

The content of tables, e.g. in reports, folder lists, and sequence lists, can be copy/pasted into different programs, where it can be edited. CLC Sequence Viewer pastes the data in tab-separated format, which is useful if you use programs like Microsoft Word and Excel. There are many programs in which copy/paste can be applied. For simplicity, we include one example of the copy/paste function from a Folder Content view to Microsoft Excel.

The first step is to select the desired elements in the view:

click a line in the Folder Content view | hold Shift key | press arrow down/up key

See figure 6.16.
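The coordinate shift caused by gaps can be illustrated with a small Python sketch. It assumes the reference row is given as a string with '-' marking gaps and that there is one data value per alignment column; the function name `export_points` is hypothetical and this is not the program's export code. Each returned pair corresponds to one line of the Position/Value output.

```python
from typing import List, Sequence, Tuple


def export_points(reference_row: str, values: Sequence[float],
                  include_gaps: bool = False) -> List[Tuple[int, float]]:
    """Pair per-column values with 1-based positions.

    include_gaps=False: positions are reference coordinates; gap columns are skipped.
    include_gaps=True:  positions are alignment columns, so gaps shift the coordinates.
    """
    if len(reference_row) != len(values):
        raise ValueError("expected one value per alignment column")
    points: List[Tuple[int, float]] = []
    ref_pos = 0
    for column, (residue, value) in enumerate(zip(reference_row, values), start=1):
        if include_gaps:
            points.append((column, value))   # position = alignment column
        elif residue != "-":
            ref_pos += 1
            points.append((ref_pos, value))  # position = reference coordinate
    return points


print(export_points("AC-GT", [5, 6, 7, 8, 9]))
# [(1, 5), (2, 6), (3, 8), (4, 9)]
```

With gaps included, the reference residue G would be reported at position 4 instead of its true reference coordinate 3.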
it is possible to put reliability weights on each internal branch of the inferred tree. If the data was bootstrapped 100 times, a bootstrap score of 100 means that the corresponding branch occurs in all 100 trees made from re-sampled alignments. Thus, a high bootstrap score is a sign of greater reliability.

Other useful resources

The Tree of Life web project
http://tolweb.org

Joseph Felsenstein's list of phylogeny software
http://evolution.genetics.washington.edu/phylip/software.html

Creative Commons License

All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work.

See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents.

Part IV
Appendix

Appendix A
More features

You are currently using CLC bio's Sequence Viewer. If you want more features, try one of our commercial workbenches. You can download a one-month demo at http://www.clcbio.com/software. See a list of all the features available below:
• CLC Sequence Viewer
• CLC Main Workbench
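The bootstrap procedure described above can be sketched in Python as follows. `infer_splits` is a hypothetical placeholder for whatever tree-building method is used (for example neighbor joining or UPGMA); it should return the splits of the inferred tree, each given as the set of sequence names on one side of an internal branch. The sketch only shows the column resampling and the counting; it is not CLC Sequence Viewer's implementation.

```python
import random
from typing import Callable, Dict, FrozenSet, Set

Alignment = Dict[str, str]  # sequence name -> aligned sequence (equal lengths)
Split = FrozenSet[str]      # names on one side of an internal branch


def bootstrap_replicate(alignment: Alignment, rng: random.Random) -> Alignment:
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every row, so columns stay intact.
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_support(
    alignment: Alignment,
    infer_splits: Callable[[Alignment], Set[Split]],
    split: Split,
    replicates: int = 100,
    seed: int = 0,
) -> float:
    """Percentage of replicate trees that contain `split`."""
    rng = random.Random(seed)
    hits = sum(
        split in infer_splits(bootstrap_replicate(alignment, rng))
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates
```

With 100 replicates, a branch found in every replicate tree gets a score of 100, matching the interpretation given in the text.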
ities
The editing options
Options for adding, editing and removing annotations
Restriction Sites, Annotation Types, Find, and Text Format preferences groups

CHAPTER 9 VIEWING AND EDITING SEQUENCES 97

Figure 9.3:
Region 1: A single residue.
Region 2: A range of residues including both endpoints.
Region 3: A range of residues starting somewhere before 30 and continuing up to and including 40.
Region 4: A single residue somewhere between 50 and 60 inclusive.
Region 5: A range of residues beginning somewhere between 70 and 80 inclusive and ending at 90 inclusive.
Region 6: A range of residues beginning somewhere between 100 and 110 inclusive and ending somewhere between 120 and 130 inclusive.
Region 7: A site between residues 140 and 141.
Region 8: A site between two residues somewhere between 150 and 160 inclusive.
Region 9: A region that covers ranges from 170 to 180 inclusive and 190 to 200 inclusiv
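One way to make the fuzzy endpoints of figure 9.3 concrete is to store each endpoint as a range of possible positions. The Python sketch below is a hypothetical model: the class names are invented for this example and say nothing about how CLC Sequence Viewer stores annotations. It handles interval-type regions such as Regions 4, 6 and 9, but not the sites between residues (Regions 7 and 8), and answers one question: which residues are covered no matter where the uncertain endpoints actually lie?

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FuzzyPosition:
    """A residue position known only to lie somewhere in [low, high] (inclusive)."""
    low: int
    high: int


def exact(pos: int) -> FuzzyPosition:
    return FuzzyPosition(pos, pos)


@dataclass(frozen=True)
class Interval:
    start: FuzzyPosition
    end: FuzzyPosition

    def certainly_covers(self, pos: int) -> bool:
        # True whatever the real endpoints are: latest possible start, earliest possible end.
        return self.start.high <= pos <= self.end.low


@dataclass(frozen=True)
class Region:
    intervals: Tuple[Interval, ...]  # several intervals for a joined region such as Region 9

    def certainly_covers(self, pos: int) -> bool:
        return any(iv.certainly_covers(pos) for iv in self.intervals)


# Region 6: begins somewhere in 100-110 and ends somewhere in 120-130.
region6 = Region((Interval(FuzzyPosition(100, 110), FuzzyPosition(120, 130)),))
# Region 9: covers 170-180 and 190-200.
region9 = Region((Interval(exact(170), exact(180)), Interval(exact(190), exact(200))))

print(region6.certainly_covers(115), region6.certainly_covers(105))  # True False
```

Region 4, a single residue somewhere between 50 and 60, certainly covers no particular position in this model, since every candidate position might not be the real one.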
its amino-terminal residue. Science, 234(4773):179-186.

[Clote et al., 2005] Clote, P., Ferré, F., Kranakis, E., and Krizanc, D. (2005). Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA, 11(5):578-591.

[Felsenstein, 1981] Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol, 17(6):368-376.

[Feng and Doolittle, 1987] Feng, D. F. and Doolittle, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol, 25(4):351-360.

[Forsberg et al., 2001] Forsberg, R., Oleksiewicz, M. B., Petersen, A. M., Hein, J., Bøtner, A., and Storgaard, T. (2001). A molecular clock dates the common ancestor of European-type porcine reproductive and respiratory syndrome virus at more than 10 years before the emergence of disease. Virology, 289(2):174-179.

[Gill and von Hippel, 1989] Gill, S. C. and von Hippel, P. H. (1989). Calculation of protein extinction coefficients from amino acid sequence data. Anal Biochem, 182(2):319-326.

[Gonda et al., 1989] Gonda, D. K., Bachmair, A., Wünning, I., Tobias, J. W., Lane, W. S., and Varshavsky, A. (1989). Universality and structure of the N-end rule. J Biol Chem, 264(28):16700-16712.

[Hein, 2001] Hein, J. (2001). An algorithm for statistical alignment of sequences related by a binary tree. In Pacific Symposium on Biocomputing, page 179.

[Hein et al., 2000] Hein, J., Wiuf, C., Knudsen
ium tuberculosis H37Rv complete genome

Figure 10.1: The GenBank search view.

genomic and genome.

The following parameters can be added to the search:
• All fields. Text. Searches in all parameters in the NCBI database at the same time.
• Organism. Text.
• Description. Text.
• Modified Since. Between 30 days and 10 years.
• Gene Location. Genomic DNA/RNA, Mitochondrion, or Chloroplast.
• Molecule. Genomic DNA/RNA, mRNA or rRNA.
• Sequence Length. Number for maximum or minimum length of the sequence.
• Gene Name. Text.

The search parameters are the most recently used. The All fields option allows searches in all parameters in the NCBI database at the same time. All fields also provides an opportunity to restrict a search to parameters which are not listed in the dialog. E.g. writing gene[Feature key] AND mouse in All fields generates hits in the GenBank database which contain one or more genes and where "mouse" appears somewhere in the GenBank file. You can also write e.g. CD9 NOT homo sapiens in All fields.

Note: The Feature Key option is only available in GenBank when searching for nucleotide sequences. For more information about how to use this syntax, see http://www.ncbi.nlm.nih.gov/books/

CHAPTER 10 DATA DOWNLOAD 112

When you are satisfied with the parameters you have entered, click Start search.

Note: When conduc
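The same Entrez query syntax is accepted by NCBI's public E-utilities service, so a query written for All fields can also be tried outside the program. The Python sketch below only builds an ESearch request URL for such a query; it does not claim that CLC Sequence Viewer uses E-utilities internally, and the helper name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public NCBI service, used here for illustration).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 50) -> str:
    """Build an ESearch request URL for an Entrez query string."""
    # urlencode escapes brackets and spaces, e.g. "[" -> %5B and " " -> "+".
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


url = esearch_url("nucleotide", "gene[Feature key] AND mouse")
print(url)
```

Opening the printed URL in a browser returns the matching identifiers as XML; the Feature key field is only meaningful for nucleotide databases, as noted above.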
ive machine learning algorithms that can simultaneously process data from multiple species [Siepel and Haussler, 2004]. Through the comparative approach, valuable evolutionary information can be obtained about which amino acid substitutions are functionally tolerated by the organism and which are not. This information can be used to identify substitutions that affect protein function and stability, and is of major importance to the study of proteins [Knudsen and Miyamoto, 2001]. Knowledge of the underlying phylogeny is, however, paramount to comparative methods of inference, as the phylogeny describes the underlying correlation from shared history that exists between data from different species.

In molecular epidemiology of infectious diseases, phylogenetic inference is also an important tool. The very fast substitution rate of microorganisms, especially the RNA viruses, means that these show substantial genetic divergence over the time scale of months and years. Therefore, the phylogenetic relationship between the pathogens from individuals in an epidemic can be resolved and can contribute valuable epidemiological information about transmission chains and epidemiologically significant events [Leitner and Albert, 1999, Forsberg et al., 2001].

15.2.3 Reconstructing phylogenies from molecular data

Traditionally, phylogenies have been constructed from morphological data, but following the growth of genetic information it h
ivileges, you can create a bin directory in your home directory and install symbolic links there. You can also choose not to create symbolic links.
• Wait for the installation process to complete and click Finish.

If you choose to create symbolic links in a location which is included in your PATH, the program can be executed by running the command:
clcseqview6
Otherwise, you start the application by navigating to the location where you chose to install it and running the command:
clcseqview6

CHAPTER 1 INTRODUCTION TO CLC SEQUENCE VIEWER 12

1.2.5 Installation on Linux with an RPM package

Navigate to the directory containing the rpm package and install it using the rpm tool by running a command similar to:
rpm -ivh CLCSequenceViewer_6_JRE.rpm
If you are installing from a CD, the rpm packages are located in the RPMS directory.
Installation of RPM packages usually requires root privileges.
When the installation process is finished, the program can be executed by running the command:
clcseqview6

1.3 System requirements

The system requirements of CLC Sequence Viewer are these:
• Windows XP, Windows Vista, or Windows 7, Windows Server 2003 or Windows Server 2008
• Mac OS X 10.6 or later. Intel CPU required. However, Mac OS X 10.5.8 is supported on 64-bit Intel systems.
• Linux: RedHat 5 or later. SuSE 10 or later.
• 32 or 64 bit
• 256 MB RAM required
• 512 MB RAM recommended
• 1024 x 768 display recommended
k will refine the existing sorting.

APPENDIX C WORKING WITH TABLES 169

C.1 Filtering tables

The final concept to introduce is Filtering. The table filter has an advanced and a simple mode. The simple mode is the default and is applied simply by typing text or numbers (see an example in figure C.2).

Figure C.2: Typing "neg" in the filter in simple mode.

Typing "neg" in the filter will only show the rows where "neg" is part of the text in any of the columns (also the ones that are not shown). The text does not have to be in the beginning, thus "ega" would give the same result. This simple filter works fine for fast, textual and non-complicated filtering and searching.

However, if you wish to make use of numerical information or make more complex filters, you can switch to the advanced mode by clicking the Advanced filter button. The advanced filter is structured in a different way: First of all, you can have more than one criterion in the filter. Criteria can be added or removed by clicking the Add or Remove buttons. At the top, you can choose whether all the criteria should be fulfilled (Match all), or if just one of them needs to be fulfilled (Match any).

For each
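The difference between the two filter modes can be sketched in Python. This is a hypothetical model, not the program's code: rows are dictionaries, the simple filter matches case-insensitively (the text does not say whether the program does), and each advanced criterion is a function returning True or False. The example rows are made up, loosely modeled on the columns in figure C.2.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Criterion = Callable[[Row], bool]


def simple_filter(rows: List[Row], text: str) -> List[Row]:
    """Keep rows where `text` occurs anywhere in any column (case-insensitive here)."""
    needle = text.lower()
    return [row for row in rows if any(needle in str(value).lower() for value in row.values())]


def advanced_filter(rows: List[Row], criteria: List[Criterion], match_all: bool = True) -> List[Row]:
    """Match all: every criterion must hold. Match any: at least one must hold."""
    combine = all if match_all else any
    return [row for row in rows if combine(criterion(row) for criterion in criteria)]


rows = [  # made-up example rows
    {"Start codon": "ATG", "Found at strand": "negative", "Length": 306},
    {"Start codon": "ATG", "Found at strand": "positive", "Length": 1378},
    {"Start codon": "CTG", "Found at strand": "negative", "Length": 1995},
]

long_and_negative = advanced_filter(
    rows,
    [lambda r: r["Length"] > 1000, lambda r: r["Found at strand"] == "negative"],
)
print([r["Length"] for r in long_and_negative])  # [1995]
```

The numerical criterion (Length greater than 1000) is something the simple text filter cannot express, which is the reason for the advanced mode.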
late gene expression or help to fold the DNA. Nuclear proteins often have a low percentage of aromatic residues [Andrade et al., 1998].

Amino acid distribution. Amino acids are the basic components of proteins. The amino acid distribution in a protein is simply the percentage of the different amino acids represented in a particular protein of interest. Amino acid composition is generally conserved through family classes in different organisms, which can be useful when studying a particular protein or enzymes across species borders. Another interesting observation is that amino acid composition varies slightly between proteins from different subcellular localizations. This fact has been used in several computational methods for prediction of subcellular localization.

Annotation table. This table provides an overview of all the different annotations associated with the sequence and their incidence.

Dipeptide distribution. This measure is simply a count, or frequency, of all the observed adjacent pairs of amino acids (dipeptides) found in the protein. It is only possible to report neighboring amino acids. Knowledge of dipeptide composition has previously been used for prediction of subcellular localization.
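Both distributions are simple counts over the sequence, as the following Python sketch shows. The function names are hypothetical and this is not the program's own implementation; it assumes the sequence is given as one-letter amino acid codes.

```python
from collections import Counter
from typing import Dict


def amino_acid_distribution(sequence: str) -> Dict[str, float]:
    """Percentage of each amino acid in the sequence."""
    seq = sequence.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def dipeptide_counts(sequence: str) -> Counter:
    """Counts of all adjacent residue pairs (dipeptides), read left to right."""
    seq = sequence.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


print(amino_acid_distribution("MKKA"))  # {'A': 25.0, 'K': 50.0, 'M': 25.0}
print(dipeptide_counts("MKKM"))         # Counter({'MK': 1, 'KK': 1, 'KM': 1})
```

Only neighboring residues are paired, so a sequence of length n yields n - 1 dipeptides.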
lections on sequences and rows in tables

Chapter 4
User preferences and settings

Contents
4.1 General preferences 57
4.2 Default view preferences 58
4.2.1 Number formatting in tables 59
4.2.2 Import and export Side Panel settings 60
4.3 Advanced preferences 61
4.4 Export/import of preferences 61
4.4.1 The different options for export and importing 62
4.5 View settings for the Side Panel 62
4.5.1 Floating Side Panel 64

The first three sections in this chapter deal with the general preferences that can be set for CLC Sequence Viewer using the Preferences dialog. The next section explains how the settings in the Side Panel can be saved and applied to other views. Finally, you can learn how to import and export the preferences.

The Preferences dialog offers opportunities for changing the default settings for different features of the program.

The Preferences dialog is opened in one of the following ways and can be seen in figure 4.1:
Edit | Preferences
or Ctrl + K (⌘ on Mac)

4.1 General preferences

The General preferences include:
• Undo Limit. By default, the undo limit is set to 500. By writing a higher number in this field, more
lete a part of the sequence is to right-click the selection | Delete Selection.

If you wish to correct only one residue, this is possible by simply making the selection cover only one residue and then typing the new residue.

9.1.5 Sequence region types

The various annotations on sequences cover parts of the sequence. Some cover an interval, some cover intervals with unknown endpoints, some cover more than one interval, etc. In the following, all of these will be referred to as regions. Regions are generally illustrated by markings (often arrows) on the sequences. An arrow pointing to the right indicates that the corresponding region is located on the positive strand of the sequence. Figure 9.2 is an example of three regions with separate colors.

Figure 9.2: Three regions on a human beta globin DNA sequence (HUMHBB).

Figure 9.3 shows an artificial sequence with all the different kinds of regions.

9.2 Circular DNA

A sequence can be shown as a circular molecule:

select a sequence in the Navigation Area | Show in the Toolbar | As Circular
or If the sequence is already open | Click Show As Circular at the lower left part of the view

This will open a view of the molecule similar to the one in figure 9.4.

This view of the sequence shares some of the properties of the linear view of sequences as described in section 9.1, but there are some differences. The similarities and differences are listed below:
• Similar
lgorithms for calculating alignments.

CHAPTER 14 SEQUENCE ALIGNMENT 146

Figure 14.4: The alignment of the coding sequence of bovine myoglobin with the full mRNA of human gamma globin. The top alignment is made with free end gaps, while the bottom alignment is made with end gaps treated as any other. The yellow annotation is the coding sequence in both sequences. It is evident that free end gaps are ideal in this situation, as the start codons are aligned correctly in the top alignment. Treating end gaps as any other gaps in the case of aligning distant homologs where one sequence is partial leads to a spreading out of the short sequence, as in the bottom alignment.

• Fast (less accurate). This allows for use of an optimized alignment algorithm which is very fast. The fast option is particularly useful for data sets with very long sequences.
• Slow (very accurate). This is the recommended choice unless you find the processing time too long.

Both algorithms use progressive alignment. The faster algorithm builds the initial tree by doing more approximate pairwise alignments than the slower option.

14.2 View alignments

Since an alignment is a display of several sequences arranged in rows, the basic options for viewing alignments are the same as for viewing sequences. Therefore, we refer to section 9.1 for an explanation of these basic options. However, there are a number
log. In the dialog, you select the folder in which you want to save the element. After naming the element, press OK.

CHAPTER 3 USER INTERFACE 4

3.2.5 Undo/Redo

If you make a change in a view, e.g. remove an annotation in a sequence or modify a tree, you can undo the action. In general, Undo applies to all changes you can make when right-clicking in a view. Undo is done by:
Click undo in the Toolbar
or Edit | Undo
or Ctrl + Z
If you want to undo several actions, just repeat the steps above.
To reverse the undo action:
Click the redo icon in the Toolbar
or Edit | Redo
or Ctrl + Y
Note: Actions in the Navigation Area, e.g. renaming and moving elements, cannot be undone. However, you can restore deleted elements (see section 3.1.7).
You can set the number of possible undo actions in the Preferences dialog (see section 4).

3.2.6 Arrange views in View Area

Views are arranged in the View Area by their tabs. The order of the views can be changed using drag and drop. E.g. drag the tab of one view onto the tab of another. The tab of the first view is now placed at the right side of the other tab.

If a tab is dragged into a view, an area of the view is made gray (see fig. 3.10), illustrating that the view will be placed in this part of the View Area.

Figure 3.10: When dragging a view, a gray a
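The undo/redo behavior described above, combined with the undo limit set in the Preferences dialog, can be modeled as two stacks. The Python sketch below is a hypothetical illustration, not CLC Sequence Viewer's code: each action is stored as an (apply, revert) pair of functions, the undo history is capped at a configurable limit (500 by default, matching the default undo limit), and a new action clears the redo history.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Action = Tuple[Callable[[], None], Callable[[], None]]  # (apply, revert)


class UndoHistory:
    """Bounded undo/redo history (hypothetical sketch)."""

    def __init__(self, limit: int = 500) -> None:
        self._undo: Deque[Action] = deque(maxlen=limit)  # oldest actions drop off automatically
        self._redo: List[Action] = []

    def perform(self, apply: Callable[[], None], revert: Callable[[], None]) -> None:
        apply()
        self._undo.append((apply, revert))
        self._redo.clear()  # a new change invalidates the redo history

    def undo(self) -> bool:
        if not self._undo:
            return False
        apply, revert = self._undo.pop()
        revert()
        self._redo.append((apply, revert))
        return True

    def redo(self) -> bool:
        if not self._redo:
            return False
        apply, revert = self._redo.pop()
        apply()
        self._undo.append((apply, revert))
        return True


document: List[str] = []
history = UndoHistory(limit=2)
for item in "abc":
    history.perform(lambda item=item: document.append(item), lambda: document.pop())
history.undo()
print(document)  # ['a', 'b']
```

With a limit of 2, only the last two of the three changes can be undone; the first one has been forgotten.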
Figure 2.11: NCBI search view.

Click Start search to commence the search in NCBI.

2.4.1 Searching for matching objects

When the search is complete, the list of hits is shown. If the desired complete human hemoglobin DNA sequence is found, the sequence can be viewed by double-clicking it in the list of hits from the search. If the desired sequence is not shown, you can click the More button below the list to see more hits.

2.4.2 Saving the sequence

The sequences which are found during the search can be displayed by double-clicking in the list of hits. However, this does not save the sequence. You can save one or more sequences by selecting them and clicking Download and Save, or by dragging the sequences into the Navigation Area.

2.5 Tutorial: Align protein sequences

This tutorial outlines some of the alignment functionality of CLC Sequence Viewer. In addition to creating alignments of nucleotide or peptide sequences, the software offers several ways to view alignments. The alignments can then be used for building phylogenetic trees.

Sequences must be available
Figure 14.1: Creating an alignment.

If you have selected some elements before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences, sequence lists or alignments from the selected elements. Click Next to adjust alignment algorithm parameters. Clicking Next opens the dialog shown in figure 14.2.

Figure 14.2: Adjusting alignment algorithm parameters. [Dialog: Gap settings (Gap open cost 10, Gap extension cost 1, End gap cost: As any other); Alignment (Fast (less accurate), Slow (very accurate))]

14.1.1 Gap costs

The alignment algorithm has three parameters concerning gap costs: Gap open cost, Gap extension cost and End gap cost. The precision of these parameters is to one decimal place.
• Gap open cost. The price for introducing gaps in an alignment.
• Gap extension cost. The price for every extension past the initial gap.
If you expect a lot of small gaps in your alignment, the Gap open cost should equal the Gap extension cost. On the other hand, if you
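To see how the gap parameters interact, the Python sketch below totals the gap penalties of a single gapped row. It is a simplified, hypothetical model: it assumes each run of gap characters ('-') costs the Gap open cost for its first position plus the Gap extension cost for each further position, and it models only two end gap settings, "As any other" and free end gaps (the latter as in figure 14.4). A real aligner also scores matches and mismatches, which are ignored here.

```python
import re


def gap_penalty(row: str, gap_open: float = 10.0, gap_extension: float = 1.0,
                free_end_gaps: bool = False) -> float:
    """Total gap cost of one aligned row, with '-' marking gap positions."""
    total = 0.0
    for run in re.finditer(r"-+", row):
        at_end = run.start() == 0 or run.end() == len(row)
        if free_end_gaps and at_end:
            continue  # end gaps cost nothing with free end gaps
        length = run.end() - run.start()
        total += gap_open + (length - 1) * gap_extension
    return total


# One gap of length 4 versus four separate gaps of length 1 (default costs 10 and 1):
print(gap_penalty("AC----GT"))   # 13.0
print(gap_penalty("A-C-G-T-A"))  # 40.0
```

With the default costs, one long gap is far cheaper than several short ones. Setting the open cost equal to the extension cost (for example both 1.0) makes both rows cost 4.0, which is why equal costs suit alignments expected to contain many small gaps.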
low

Import from the Vector NTI Local Database

If your Vector NTI data are stored in a Vector NTI Local Database (as the one shown in figure 6.2), you can import all the data in one step, or you can import selected parts of it.

Figure 6.2: Data stored in the Vector NTI Local Database accessed through Vector NTI Explorer.

Importing the entire database in one step

From the Workbench, there is a direct import of the whole database (see figure 6.3):
File | Import Vector NTI Database
lso appropriate if you don't have the need for resizing and editing the image after export.

Vector graphics. Vector graphics are a collection of shapes. Thus, what is stored is e.g. information about where a line starts and ends, and the color of the line and its width. This enables a given viewer to decide how to draw the line, no matter what the zoom factor is, thereby always giving a correct image. This format is good for e.g. graphs and reports, but less usable for e.g. dot plots. If the image is to be resized or edited, vector graphics are by far the best format to store graphics. If you open a vector graphics file in an application like e.g. Adobe Illustrator, you will be able to manipulate the image in great detail.

Graphics files can also be imported into the Navigation Area. However, no kinds of graphics files can be displayed in CLC Sequence Viewer. See section 6.1.1 for more about importing external files into CLC Sequence Viewer.

6.3.3 Graphics export parameters

When you have specified the name and location to save the graphics file, you can either click Next or Finish. Clicking Next allows you to set further parameters for the graphics export, whereas clicking Finish will export using the parameters that you set the last time you made a graphics export in that file format (if it is the first time, it will use default parameters).

Parameters for bitmap formats

For bitmap files, cli
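To make the difference concrete: a vector file records drawing instructions rather than pixels. The Python sketch below writes a single line as SVG, a common vector format. The helper name `svg_line` is hypothetical, and the example is not tied to how CLC Sequence Viewer writes its own vector exports. Only the start and end points, the color and the width are stored, so any viewer can redraw the line sharply at any zoom factor.

```python
import xml.etree.ElementTree as ET


def svg_line(x1: float, y1: float, x2: float, y2: float,
             color: str = "black", width: float = 1.0,
             canvas: tuple = (100, 100)) -> str:
    """Return an SVG document containing one line segment."""
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     width=str(canvas[0]), height=str(canvas[1]))
    # Only geometry and style are stored; no pixel data.
    ET.SubElement(svg, "line", x1=str(x1), y1=str(y1), x2=str(x2), y2=str(y2),
                  stroke=color, **{"stroke-width": str(width)})
    return ET.tostring(svg, encoding="unicode")


print(svg_line(10, 10, 90, 90, color="red", width=2))
```

A bitmap of the same line would instead store the color of every pixel in the canvas, which is why it loses quality when enlarged.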
map analysis performed from the Toolbox.

2.7.1 The Side Panel way of finding restriction sites

When you open a sequence, there is a Restriction sites setting in the Side Panel. By default, 10 of the most popular restriction enzymes are shown (see figure 2.17).

Figure 2.17: Showing restriction sites of ten restriction enzymes.

The restriction sites are shown on the sequence with an indication of cut site and recognition sequence. In the list of enzymes in the Side Panel, the number of cut sites is shown in parentheses for each enzyme (e.g. SalI cuts three times). If you wish to see the recognition sequence of the enzyme, place your mouse cursor on the enzyme in the list for a short moment, and a tool tip will appear.

You can add or remove enzymes from the list by clicking the Manage enzymes button.

2.7.2 The Toolbox way of finding restriction sites

Suppose you are working with sequence ATP8a1 mRNA from the example data and you wish to know which restriction enzymes will cut this sequence exactly once and create a 3' overhang. Do the following:

select the ATP8a1 mRNA | Toolbox in the Menu Bar
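The cut-site counts and the Non-cutters / Single cutters / Double cutters / Multiple cutters grouping shown in the Side Panel can be sketched in Python. This is a simplified, hypothetical version: it searches only the given strand, which is sufficient for palindromic recognition sites such as EcoRI (GAATTC), SalI (GTCGAC) and HindIII (AAGCTT) but not in general, and it ignores degenerate recognition sequences and circular topology.

```python
from typing import Dict, List

RECOGNITION_SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "SalI": "GTCGAC"}


def count_sites(sequence: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of `site` on the given strand."""
    seq, site = sequence.upper(), site.upper()
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def group_by_cut_count(sequence: str, sites: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort enzymes into the four cutter groups used in the Side Panel."""
    groups: Dict[str, List[str]] = {
        "Non-cutters": [], "Single cutters": [], "Double cutters": [], "Multiple cutters": [],
    }
    for enzyme in sorted(sites):
        n = count_sites(sequence, sites[enzyme])
        if n == 0:
            groups["Non-cutters"].append(enzyme)
        elif n == 1:
            groups["Single cutters"].append(enzyme)
        elif n == 2:
            groups["Double cutters"].append(enzyme)
        else:
            groups["Multiple cutters"].append(enzyme)
    return groups


print(group_by_cut_count("GTCGACAAGAATTCGTCGACTTGTCGAC", RECOGNITION_SITES))
```

In this made-up sequence SalI cuts three times (a multiple cutter), EcoRI once and HindIII not at all.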
mat 1 3
pdf format, export 79
Personal information 13
Pfam domain search 163
phr file format 1 3
PHR file format 173
Phred file format 1 2
phy file format 1 3
Phylip file format 1 3
Phylogenetic tree 152, 164
  tutorial 30
Phylogenetics, Bioinformatics explained 155
pir file format 1 3
PIR (NBRF) file format 172
Plug-ins 15
png format, export 9
Polarity colors 93
Portrait, Print orientation 68
Positively charged residues 121
PostScript, export 9
Preference group 62
Preferences 5
  advanced 61, 185
  export 61
  General 5
  import 61
  style sheet 62
  toolbar 59
  View 58
  view 49
Primer design 164
  design from alignments 164
Print 66
  preview 69
  visible area 6
  whole view 6
pro file format 1 3
Problems when starting up 13
Processes 52
Properties, batch edit 42
Protein
  charge 163
  Isoelectric point 119
  report 162
  statistics 118
Proteolytic cleavage 163
Proxy server 18
ps format, export 79
psi file format 1 3
PubMed references, search 162
Quick start 14
Rasmol colors 93
Reading frame 128
Realign alignment 163
Rebase, restriction enzyme database 140
Recycle Bin 42
Redo/Undo 4
Reference sequence 161
References 1 9
Region types 96
Remove
  annotations 102
  terminated processes 52
Rename element 41
Report program errors 13
Report, protein 162
Request new feature 13
Residue coloring 93

INDEX

Restore
  deleted elemen
mbers all over the program.
• Show Dialogs. A lot of information dialogs have a checkbox: "Never show this dialog again". When you see a dialog and check this box in the dialog, the dialog will not be shown again. If you regret and wish to have the dialog displayed again, click the button in the General Preferences: Show Dialogs. Then all the dialogs will be shown again.

Figure 4.2: Annotations added when the sequence is edited ("Deleted selection" and "Editing of sequence").

Figure 4.3: Details of the editing.

CHAPTER 4 USER PREFERENCES AND SETTINGS 59

4.2 Default view preferences

There are five groups of default View settings:
1. Toolbar
2. Side Panel Location
3. New View
4. View Format
5. User Defined View Settings

In general, these are default settings for the user interface.

The Toolbar preferences let you choose the size of the toolbar icons, and you can choose whether to display names below the icons.

The Side Panel Location setting lets you choose between Dock in views and Float in window. When docked in view, view preferences will be located in the right side of the view of e.g. an alignment. When floating in window, the side panel can be placed anywhere on your screen, also outside the workspace, e.g. on a different screen.

S
meaning that when adding search parameters to your search, you search for both (or all) text strings rather than any of the text strings.

You can append a wildcard character by checking the checkbox at the bottom. This means that you only have to enter the first part of the search text, e.g. searching for "genom" will find both

[Figure: NCBI search view with database Nucleotide selected, three All Fields terms (human, hemoglobin, complete), the "Append wildcard to search words" checkbox, and a results table with Accession, Definition and Modification Date columns]
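The way multiple search parameters and the wildcard option combine can be sketched as building a single Entrez-style query string. This is a hypothetical illustration of the behavior described above (words joined with AND, '*' appended to each word when the wildcard option is checked), not a claim about the exact string CLC Sequence Viewer sends; the function name is invented for this example.

```python
from typing import List


def build_search_term(words: List[str], append_wildcard: bool = False) -> str:
    """Combine search words with AND; optionally append the '*' wildcard to each word."""
    if append_wildcard:
        words = [word + "*" for word in words]
    return " AND ".join(words)


print(build_search_term(["human", "hemoglobin", "complete"]))
# human AND hemoglobin AND complete
print(build_search_term(["genom"], append_wildcard=True))
# genom*
```

Because the words are joined with AND, every added parameter narrows the result list rather than widening it.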
171. means that the residue at position 23 is selected, and finally, 23..25 means that residues 23, 24 and 25 are selected. By holding Ctrl (⌘ on Mac), you can make multiple selections. 3.3.7 Changing compactness. There is a shortcut for changing the compactness setting for read mappings or …: press and hold the Alt key, and scroll using your mouse wheel or touchpad. 3.4 Toolbox and Status Bar. The Toolbox is placed on the left side of the user interface of CLC Sequence Viewer, below the Navigation Area. The Toolbox shows a Processes tab and a Toolbox tab. 3.4.1 Processes. By clicking the Processes tab, the Toolbox displays previous and running processes, e.g., an NCBI search or a calculation of an alignment. The running processes can be stopped, paused and resumed by clicking the small icon next to the process (see figure 3.15). Running and paused processes are not deleted. [Figure 3.15: A database search and an alignment calculation are running. Clicking the small icon next to the process allows you to stop, pause and resume processes.] Besides the options to stop, pause and res
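The position notation above is simple enough to parse mechanically. As a sketch, assuming the range form uses two dots as written here, the hypothetical helpers below turn a selection string into residue positions and merge several selections, much as holding Ctrl (⌘ on Mac) adds to a selection. They are not part of CLC Sequence Viewer.

```python
def parse_selection(text):
    """Return the 1-based residue positions described by a selection string.

    Handles the two forms described above: "23" (a single residue) and
    "23..25" (an inclusive range). Other forms raise ValueError.
    """
    text = text.strip()
    if ".." in text:
        start_text, end_text = text.split("..", 1)
        start, end = int(start_text), int(end_text)
        if end < start:
            raise ValueError(f"range ends before it starts: {text!r}")
        return list(range(start, end + 1))
    return [int(text)]


def combine_selections(*texts):
    """Merge several selections into one sorted list of positions."""
    positions = set()
    for text in texts:
        positions.update(parse_selection(text))
    return sorted(positions)


print(parse_selection("23..25"))                 # [23, 24, 25]
print(combine_selections("23..25", "30", "24"))  # [23, 24, 25, 30]
```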
172. ments and other data are shown. You will also see how to save the changes that you made in the Side Panel. Open the protein alignment located under Protein orthologs in the Example data. The initial view of the alignment has colored the residues according to the Rasmol color scheme, and the alignment is automatically wrapped to fit the width of the view (shown in figure 2.5). Now we are going to modify how this alignment is displayed. For this, we use the settings in the Side Panel to the right. All the settings are organized into groups, which can be expanded or collapsed by clicking the name of the group. The first group is Sequence Layout, which is [Figure 2.4: The resulting two views, which are split horizontally: a linear view and a circular view of pcDNA3_atp8a1.] expanded by default. First, select No wrap in the Sequence Layout. This means that each sequence in the alignment is ke
173. meters until three search criteria are available: choose Organism in the first drop-down menu and write human in the adjoining text field; choose All Fields in the second drop-down menu and write hemoglobin in the adjoining text field; choose All Fields in the third drop-down menu and write complete in the adjoining text field. [Screenshot: the NCBI search dialog with the Nucleotide database selected and the three search criteria human, hemoglobin and complete. Callout: By clicking Add search parameters, you activate an additional set of fields where you can enter search criteria. Each search criterion consists of a drop-down menu and a text field. In the drop-down menu, you choose which part of the NCBI database to search, and in the text field, you enter what to search for. The search results are listed with accession, definition and modification date.]
174. mmas. If the enzyme's recognition sequence is on the negative strand, the cut position is put in brackets, as with the enzyme TsoI in figure 13.15, whose cut position is 134. Some enzymes cut the sequence twice for each recognition site, and in this case the two cut positions are surrounded by parentheses. 13.3 Restriction enzyme lists. CLC Sequence Viewer includes all the restriction enzymes available in the REBASE database. However, when performing restriction site analyses, it is often an advantage to use a customized list of enzymes. In this case, the user can create special lists containing, e.g., all enzymes available in the laboratory freezer, all enzymes used to create a given restriction map, or all enzymes that are available from the preferred vendor. In the example data (see section 1.5.2), under Nucleotide | Restriction analysis, there are two enzyme lists: one with the 50 most popular enzymes, and another with all enzymes that are included in CLC Sequence Viewer. This section describes how you can create an enzyme list and how you can modify it. 13.3.1 Create enzyme list. CLC Sequence Viewer uses enzymes from the REBASE restriction enzyme database at http://rebase.neb.com. To create an enzyme list of a subset of these enzymes: … This opens the dialog shown in figure 13.16. [Figure 13.16: The Create new enzyme list dialog, step 1 "Please choose enzymes", with an option to use an existing enzyme list.]
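To make the idea of a customized enzyme list concrete, here is a minimal sketch that scans a DNA sequence for the recognition sites of a small user-defined list and reports negative-strand hits separately. It only locates recognition sequences (not cut positions), assumes unambiguous recognition sequences (A, C, G, T only), and is not how CLC Sequence Viewer implements its restriction site analysis; `find_sites` and the `freezer` list are hypothetical. The recognition sequences used for EcoRI (GAATTC) and BsaI (GGTCTC) are the standard REBASE entries.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, enzyme_list):
    """Find recognition sites for a custom enzyme list on both strands.

    enzyme_list maps enzyme name -> recognition sequence (5' to 3').
    Returns (enzyme, position, strand) tuples, where position is the
    1-based start of the matching stretch in the given (positive-strand)
    sequence and "-" means the site lies on the negative strand.
    """
    seq = seq.upper()
    hits = []
    for name, site in enzyme_list.items():
        site = site.upper()
        rc_site = reverse_complement(site)
        patterns = [(site, "+")]
        if rc_site != site:  # palindromic sites read the same on both strands
            patterns.append((rc_site, "-"))
        for pattern, strand in patterns:
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, start + 1, strand))
                start = seq.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# A small hypothetical "freezer" list.
freezer = {"EcoRI": "GAATTC", "BsaI": "GGTCTC"}
print(find_sites("AAGAATTCAAGAGACCAA", freezer))
# [('EcoRI', 3, '+'), ('BsaI', 11, '-')]
```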
175. mouse button, drag the element to the desired location, press Ctrl (⌘ on Mac) while you let go of the mouse button, and release the Ctrl (⌘) button. 3.1.6 Change element names. This section describes two ways of changing the names of sequences in the Navigation Area. In the first part, the sequences themselves are not changed; only their representation changes. The second part describes how to change the name of the element. Change how sequences are displayed: Sequence elements can be displayed in the Navigation Area with different types of information: • Name (this is the default information to be shown). • Accession (sequences downloaded from databases like GenBank have an accession number). • Latin name. • Latin name (accession). • Common name. • Common name (accession). Whether sequences can be displayed with this information depends on their origin. Sequences that you have created yourself or imported might not include this information, and you will only be able to see them represented by their name. However, sequences downloaded from databases like GenBank will include this information. To change how sequences are displayed: right-click any element or folder in the Navigation Area | Sequence Representation | select format. This will only affect sequence elements; the display of other types of elements, e.g., alignments, trees and external files, will not be changed. If a sequence does not have this information, there will b
176. mported sequences of the Vector NTI Database. Importing parts of the database: Instead of importing the whole database automatically, you can export parts of the database from Vector NTI Explorer and subsequently import them into the Workbench. First, export a selection of files as an archive, as shown in figure 6.5. [Figure 6.5: Select the relevant files and export them as an archive through the File menu (Vector NTI Explorer, Local Vector NTI Database, DNA/RNA Molecules).] This will produce a file with a .ma4, .pa4 or .oa4 extension. Back in the CLC Workbench, click Import and select the file. Importing single files: In Vector NTI, you can save a sequence in a file instead of in the database (see figure 6.6). This will give you file wit
177. n figure 2.3, and zoomed to see the residues. In this tutorial, we want to have an overview of the whole sequence. Hence: [Figure 2.3: Sequence pcDNA3_atp8a1 opened in a view.] click Zoom Out in the Toolbar | click the sequence until you can see the whole sequence. This sequence is circular, which is indicated by << and >> at the beginning and the end of the sequence. In the following, we will show how the same sequence can be displayed in two different views: one linear view and one circular view. First, zoom in to see the residues again by using Zoom In or the 100% button. Then make a split view: press and hold the Ctrl button on the keyboard (⌘ on Mac) | click Show as Circular at the bottom of the view. This opens an additional view of the vector with a circular display, as can be seen in figure 2.4. Make a selection on the circular sequence (remember to switch to the Selection tool in the toolbar), and note that this selection is also reflected in the linear view above. 2.3 Tutorial: Side Panel Settings. This brief tutorial will show you how to use the Side Panel to change the way your sequences, align
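Because a circular sequence has no real ends, a region on it can in general run across the position where the linear view starts and ends. The sketch below shows one way to extract such a region, assuming 1-based inclusive positions and treating start > end as a region that crosses the origin; `circular_region` is a hypothetical helper, not a CLC Sequence Viewer function.

```python
def circular_region(seq, start, end):
    """Return residues start..end (1-based, inclusive) of a circular sequence.

    If start > end, the region is taken to run across the origin: from
    start to the last residue, then on from residue 1 to end.
    """
    n = len(seq)
    if not (1 <= start <= n and 1 <= end <= n):
        raise ValueError("positions must lie within the sequence")
    if start <= end:
        return seq[start - 1:end]
    return seq[start - 1:] + seq[:end]


plasmid = "ACGTACGTTT"  # toy circular sequence of 10 residues
print(circular_region(plasmid, 2, 4))  # CGT
print(circular_region(plasmid, 9, 2))  # TTAC
```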
178. n the specified view. If you now move leaves, the leaves in all views are moved. The options in the right-click pop-up menu change the tree itself, and therefore they affect all views. 15.2 Bioinformatics explained: phylogenetics. Phylogenetics describes the taxonomical classification of organisms based on their evolutionary history, i.e., their phylogeny. Phylogenetics is therefore an integral part of the science of systematics, which aims to establish the phylogeny of organisms based on their characteristics. Furthermore, phylogenetics is central to evolutionary biology as a whole, as it is the condensation of the overall paradigm of how life arose and developed on earth. 15.2.1 The phylogenetic tree. The evolutionary hypothesis of a phylogeny can be graphically represented by a phylogenetic tree. Figure 15.4 shows a proposed phylogeny for the great apes, Hominidae, taken in part from Purvis [Purvis, 1995]. The tree consists of a number of nodes (also termed vertices) and branches (also termed edges). These nodes can represent either an individual, a species, or a higher grouping, and are thus broadly termed taxonomical units. In this case, the terminal nodes (also called leaves or tips of the tree) represent extant species of Hominidae and are the operational taxonomical units (OTUs). The internal nodes, which here represent extinct common ancestors of the great apes, are termed hypothetical taxonomical units since they
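The terminology above maps directly onto a tree data structure: leaves are OTUs, internal nodes are hypothetical taxonomical units, and each parent-child link is a branch. The sketch below uses a hypothetical `Node` class and a simplified great-ape topology chosen for illustration; it is not a transcription of figure 15.4.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a rooted tree; a node without children is a leaf (an OTU)."""
    name: str = ""
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def count_units(node):
    """Return (number of OTUs/leaves, number of internal nodes)."""
    if node.is_leaf():
        return 1, 0
    leaves, internal = 0, 1  # this node itself is internal
    for child in node.children:
        child_leaves, child_internal = count_units(child)
        leaves += child_leaves
        internal += child_internal
    return leaves, internal


def count_branches(node):
    """Each child hangs from its parent by exactly one branch (edge)."""
    return sum(1 + count_branches(child) for child in node.children)


apes = Node(children=[
    Node(children=[
        Node(children=[Node("Human"), Node("Chimpanzee")]),
        Node("Gorilla"),
    ]),
    Node("Orangutan"),
])
print(count_units(apes), count_branches(apes))  # (4, 3) 6
```

For a rooted binary tree with n leaves there are n - 1 internal nodes and 2n - 2 branches, which the example reproduces for n = 4.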
179. nal form, and CLC bio has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, or build upon this work. SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents. Chapter 15: Phylogenetic trees. Contents: 15.1 Inferring phylogenetic trees; 15.1.1 Phylogenetic tree parameters; 15.1.2 Tree View Preferences; 15.2 Bioinformatics explained: phylogenetics; 15.2.1 The phylogenetic tree; 15.2.2 Modern usage of phylogenies; 15.2.3 Reconstructing phylogenies from molecular data; 15.2.4 Interpreting phylogenies. CLC Sequence Viewer offers different ways of inferring phylogenetic trees. The first part of this chapter will briefly explain the different ways of inferring trees in CLC Sequence Viewer. The second part, "Bioinformatics explained", will give a more general introduction to the concept of phylogeny and the associated bioinformatics methods. 15.1 Inferring phylogenetic trees. For a given set of aligned sequences (see chapter 14), it is possible to infer their evolutionary relationships. In CLC Sequence Viewer, this is done by c
180. nded. 1.4 About CLC Workbenches. In November 2005, CLC bio released two Workbenches: CLC Free Workbench and CLC Protein Workbench. CLC Protein Workbench is developed from the free version, giving it the well-tested user friendliness and look & feel. However, the CLC Protein Workbench includes a range of more advanced analyses. In March 2006, CLC DNA Workbench (formerly CLC Gene Workbench) and CLC Main Workbench were added to the product portfolio of CLC bio. Like CLC Protein Workbench, CLC DNA Workbench builds on CLC Free Workbench. It shares some of the advanced product features of CLC Protein Workbench, and it has additional advanced features. CLC Main Workbench holds all basic and advanced features of the CLC Workbenches. In June 2007, CLC RNA Workbench was released as a sister product of CLC Protein Workbench and CLC DNA Workbench. CLC Main Workbench now also includes all the features of CLC RNA Workbench. In March 2008, the CLC Free Workbench changed its name to CLC Sequence Viewer. In June 2008, the first version of the CLC Genomics Workbench was released due to an extraordinary demand for software capable of handling sequencing data from the new high-throughput sequencing systems like 454, Illumina Genome Analyzer and SOLiD. For an overview of which features all the applications include, see http://www.clcbio.com/features. In December 2006, CLC bio released a Software Developer Kit
181. 11.3 Join sequences. CLC Sequence Viewer can join several nucleotide or protein sequences into one sequence. This feature can, for example, be used to construct supergenes for phylogenetic inference by joining several disjoint genes into one. Note that when sequences are joined, all their annotations are carried over to the new spliced sequence. Two or more sequences can be joined by: select sequences to join | Toolbox in the Menu Bar | General Sequence Analyses | Join sequences, or: select sequences to join | right-click any selected sequence | Toolbox | General Sequence Analyses | Join sequences. This opens the dialog shown in figure 11.5. [Screenshot: the Join Sequences dialog, step 1 "Select sequences of same type", with the Navigation Area on the left and the Selected Elements list on the right.]
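Carrying annotations over to a joined sequence mostly means shifting their coordinates. The sketch below assumes simple annotations with a 1-based inclusive start and end; the `Annotation` and `Sequence` classes and `join_sequences` are hypothetical illustrations, and CLC Sequence Viewer's own handling (for example of qualifiers or multi-part annotations) may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


@dataclass
class Sequence:
    name: str
    residues: str
    annotations: list = field(default_factory=list)


def join_sequences(sequences, name="joined"):
    """Concatenate sequences in order and carry their annotations over.

    Each annotation is shifted by the total length of the sequences that
    precede its own sequence in the joined result.
    """
    residues = []
    annotations = []
    offset = 0
    for seq in sequences:
        residues.append(seq.residues)
        for ann in seq.annotations:
            annotations.append(Annotation(ann.name, ann.start + offset, ann.end + offset))
        offset += len(seq.residues)
    return Sequence(name, "".join(residues), annotations)


gene_a = Sequence("a", "ATGAAATAA", [Annotation("geneA", 1, 9)])
gene_b = Sequence("b", "ATGCCCTGA", [Annotation("geneB", 1, 9)])
supergene = join_sequences([gene_a, gene_b], name="supergene")
print(supergene.residues)        # ATGAAATAAATGCCCTGA
print(supergene.annotations[1])  # Annotation(name='geneB', start=10, end=18)
```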
182. ne if the IUPAC option is selected. The IUPAC codes can be found in sections F and E. • No gaps: Checking this option will not show gaps in the consensus. • Ambiguous symbol: Select how ambiguities should be displayed in the consensus line (as N or …). This option has no effect if IUPAC is selected in the Limit list above. The Consensus Sequence can be opened in a new view simply by right-clicking the Consensus Sequence and clicking Open Consensus in New View. • Conservation: Displays the level of conservation at each position in the alignment, covering all sequence positions. The height of the bar, or the gradient of the color, reflects how conserved that particular position is in the alignment. If one position is 100% conserved, the bar is shown at full height, and it is colored in the color specified at the right side of the gradient slider. • Foreground color: Colors the letters using a gradient, where the right side color is used for highly conserved positions and the left side color is used for positions that are less conserved. • Background color: Sets a background color of the residues using a gradient in the same way as described above. • Graph: Displays the conservation level as a graph at the bottom of the alignment. The bar (default) view shows the conservation of all sequence positions. The height of the graph reflects how conserved that particular position is in the alignment.
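The manual does not state the exact formula behind the conservation bar, so the sketch below assumes one simple measure: the fraction of sequences that share the most common symbol in each column, with gaps counted as symbols. A value of 1.0 corresponds to a 100% conserved position (a full-height bar). `column_conservation` and `bar_heights` are hypothetical helpers, not CLC Sequence Viewer functions.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column conservation: fraction of sequences sharing the most
    common symbol in that column (gaps count as symbols)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*alignment):
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def bar_heights(scores, max_height=10):
    """Scale conservation scores to integer bar heights for a simple text plot."""
    return [round(score * max_height) for score in scores]


aln = ["ACGT", "ACGA", "ATGA", "ACCA"]
print(column_conservation(aln))               # [1.0, 0.75, 0.75, 0.75]
print(bar_heights(column_conservation(aln)))  # [10, 8, 8, 8]
```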
183. nt in the recycle bin when it was emptied. 3.1.8 Show folder elements in a table. A location or a folder might contain large amounts of elements. It is possible to view their elements in the View Area: select a folder or location | Show in the Toolbar | Contents. An example is shown in figure 3.4. When the elements are shown in the view, they can be sorted by clicking the heading of each of the columns. You can further refine the sorting by pressing Ctrl (⌘ on Mac) while clicking the heading of another column. Sorting the elements in a view does not affect the ordering of the elements in the Navigation Area. Note: The view only displays one "layer" at a time: the content of subfolders is not visible in this view. Also note that only sequences have the full span of information, like organism, etc. Batch edit folder elements: You can select a number of elements in the table, right-click and choose Edit to batch edit the elements. In this way, you can change, e.g., the description or common name of several [Screenshot: the contents of a cloning vector folder shown as a table with element names and modification dates.]
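Refining a sort by Ctrl-clicking another column heading amounts to a multi-key sort: the first heading is the primary key and later headings break ties. The sketch below assumes that reading and uses Python's stable sort; `sort_rows` and the example rows are hypothetical. Like the table view, it returns a new ordering and leaves the original list untouched.

```python
def sort_rows(rows, keys):
    """Sort table rows by several columns without modifying the input list.

    keys is a list of (column, descending) pairs; the first pair is the
    primary sort column and later pairs refine ties.
    """
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the primary key last yields the correct multi-key ordering.
    for column, descending in reversed(keys):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


rows = [
    {"name": "pUC19", "type": "Sequence"},
    {"name": "M13mp8", "type": "Sequence"},
    {"name": "alignment 1", "type": "Alignment"},
]
for row in sort_rows(rows, [("type", False), ("name", False)]):
    print(row["type"], row["name"])
# Alignment alignment 1
# Sequence M13mp8
# Sequence pUC19
```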
184. [Figure 14.6: The tabular format of a multiple alignment of 24 Hemoglobin protein sequences. Sequence names appear at the beginning of each row, and the residue position is indicated by the numbers at the top of the alignment columns. The level of sequence conservation is shown on a color scale, with blue residues being the least conserved and red residues being the most conserved.] is, however, not straightforward, as it increases the number of model parameters considerably. It is therefore commonplace to either ignore this complication and assume sequences
185. of alignment-specific view options in the Alignment info and the Nucleotide info in the Side Panel to the right of the view. Below is more information on these view options. Under Translation in the Nucleotide info, there is an extra checkbox: Relative to top sequence. Checking this box will make the reading frames for the translation align with the top sequence, so that you can compare the effect of nucleotide differences on the protein level. The options in the Alignment info relate to each column in the alignment: • Consensus: Shows a consensus sequence at the bottom of the alignment. The consensus sequence is based on every single position in the alignment and reflects an artificial sequence which resembles the sequence information of the alignment, but only as one single sequence. If all sequences of the alignment are 100% identical, the consensus sequence will be identical to all sequences found in the alignment. If the sequences of the alignment differ, the consensus sequence will reflect the most common residues in the alignment. Parameters for adjusting the consensus sequences are described below. • Limit: This option determines how conserved the sequences must be in order to agree on a consensus. Here you can also choose IUPAC, which will display the ambiguity code when there are differences between the sequences. E.g., an alignment with an A and a G at the same position will display an R in the consensus li
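The Limit and IUPAC options can be sketched for DNA alignments as follows, assuming that Limit acts as a minimum fraction of the column that must carry the most common base, and that IUPAC mode replaces each column by the ambiguity code for the set of bases present (A and G give R, as in the example above). `consensus` is a hypothetical function, not the Viewer's implementation. Ties between equally common bases are resolved by first occurrence, and symbols outside A, C, G and T (such as U) fall back to the ambiguous symbol in IUPAC mode.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(alignment, limit=None, ambiguous="N"):
    """Build a consensus line for a DNA alignment.

    limit=None: IUPAC mode; each column becomes the ambiguity code for the
    bases present (gaps ignored).
    limit between 0.0 and 1.0: the most common base is used if its fraction
    of the column reaches the limit; otherwise the ambiguous symbol is shown.
    Columns containing only gaps give "-".
    """
    result = []
    for column in zip(*alignment):
        bases = [c.upper() for c in column if c != "-"]
        if not bases:
            result.append("-")
        elif limit is None:
            result.append(IUPAC_CODES.get(frozenset(bases), ambiguous))
        else:
            base, count = Counter(bases).most_common(1)[0]
            result.append(base if count / len(column) >= limit else ambiguous)
    return "".join(result)


aln = ["ACGT", "GCGT", "ACTT"]
print(consensus(aln))             # RCKT
print(consensus(aln, limit=1.0))  # NCNT
print(consensus(aln, limit=0.6))  # ACGT
```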
186. older. When CLC Sequence Viewer is started, there is one element in the Navigation Area called CLC_Data. This element is a "Location". A location points to a folder on your computer where your data for use with CLC Sequence Viewer is stored. The data in the location can be organized into folders. Create a folder: File | New | Folder, or Ctrl + Shift + N (⌘ + Shift + N on Mac). Name the folder "My folder" and press Enter. If you have downloaded the example data, this will be placed as a folder in CLC_Data. 2.1.2 Import data. Next, we want to import a sequence called HUMDINUC.fsa (FASTA format) from our own Desktop into the new "My folder". This file is chosen for demonstration purposes only; you may have another file on your desktop which you can use to follow this tutorial. You can import all kinds of files. In order to import the HUMDINUC.fsa file: select "My folder" | Import in the Toolbar | navigate to HUMDINUC.fsa on the desktop | Select. The sequence is imported into the folder that was selected in the Navigation Area before you clicked Import. Double-click the sequence in the Navigation Area to view it. The final result looks like figure 2.2. [Screenshot: the CLC Free Workbench 4.0 window with the sequence HUMDINUC open in the View Area.]
187. otein sequences are generated. Translate coding regions: You can choose to translate regions marked by a CDS or ORF annotation. This will generate a protein sequence for each CDS or ORF annotation on the sequence. Genetic code translation table: Lets you specify the genetic code for the translation. The translation tables are occasionally updated from NCBI. The tables are not available in this printable version of the user manual; instead, the tables are included in the Help menu in the Menu Bar (in the appendix). [Figure 12.5: Choosing 1 and 3 reading frames and the standard translation table.] Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. The newly created protein is shown but is not saved automatically. To save a protein sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog. 12.5 Find open reading frames. The CLC Sequence Viewer Find Open Reading Frames function can be used to find
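The reading frame and translation table options can be illustrated with a short sketch using the standard genetic code (NCBI translation table 1), written out in its usual compact 64-letter form. The numbering of the negative-strand frames is an assumption (conventions differ between tools), and `translate` is a hypothetical function, not part of CLC Sequence Viewer.

```python
BASES = "TCAG"
# NCBI translation table 1 (the standard code); codons in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq, frame=1, table=STANDARD_CODE):
    """Translate DNA or RNA in reading frame +1, +2, +3, -1, -2 or -3.

    Negative frames are read from the reverse complement; frame -1 starts
    at its first base. Stop codons become "*" and codons containing other
    symbols (e.g. N) become "X". Trailing bases that do not fill a codon
    are ignored.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    seq = seq.upper().replace("U", "T")
    if frame < 0:
        seq = seq.translate(COMPLEMENT)[::-1]
    offset = abs(frame) - 1
    return "".join(
        table.get(seq[pos:pos + 3], "X") for pos in range(offset, len(seq) - 2, 3)
    )


print(translate("ATGGCCTAA"))            # MA*
print(translate("AUGGCCUAA"))            # MA* (RNA input)
print(translate("TTAGGCCAT", frame=-1))  # MA*
```

Passing a different dictionary as `table` plays the role of choosing another genetic code.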
188. ower left part of the view. [Figure 9.8: Browsing the gene annotations on a sequence.] [Figure 9.9: A table showing annotations on the sequence.] This will open a view similar to the one in figure 9.9. In the Side Panel, you can show or hide individual annotation types in the table. E.g., if you only wish to see gene annotations, deselect the other annotation types so that only gene is selected. Each row in the table is an annotation, which is represented with the following information: • Name • Type • Region • Qualifiers. 9.3.2 Removing annotations. Annotations can be hidden using the Annotation Types preferences in the Side Panel to the right of the vi
189. pens the sequence. It can be saved by clicking Save or by dragging the tab of the sequence view into the Navigation Area. 9.7 Sequence Lists. The Sequence List shows a number of sequences in a tabular format, or it can show the sequences together in a normal sequence view. Having sequences in a sequence list can help organize sequence data. The sequence list may originate from an NCBI search (see section 10.1). Moreover, if a multiple-sequence FASTA file is imported, it is possible to store the data in a sequence list. A Sequence List can also be generated using a dialog, which is described here: select two or more sequences | right-click the elements | New | Sequence List. This action opens a Sequence List dialog. [Figure 9.12: A Sequence List dialog.] The dialog allows you to select more sequences to include in the list, or to remove already chosen sequences from the list. Clicking Finish opens the sequence list. l
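A multiple-sequence FASTA file is plain text: each record starts with a ">" header line followed by one or more sequence lines. As a rough sketch of how such a file maps onto a list of sequences, the hypothetical `parse_fasta` below returns (name, sequence) pairs, taking the first word of each header as the name. It is not CLC Sequence Viewer's importer, and the records in the example are toy data.

```python
def parse_fasta(text):
    """Parse multi-sequence FASTA text into a list of (name, sequence) pairs."""
    records = []
    name = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


fasta_text = """>seq1 first toy record
ACGT
AC

>seq2
MKV
"""
print(parse_fasta(fasta_text))  # [('seq1', 'ACGTAC'), ('seq2', 'MKV')]
```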
190. port and export of bioinformatics data such as sequences, alignments, etc. (described in section 6.1). • Graphics export of the views, which creates image files in various formats (described in section 6.3). • Import and export of Side Panel Settings, as described above. • Import and export of all the Preferences except the Side Panel settings. This is described in the previous section. 4.3 Advanced preferences. The Advanced settings include the possibility to set up a proxy server. This is described in section 1. 4.4 Export/import of preferences. The user preferences of the CLC Sequence Viewer can be exported to other users of the program, allowing other users to display data with the same preferences as yours. You can also use the export/import preferences function to back up your preferences. To export preferences, open the Preferences dialog (Ctrl + K; ⌘ on Mac) and do the following: Export | Select the relevant preferences | Export | Choose location for the exported file | Enter name of file | Save. Note: The format of exported preferences is .cpf. This extension must be appended to the name of the exported file in order for the exported file to work. Before exporting, you are asked which of the different settings you want to include in the exported file. One of the items in the list is "User Defined View Settings". If you export this, only the information about which of th
191. problem. Note: No personal information is sent via the error report. Only the information which can be seen in the Program Error Submission Dialog is submitted. You can also write an e-mail to support@clcbio.com. Remember to specify how the program error can be reproduced. All errors will be treated seriously and with gratitude. We appreciate your help. Start in safe mode: If the program becomes unstable on start-up, you can start it in Safe mode. This is done by pressing and holding down the Shift button while the program starts. When starting in safe mode, the user settings (e.g., the settings in the Side Panel) are deleted and cannot be restored. Your data stored in the Navigation Area is not deleted. When started in safe mode, some of the functionalities are missing, and you will have to restart CLC Sequence Viewer again (without pressing Shift). 1.4.3 CLC Sequence Viewer vs. Workbenches. The advanced analyses of the commercial workbenches, CLC Genomics Workbench and CLC Main Workbench, are not present in CLC Sequence Viewer. Likewise, some advanced analyses are available in CLC Genomics Workbench but not in CLC Main Workbench. All types of basic and advanced analyses are available in CLC Genomics Workbench. However, the output of the commercial workbenches can be viewed in all other workbenches. This allows you to share the result of your advanced analyses from, e.g., CLC Main Wo
192. pt on the same line. To see more of the alignment, you now have to scroll horizontally. Next, expand the Annotation Layout group and select Show Annotations. Set the Offset to "More offset" and set the Label to "Stacked". Expand the Annotation Types group. Here you will see a list of the types of annotation that are carried by the sequences in the alignment (see figure 2.6). Check the "Region" annotation type, and you will see the regions as red annotations on the sequences. Next, we will change the way the residues are colored. Click the Alignment Info group, and under Conservation, check "Background color". This will use a gradient as background color for the residues. You can adjust the coloring by dragging the small arrows above the color box. 2.3.1 Saving the settings in the Side Panel. Now the alignment should look similar to figure 2.7. At this point, if you just close the view, the changes made to the Side Panel will not be saved. [Screenshot: the ATP8a1 orthologs alignment with Consensus and Conservation tracks, and the Side Panel's Sequence Layout options (No wrap, Auto wrap, Fixed wrap).]
193. [Figure 15.5 (two trees relating Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Mus musculus, Bos taurus and Homo sapiens): Algorithm choices for phylogenetic inference. The bottom shows a tree found by the neighbor joining algorithm, while the top shows a tree found by the UPGMA algorithm. The latter algorithm assumes that evolution occurs at a constant rate in different lineages.] based on all the individual characters (nucleotides or amino acids). Parsimony: In parsimony-based methods, a number of sites are defined which are informative about the topology of the tree. Based on these, the best topology is found by minimizing the number of substitutions needed to explain the informative sites. Parsimony methods are not based on explicit evolutionary models. Maximum Likelihood: Maximum likelihood and Bayesian methods (see below) are probabilistic methods of inference. Both have the pleasing properties of using explicit models of molecular evolution and allowing for rigorous statistical inference. However, both approaches are very computer intensive. A stochastic model of molecular evolution is used to assign a probability (likelihood) to each phylogeny, given the sequence data of the OTUs. Maximum likelihood inference [Felsenstein, 1981] then consists of finding the
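The parsimony criterion can be made concrete with Fitch's algorithm, a classic method for counting the minimum number of substitutions a single site requires on a fixed rooted binary tree. The sketch below is an illustration under that assumption, not the implementation used by any CLC product; `fitch_site`, `parsimony_score` and the toy alignment are hypothetical. It scores every column, which ranks topologies the same way as scoring only the informative sites, because an uninformative column costs the same on every topology.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one site on a rooted binary tree.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    states: dict mapping each leaf name to its character at this site.
    Returns (candidate states at this node, minimum substitutions below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch_site(left, states)
    right_states, right_cost = fitch_site(right, states)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one substitution is needed on one of the two branches.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total substitutions over all columns; alignment maps leaf name -> aligned sequence."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


aln = {"A": "AAG", "B": "AAT", "C": "GCG", "D": "GCT"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_ab_cd, aln), parsimony_score(tree_ac_bd, aln))  # 4 5
```

Under the parsimony criterion, the topology grouping A with B is preferred here because it needs fewer substitutions.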
194. rea indicates where the view will be shown. The result of this action is illustrated in figure 3.11. You can also split a View Area horizontally or vertically using the menus. Splitting horizontally may be done this way: [Figure 3.11: A horizontal split screen. The two views split the View Area.] right-click a tab of the view | View | Split Horizontally. This action opens the chosen view below the existing view. See figure 3.12. When the split is made vertically, the new view opens to the right of the existing view. [Figure 3.12: A vertical split screen.] Splitting the View Area can be undone by dragging, e.g., the tab of the bottom view to the tab of the top view. This is marked by a gray area on the top of the view. Maximize/Restore size of view: The Maximize/Restore View function allows you to see a view in maximized mode, meaning a mode where no other views nor the Navigation Area is
195. Creating a phylogenetic tree: select an alignment | Toolbox in the Menu Bar | Alignments and Trees | Create Tree, or right-click the alignment in the Navigation Area | Toolbox | Alignments and Trees | Create Tree. This opens the dialog displayed in figure 15.1. If an alignment was selected before choosing the Toolbox action, this alignment is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove elements from the Navigation Area. Click Next to adjust parameters. Figure 15.1: Creating a Tree. 15.1.1 Phylogenetic tree parameters. Figure 15.2 shows the parameters that can be set. Figure 15.2: Adjusting parameters (Algorithm: Neighbor Joining; Bootstrapping: Perform bootstrap analysis, Replicates: 100). • Algorithms. The UPGMA method assumes that evolution has occurred at a constant rate in the different lineages. This means that a root of the tree is also estimated.
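To show what the constant-rate assumption means in practice, here is a minimal textbook UPGMA sketch operating on a pairwise distance matrix. It is not CLC Sequence Viewer's implementation. The distance values are hypothetical; in practice they would be derived from the alignment. Each merge places the new node at half the pair's distance, so all leaves end up at the same depth. This is how a root falls out of the method.

```python
def upgma(names, distances):
    """Cluster `names` with UPGMA and return a Newick string with branch lengths.

    `distances` maps a (name1, name2) tuple to the pairwise distance.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    # cluster label -> (Newick text, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in names}
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        height = d[frozenset((a, b))] / 2  # ultrametric: equal depth to all leaves
        newick_a, size_a, height_a = clusters.pop(a)
        newick_b, size_b, height_b = clusters.pop(b)
        merged = f"{a}+{b}"
        for other in clusters:
            # Distance to the new cluster: average weighted by cluster size.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = (
            f"({newick_a}:{height - height_a:g},{newick_b}:{height - height_b:g})",
            size_a + size_b,
            height,
        )
    newick = next(iter(clusters.values()))[0]
    return newick + ";"


distances = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
             ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}
print(upgma(["A", "B", "C", "D"], distances))  # (((A:1,B:1):1,C:2):1,D:3);
```

Every leaf sits at depth 3 from the root, which is the constant-rate property. Neighbor joining drops this constraint.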
196. …right-click a sequence in the Navigation Area | Toolbox | Nucleotide Analysis | Reverse Complement. This opens the dialog displayed in figure 12.3. Figure 12.3: Creating a reverse complement sequence. If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. This will open a new view in the View Area displaying the reverse complement of the selected sequence. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog. 12.4 Translation of DNA or RNA to protein. In CLC Sequence Viewer you can translate a nucleotide sequence into a protein sequen
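The core of nucleotide-to-protein translation is a codon lookup. The sketch below shows that lookup for reading frame 1 under the standard genetic code (NCBI translation table 1). It is an illustration, not the program's implementation. Alternative frames, other genetic codes and the dialog's options are not modelled. Like the tool, it does not account for intron/exon structure.

```python
BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str) -> str:
    """Translate DNA or RNA in reading frame 1; stop codons become '*'."""
    dna = nucleotides.upper().replace("U", "T")  # treat RNA as DNA
    # Complete codons only; one or two trailing bases are ignored.
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    # Codons containing ambiguous bases (e.g. N) are shown as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("ATGGCCTAA"))  # MA*
print(translate("AUGGCCUAA"))  # MA*
```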
197. …Workbench with people working with, e.g., CLC Sequence Viewer. They will be able to view the results of your analyses, but not redo the analyses. The CLC Workbenches and the CLC Sequence Viewer are developed for Windows, Mac and Linux platforms. Data can be exported/imported between the different platforms in the same easy way as when exporting/importing between two computers with, e.g., Windows. 1.5 When the program is installed: Getting started. CLC Sequence Viewer includes an extensive Help function, which can be found in the Help menu of the program's Menu Bar. The Help can also be shown by pressing F1. The help topics are sorted in a table of contents, and the topics can be searched. We also recommend our Online presentations, where a product specialist from CLC bio demonstrates our software. This is a very easy way to get started using the program. Read more about online presentations here: http://clcbio.com/presentation. 1.5.1 Quick start. When the program opens for the first time, the background of the workspace is visible. In the background are three quick start shortcuts, which will help you get started. These can be seen in figure 1.1. Figure 1.1: Three available Quick start shortcuts, available in the background of the workspace. The function of the three quick start shortcuts is explained here: • Import data. Opens the Import dialog, which lets you browse for and import data from your file system.
198. …intron/exon structure is not part of the algorithm. Chapter 13 Restriction site analyses. Contents: 13.1 Dynamic restriction sites; 13.1.1 Sort enzymes; 13.1.2 Manage enzymes; 13.2 Restriction site analysis from the Toolbox; 13.2.1 Selecting, sorting and filtering enzymes; 13.2.2 Number of cut sites; 13.2.3 Output of restriction map analysis; 13.2.4 Restriction sites as annotation on the sequence; 13.2.5 Table of restriction sites; 13.3 Restriction enzyme lists; 13.3.1 Create enzyme list; 13.3.2 View and modify enzyme list. There are two ways of finding and showing restriction sites: • In many cases, the dynamic restriction sites found in the Side Panel of sequence views will be useful, since this is a quick and easy way of showing restriction sites. • In the Toolbox you will find the other way of doing restriction site analyses. This way provides more control of the analysis and gives you more output options, e.g., a table of restriction sites, and you can perform the same restriction map analysis on several sequences in one step. This chapter first describes the dynamic restric
199. …represented by the locations in the Navigation Area. • Selecting all locations in the Navigation Area and exporting them in zip format. The resulting file will contain all the data stored in the Navigation Area and can be imported into CLC Sequence Viewer if you wish to restore from the backup at some point. No matter which method is used for backup, you may have to re-define the locations in the Navigation Area if you restore your data after a computer breakdown. 6.3 Export graphics to files. CLC Sequence Viewer supports export of graphics into a number of formats. This way, the visible output of your work can easily be saved and used in presentations, reports, etc. The Export Graphics function is found in the Toolbar. CLC Sequence Viewer uses a WYSIWYG principle for graphics export: What You See Is What You Get. This means that you should use the options in the Side Panel to change how your data (e.g., a sequence) looks in the program. When you export it, the graphics file will look exactly the same way. It is not possible to export graphics of elements directly from the Navigation Area. They must first be opened in a view in order to be exported. To export graphics of the contents of a view: select tab of View | Graphics. This will display the dialog shown in figure 6.7.
200. …ructure. Description: Link to database imported; Rich format including all information; All tables; All tables and reports; See http www clcbio com annotate with gff; 3D structure; 3D structure; All tables; All data in a textual format; Selected files in CLC format; Contained files/folder structure.
Note: The Workbench can import external files too. This means that all kinds of files can be imported and displayed in the Navigation Area, but the above-mentioned formats are the only ones whose contents can be shown in the Workbench. D.2 List of graphics data formats. Below is a list of formats for exporting graphics. All data displayed in a graphical format can be exported using these formats. Data represented in lists and tables can only be exported in .pdf format (see section 6.3 for further details).
Format                      Suffix   Type
Portable Network Graphics   .png     bitmap
JPEG                        .jpg     bitmap
Tagged Image File           .tif     bitmap
PostScript                  .ps      vector graphics
Encapsulated PostScript     .eps     vector graphics
Portable Document Format    .pdf     vector graphics
Scalable Vector Graphics    .svg     vector graphics
Appendix E IUPAC codes for amino acids. Single letter codes based on International Union of Pure and Applied Chemistry. The information is gathered from http://www.ebi.ac.uk/2can/tutorials/aa.html.
201. …all annotations that relate to the residues. To shuffle a sequence: select a sequence | Toolbox in the Menu Bar | General Sequence Analysis | Shuffle Sequence, or right-click a sequence | Toolbox | General Sequence Analysis | Shuffle Sequence. This opens the dialog displayed in figure 11.1. If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. Click Next to determine how the shuffling should be performed. In this step, shown in figure 11.2, the following parameters can be set for nucleotides: • Mononucleotide shuffling. Shuffle method generating a sequence of the exact same mononucleotide frequency. Figure 11.1: Choosing sequence for shuffling.
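Mononucleotide shuffling amounts to a random permutation of the residues. Every base is kept, so the base counts (and therefore the frequencies) are identical. The sketch below is a minimal illustration using Python's standard `random` module, not the program's own routine. The optional `seed` parameter is an addition for reproducibility.

```python
import random
from collections import Counter


def mononucleotide_shuffle(sequence: str, seed=None) -> str:
    """Return a random permutation of `sequence` (same residue counts)."""
    rng = random.Random(seed)
    residues = list(sequence)
    rng.shuffle(residues)  # uniform random permutation; composition unchanged
    return "".join(residues)


original = "ATGCGCGATTACAAT"
shuffled = mononucleotide_shuffle(original, seed=42)
print(shuffled)
print(Counter(shuffled) == Counter(original))  # True
```

Shuffling destroys the order, including any dinucleotide structure, while preserving composition. This makes a shuffled sequence a composition-matched background for statistical comparisons.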
202. By default, CLC Sequence Viewer opens one Workspace. Additional Workspaces are created in the following way: Workspace in the Menu Bar | Create Workspace | enter name of Workspace | OK. When the new Workspace is created, the heading of the program frame displays the name of the new Workspace. Initially, the selected elements in the Navigation Area are collapsed and the View Area is empty and ready to work with (see figure 3.16). 3.5.2 Select Workspace. When there is more than one Workspace in CLC Sequence Viewer, there are two ways to switch between them: Workspace in the Toolbar | select the Workspace to activate, or Workspace in the Menu Bar | Select Workspace | choose which Workspace to activate | OK. The name of the selected Workspace is shown after "CLC Sequence Viewer" at the top left corner of the main window; in figure 3.16 it says "default".
203. …in the background, displayed in the Status Bar, it is possible to continue other tasks in the program. Like the search process, the download process can be stopped. This is done in the Toolbox in the Processes tab. 10.1.3 Save GenBank search parameters. The search view can be saved either by dragging the search tab and dropping it in the Navigation Area, or by clicking Save. When saving the search, only the parameters are saved, not the results of the search. This is useful if you have a special search that you perform from time to time. Even if you don't save the search, the next time you open the search view it will remember the parameters from the last time you did a search. Chapter 11 General sequence analyses. Contents: 11.1 Shuffle sequence; 11.2 Sequence statistics; 11.2.1 Bioinformatics explained: Protein statistics; 11.3 Join sequences. CLC Sequence Viewer offers different kinds of sequence analyses, which apply to both protein and DNA. 11.1 Shuffle sequence. In some cases, it is beneficial to shuffle a sequence. This is an option in the Toolbox menu under General Sequence Analyses. It is normally used for statistical analyses, e.g., when comparing an alignment score with the distribution of scores of shuffled sequences. Shuffling a sequence remove
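The statistical use described above can be sketched as follows. Score the real pair, then score many shuffled versions of one sequence and report how far the real score lies above that background, in standard deviations (a z-score). The scoring function here is a hypothetical toy (identical positions, no gaps) standing in for a real alignment score. The manual does not say that CLC Sequence Viewer computes such a statistic itself.

```python
import random
import statistics


def identity_score(a: str, b: str) -> int:
    """Hypothetical toy score: identical positions in an ungapped comparison."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_z_score(a: str, b: str, n_shuffles: int = 1000, seed: int = 0) -> float:
    """How many standard deviations the real score lies above shuffled scores."""
    rng = random.Random(seed)
    observed = identity_score(a, b)
    residues = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps b's composition, destroys its order
        background.append(identity_score(a, "".join(residues)))
    return (observed - statistics.mean(background)) / statistics.stdev(background)


query = "ACGTTGCAAGCTTACG"
print(round(shuffle_z_score(query, query), 1))
```

A large positive value suggests the similarity is unlikely to be explained by composition alone.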
204. …insert extra gaps: select a part of the alignment | right-click the selection | Add gaps before/after. If you have made a selection covering, e.g., five residues, a gap of five will be inserted. In this way you can easily control the number of gaps to insert. Gaps will be inserted in the sequences that you selected. If you make a selection in two sequences in an alignment, gaps will be inserted into these two sequences. This means that these two sequences will be displaced compared to the other sequences in the alignment. 14.3.3 Delete residues and gaps. Residues or gaps can be deleted for individual sequences or for the whole alignment. For individual sequences: select the part of the sequence you want to delete | right-click the selection | Edit Selection | delete the text in the dialog | Replace. The selection shown in the dialog will be replaced by the text you enter. If you delete the text, the selection will be replaced by an empty text, i.e., deleted. To delete entire columns: select the part of the alignment you want to delete | right-click the selection | Delete columns. The selection may cover one or more sequences, but the Delete columns function will always apply to the entire alignment. 14.3.4 Move sequences up and down. Sequences can be moved up and down in the alignment: drag the name of the sequence up or down. When you move the mouse pointer over the label, the pointer will turn into a vertical arrow, indicating that the s
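The difference between the two editing operations can be sketched on a plain dictionary of aligned rows. Adding gaps only touches the selected rows. Deleting columns always touches every row. The function names are hypothetical. How the program pads unselected rows after a gap insertion is not described, so the trailing-gap padding here is an assumption made to keep all rows the same length.

```python
def insert_gaps(alignment, selected_rows, column, length):
    """Insert `length` gaps at `column` in the selected rows only.

    Assumption (not from the manual): unselected rows get trailing gaps so
    that every row keeps the same length.
    """
    return {
        name: (seq[:column] + "-" * length + seq[column:]) if name in selected_rows
        else seq + "-" * length
        for name, seq in alignment.items()
    }


def delete_columns(alignment, start, end):
    """Delete columns start..end-1 from every row, whatever rows were selected."""
    return {name: seq[:start] + seq[end:] for name, seq in alignment.items()}


aln = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "A-GTAC"}
print(insert_gaps(aln, {"s1", "s2"}, 2, 5))
print(delete_columns(aln, 1, 3))
```

After the insertion, s1 and s2 are displaced five columns relative to s3, which matches the displacement the text describes.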
205. …shown. Maximizing a view can be done in the following ways: select view | Ctrl + M, or select view | View | Maximize/restore View, or select view | right-click the tab | View | Maximize/restore View, or double-click the tab of view. Figure 3.13: A maximized view. The function hides the Navigation Area and the Toolbox. The following restores the size of the view: Ctrl + M, or View | Maximize/restor
206. …considered when adjusting Page Setup. Figure 5.6: An example where Fit to pages horizontally is set to 2 and Fit to pages vertically is set to 3 (pages 1 to 6). 5.2.1 Header and footer. Click the Header/Footer tab to edit the header and footer text. By clicking in the text field for either Custom header text or Custom footer text, you can access the auto formats for header/footer text in Insert a caret position. Click either Date, View name, or User name to include the auto format in the header/footer text. Click OK when you have adjusted the Page Setup. The settings are saved so that you do not have to adjust them again next time you print. You can also change the Page Setup from the File menu. 5.3 Print preview. The preview is shown in figure 5.7. Figure 5.7: Print preview. The Print preview window lets you see the layout of the pages that are printed. Use the arrows in the toolbar to navigate between the pages. Click Print to show the print dialog, which lets you choose, e.g., which pages to print. The Print preview window is for preview only; the layout of the pages must be adjusted in the Page Setup. Chapter 6 Import/export of data and graphics. Contents: 6.1 Standard import; 6.1.1 External files; 6.1.2 Import Vector NTI data
207. …absorbance of cysteine, tyrosine and tryptophan using the following equation:
$$\mathrm{Ext}(\text{Protein}) = \mathrm{count}(\text{Cystine}) \times \mathrm{Ext}(\text{Cystine}) + \mathrm{count}(\text{Tyr}) \times \mathrm{Ext}(\text{Tyr}) + \mathrm{count}(\text{Trp}) \times \mathrm{Ext}(\text{Trp})$$
where Ext is the extinction coefficient of the amino acid in question. At 280 nm the extinction coefficients are: Cys = 120, Tyr = 1280 and Trp = 5690. This equation is only valid under the following conditions: • pH 6.5 • 6.0 M guanidium hydrochloride • 0.02 M phosphate buffer
Table 11.1: Estimated half-life. Half-life of proteins where the N-terminal residue is listed in the first column and the half-life in the subsequent columns for mammals, yeast and E. coli.
Amino acid   Mammalian    Yeast       E. coli
Ala (A)      4.4 hours    >20 hours   >10 hours
Cys (C)      1.2 hours    >20 hours   >10 hours
Asp (D)      1.1 hours    3 min       >10 hours
Glu (E)      1 hour       30 min      >10 hours
Phe (F)      1.1 hours    3 min       2 min
Gly (G)      30 hours     >20 hours   >10 hours
His (H)      3.5 hours    10 min      >10 hours
Ile (I)      20 hours     30 min      >10 hours
Lys (K)      1.3 hours    3 min       2 min
Leu (L)      5.5 hours    3 min       2 min
Met (M)      30 hours     >20 hours   >10 hours
Asn (N)      1.4 hours    3 min       >10 hours
Pro (P)      >20 hours    >20 hours   ?
Gln (Q)      0.8 hour     10 min      >10 hours
Arg (R)      1 hour       2 min       2 min
Ser (S)      1.9 hours    >20 hours   >10 hours
Thr (T)      7.2 hours    >20 hours   >10 hours
Val (V)      100 hours    >20 hours   >10 hours
Trp (W)      2.8 hours    3 min       2 min
Tyr (Y)      2.8 hours    10 min      2 min
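The extinction coefficient formula above can be applied directly to a protein sequence by counting residues and multiplying by the listed 280 nm coefficients. The formula counts cystines (disulfide-bonded Cys pairs), not single cysteines. The default used here, one cystine per pair of Cys residues, is an assumption. Pass `n_cystines` explicitly when the number of disulfide bonds is known. The example sequence is made up.

```python
# Extinction coefficients at 280 nm, as listed in the text above.
EXT_280 = {"cystine": 120, "Y": 1280, "W": 5690}


def extinction_coefficient(protein: str, n_cystines=None) -> int:
    """Ext(Protein) = count(Cystine)*120 + count(Tyr)*1280 + count(Trp)*5690.

    Assumption: if n_cystines is not given, every pair of Cys residues is
    counted as one cystine (disulfide bond).
    """
    seq = protein.upper()
    if n_cystines is None:
        n_cystines = seq.count("C") // 2
    return (n_cystines * EXT_280["cystine"]
            + seq.count("Y") * EXT_280["Y"]
            + seq.count("W") * EXT_280["W"])


print(extinction_coefficient("MCWYKCW"))  # 12780
```

Here one cystine contributes 120, one tyrosine 1280 and two tryptophans 2 × 5690 = 11380.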
208. …sorted by clicking the column headings, i.e., Name, Overhang, Methylation or Popularity. This is particularly useful if you wish to use enzymes which produce, e.g., a 3' overhang. In this case, you can sort the list by clicking the Overhang column heading, and all the enzymes producing 3' overhangs will be listed together for easy selection. When looking for a specific enzyme, it is easier to use the Filter. If you wish to find, e.g., HindIII sites, simply type HindIII into the filter, and the list of enzymes will shrink automatically to only include the HindIII enzyme. This can also be used to only show enzymes producing, e.g., a 3' overhang, as shown in figure 13.17. CLC Sequence Viewer comes with a standard set of enzymes based on http://www.rebase.neb.com. You can customize the enzyme database for your installation; see section … (Restriction Site Analysis dialog: Enzymes to be considered in calculation; Use existing enzyme list: Popular enzymes; columns Name, Overhang, Methylation and Popularity.)
209. st 140 new folder 39 workspace 53 CSV export graph data points 81 formatting of decimal numbers 75 csv file format 1 3 CSV file format 172 173 ct file format 1 3 Data formats bioinformatic 1 1 graphics 1 4 Data structure 38 Database GenBank 110 local 38 Db source 103 Delete element 42 residues and gaps in alignment 149 workspace 54 182 Description 103 batch edit 42 DGE 162 Digital gene expression 162 DIP detection 161 Dipeptide distribution 121 Discovery studio file format 172 DNA translation 127 DNAstrider file format 1 2 Dot plots 164 Double cutters 133 Double stranded DNA 91 Download and open search results GenBank 113 Download and save search results GenBank 113 Download of CLC Sequence Viewer 9 Drag and drop Navigation Area 39 search results GenBank 112 DS Gene file format 172 Edit alignments 148 163 annotations 162 enzymes 134 sequence 95 sequences 162 single bases 96 Element delete 42 rename 41 embl file format 1 3 Embl file format 1 2 Encapsulated PostScript export 9 End gap cost 145 End gap costs cheap end gaps 145 free end gaps 145 Enzyme list 140 create 140 edit 141 view 141 eps format export 9 Error reports 13 Evolutionary relationship 152 Example data import 15 Excel export file format 1 3 Expand selection 95 Export bioinformatic data 5 dependent objects 76 folder 5 graph
210. …select above. If you have not chosen to use an existing enzyme list, this panel shows all the enzymes available. • To the right there is a list of the enzymes that will be used. Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button. If you, e.g., wish to use EcoRV and BamHI, select these two enzymes and add them to the right side panel. If you wish to use all the enzymes in the list: click in the panel to the left | press Ctrl + A (⌘ + A on Mac) | Add. The enzymes can be sorted by clicking the column headings, i.e., Name, Overhang, Methylation or Popularity. This is particularly useful if you wish to use enzymes which produce, e.g., a 3' overhang. In this case, you can sort the list by clicking the Overhang column heading, and all the enzymes producing 3' overhangs will be listed together for easy selection. When looking for a specific enzyme, it is easier to use the Filter. If you wish to find, e.g., HindIII sites, simply type HindIII into the filter, and the list of enzymes will shrink automatically to only include the HindIII enzyme. This can also be used to only show enzymes producing, e.g., a 3' overhang, as shown in figure 13.17. CLC Sequence Viewer comes with a standard set of enzymes based on http://www.rebase.neb.com.
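The sort and filter behavior of the enzyme table can be mimicked on a small in-memory list. Sorting by one column with ties broken by name groups all 3' overhang enzymes together. A case-insensitive substring filter over the visible columns narrows the list to a name or an overhang type. The `Enzyme` class, its fields and the overhang encoding are hypothetical. The recognition sites and overhang types are the standard ones for these enzymes, but this is not the program's data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Enzyme:
    name: str
    site: str      # recognition sequence
    overhang: str  # "5'", "3'" or "blunt" (illustrative encoding)


ENZYMES = [
    Enzyme("EcoRI", "GAATTC", "5'"), Enzyme("BamHI", "GGATCC", "5'"),
    Enzyme("HindIII", "AAGCTT", "5'"), Enzyme("EcoRV", "GATATC", "blunt"),
    Enzyme("SmaI", "CCCGGG", "blunt"), Enzyme("KpnI", "GGTACC", "3'"),
    Enzyme("PstI", "CTGCAG", "3'"), Enzyme("SacI", "GAGCTC", "3'"),
    Enzyme("SphI", "GCATGC", "3'"),
]


def filter_enzymes(enzymes: List[Enzyme], text: str) -> List[Enzyme]:
    """Keep enzymes whose name, site or overhang contains `text` (case-insensitive)."""
    needle = text.lower()
    return [e for e in enzymes
            if any(needle in field.lower() for field in (e.name, e.site, e.overhang))]


def sort_enzymes(enzymes: List[Enzyme], column: str) -> List[Enzyme]:
    """Sort by one column, like clicking its heading; ties are broken by name."""
    return sorted(enzymes, key=lambda e: (getattr(e, column), e.name))


print([e.name for e in filter_enzymes(ENZYMES, "HindIII")])  # ['HindIII']
print([e.name for e in filter_enzymes(ENZYMES, "3'")])       # ['KpnI', 'PstI', 'SacI', 'SphI']
```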
211. …sequence lists from the selected elements. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. This will open a new view in the View Area displaying the new DNA sequence. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog. Note: You can select multiple RNA sequences and sequence lists at a time. If the sequence list contains DNA sequences as well, they will not be converted. 12.3 Reverse complements of sequences. CLC Sequence Viewer is able to create the reverse complement of a nucleotide sequence. By doing that, a new sequence is created which also has all the annotations reversed, since they now occupy the opposite strand of their previous location. To quickly obtain the reverse complement of a sequence or part of a sequence, you may select a region on the negative strand and open it in a new view: right-click a selection on the negative strand | Open selection in New View. By doing that, the sequence will be reversed. This is only possible when the double stranded view option is enabled. It is possible to copy the selection and paste it in a word processing program or an e-mail. To obtain a reverse complement of an entire sequence: select a sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analysis | Reverse Complement, or
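The conversions described in this section can be sketched in a few lines. RNA to DNA replaces U with T and leaves DNA unchanged. A reverse complement complements every base and reverses the order. Annotations move to the opposite strand at mirrored coordinates. The 0-based, end-exclusive coordinate convention and the `flip_annotation` helper are assumptions for illustration, not the program's internal representation. IUPAC ambiguity codes other than N are not handled.

```python
# Complement table for unambiguous DNA bases (N maps to itself).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def rna_to_dna(sequence: str) -> str:
    """Replace U with T; DNA input passes through unchanged."""
    return sequence.replace("U", "T").replace("u", "t")


def reverse_complement(sequence: str) -> str:
    """Complement every base, then reverse the order."""
    return sequence.translate(COMPLEMENT)[::-1]


def flip_annotation(start: int, end: int, strand: int, length: int):
    """Map a 0-based, end-exclusive annotation onto the reverse complement."""
    return length - end, length - start, -strand


seq = "ATGCGTTA"
print(reverse_complement(seq))              # TAACGCAT
print(flip_annotation(0, 3, +1, len(seq)))  # (5, 8, -1)
print(rna_to_dna("AUGGCU"))                 # ATGGCT
```

The ATG annotation at positions 0 to 3 reappears as CAT at positions 5 to 8 on the opposite strand of the new sequence.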
212. …it can be saved by clicking Save or by dragging the tab of the view into the Navigation Area. Opening a sequence list is done by: right-click the sequence list in the Navigation Area | Show | Graphical Sequence List or Table. The two different views of the same sequence list are shown in split screen in figure 9.13. 9.7.1 Graphical view of sequence lists. The graphical view of sequence lists is almost identical to the view of single sequences (see section 9.1). The main difference is that you now can see more than one sequence in the same view. However, you also have a few extra options for sorting, deleting and adding sequences: • To add extra sequences to the list, right-click an empty white space in the view and select Add Sequences. • To delete a sequence from the list, right-click the sequence's name and select Delete Sequence. • To sort the sequences in the list, right-click the name of one of the sequences and select Sort Sequence List by Name or Sort Sequence List by Length. • To rename a sequence, right-click the name of the sequence and select Rename Sequence. 9.7.2 Sequence list table. Each sequence in the table sequence list is displayed with … (The table view in figure 9.13 has the columns Accession, Definition, Modification Date and Length.)
213. …the spreadsheet software are using the same Locale. Note: The Export dialog decides which types of files you are allowed to export into, depending on what type of data you want to export. E.g., protein sequences can be exported into GenBank, Fasta, Swiss-Prot and CLC formats. Export of folders and multiple elements: The zip file type can be used to export all kinds of files and is therefore especially useful in these situations: • Export of one or more folders, including all underlying elements and folders. • If you want to export two or more elements into one file. Export of folders is similar to export of single files. Exporting multiple files of different formats is done in zip format. This is how you export a folder: select the folder to export | Export | choose where to export to | enter name | Save. You can export multiple files of the same type into formats other than ZIP (.zip). E.g., two DNA sequences can be exported in GenBank format: select the two sequences by Ctrl-click (⌘-click on Mac) or Shift-click | Export | choose where to export to | choose GenBank (.gbk) format | enter name of the new file | Save. Export of dependent elements: When exporting, e.g., an alignment, CLC Sequence Viewer can export the alignment including all the sequences that were used to create it. This way, when sending your alignment with the dependent sequences, your colle
214. …the alignment view into the Navigation Area. Figure 2.13: The alignment dialog displaying the six protein sequences. Figure 2.14: The alignment dialog displaying the available parameters which can be adjusted (Gap settings: Gap open cost 10, Gap extension cost 1, End gap cost: As any other; Alignment: Fast (less accurate) or Slow (very accurate); Redo alignments; Use fixpoints). Installing the Additional Alignments plugin gives you access to other alignment algorithms: • ClustalW (Windows, Mac, Linux) • Muscle (Windows, Mac, Linux) • T-Coffee (Mac, Linux) • MAFFT (Mac, Linux) • Kalign (Mac, Linux). The Additional Alignments Module can be downloaded from http://www.clcbio.com/plugins. Note that you will need administrative priv
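The gap settings in figure 2.14 can be illustrated by totalling the gap cost of one aligned row. This sketch assumes the common affine convention: a run of k gap characters costs the open cost once plus the extension cost for each of the remaining k − 1 positions. The manual excerpt does not spell out the exact formula, so treat that as an assumption. The `end_gaps_free` flag is a hypothetical stand-in for the "free end gaps" option listed in the index. The default corresponds to "As any other".

```python
import re


def gap_cost(row: str, gap_open: int = 10, gap_ext: int = 1,
             end_gaps_free: bool = False) -> int:
    """Total affine gap cost of one aligned row.

    Assumption: a run of k gap characters costs gap_open + (k - 1) * gap_ext.
    """
    total = 0
    for run in re.finditer(r"-+", row):
        at_end = run.start() == 0 or run.end() == len(row)
        if at_end and end_gaps_free:
            continue  # end gaps are not charged under this option
        total += gap_open + (len(run.group()) - 1) * gap_ext
    return total


print(gap_cost("AC---GT-A"))                   # 22: (10 + 2) + 10
print(gap_cost("--ACGT"))                      # 11: end gap charged as any other
print(gap_cost("--ACGT", end_gaps_free=True))  # 0
```

Because opening a gap is much more expensive than extending one, an aligner using these defaults prefers one long gap over several short ones.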
215. …the view settings for the views will be shown in the dialog. Click Export, and you will now be able to define a save folder and name for the exported file. The settings are saved in a file with a .vsf extension (View Settings File). To import a Side Panel settings file, make sure you are at the bottom of the View panel of the Preferences dialog and click the Import button. Note that there is also another import button at the very bottom of the dialog, but this will import the other settings of the Preferences dialog (see section 4.4). The dialog asks if you wish to overwrite existing Side Panel settings or if you wish to merge the imported settings into the existing ones (see figure 4.7). Figure 4.6: Exporting all settings for circular views (e.g., the saved settings Non-compact, No annotations and No restriction sites). Figure 4.7: When you import settings, you are asked if you wish to overwrite existing settings or if you wish to merge the new settings into the old ones. Note: If you choose to overwrite the existing settings, you will lose all the Side Panel settings that you have previously saved. To avoid confusion of the different import and export options, here is an overview: • Im
216. …a search, no files are downloaded. Instead, the program produces a list of links to the files in the NCBI database. This ensures a much faster search. 10.1.2 Handling of GenBank search results. The search result is presented as a list of links to the files in the NCBI database. The View displays 50 hits at a time. This can be changed in the Preferences (see chapter 4). More hits can be displayed by clicking the More button at the bottom right of the View. Each sequence hit is represented by text in the following columns: • Accession • Description • Modification date • Length. It is possible to exclude one or more of these columns by adjusting the View preferences for the database search view. Furthermore, your changes in the View preferences can be saved (see section 4.5). Several sequences can be selected, and by clicking the buttons at the bottom of the search view you can do the following: • Download and open: doesn't save the sequence. • Download and save: lets you choose a location for saving the sequence. • Open at NCBI: searches the sequence at NCBI's web page. Double-clicking a hit will download and open the sequence. The hits can also be copied into the View Area or the Navigation Area from the search results by drag and drop, copy/paste, or by using the right-click menu, as described below. Drag and drop from GenBank search results: The sequences from the search results can be opened by dragging them into a position in the View
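The "links first, 50 at a time" behavior resembles how NCBI's public E-utilities search endpoint pages results. Each request returns identifiers rather than sequence files, and a start offset selects the page. The manual does not say how CLC Sequence Viewer talks to NCBI, so this is only an analogy. The sketch builds request URLs for NCBI's `esearch.fcgi` with its `db`, `term`, `retstart` and `retmax` parameters. The helper name and search term are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_page_url(term: str, page: int, hits_per_page: int = 50,
                     db: str = "nucleotide") -> str:
    """Build an E-utilities esearch URL for one page of hits (page 0 is the first)."""
    params = {
        "db": db,
        "term": term,
        "retstart": page * hits_per_page,  # index of the first hit on this page
        "retmax": hits_per_page,           # hits returned per request
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# In this analogy, clicking "More" once corresponds to requesting page 1.
print(esearch_page_url("ATP8a1", 1))
```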
217. …sites, followed by the Toolbox way. The final section in this chapter focuses on enzyme lists, which represent an easy way of managing restriction enzymes. 13.1 Dynamic restriction sites. If you open a sequence, a sequence list, etc., you will find the Restriction Sites group in the Side Panel. As shown in figure 13.1, you can display restriction sites as colored triangles and lines on the sequence. The Restriction Sites group in the Side Panel shows a list of enzymes, represented by different colors corresponding to the colors of the triangles on the sequence. By selecting or deselecting the enzymes in the list, you can specify which enzymes' restriction sites should be displayed. Figure 13.1: Showing restriction sites of ten restriction enzymes (the Side Panel groups the enzymes into Non-cutters, Single cutters, Double cutters and Multiple cutters). The color of the restriction enzyme can be changed by clicking the colored box next to the enzyme's name. The name of the enzyme can also be shown next to the restriction site by selecting Show name flags above the list of restriction enzymes. There is also an option to specify how the labels should be shown: • No labels. This w
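The Side Panel grouping into non-, single, double and multiple cutters follows from counting recognition-site matches per enzyme. The sketch below scans only the top strand with exact matching. That is enough for this example because every recognition site used is palindromic. It is an assumption-laden illustration, not the program's algorithm. Degenerate sites, non-palindromic enzymes and circular sequences are not handled. The test sequence is made up.

```python
def find_sites(sequence: str, site: str) -> list:
    """0-based start positions of every (possibly overlapping) exact match."""
    sequence, site = sequence.upper(), site.upper()
    return [i for i in range(len(sequence) - len(site) + 1)
            if sequence[i:i + len(site)] == site]


def group_by_cut_count(sequence: str, enzymes: dict) -> dict:
    """Sort enzyme names into the four Side Panel groups by number of sites."""
    groups = {"Non-cutters": [], "Single cutters": [],
              "Double cutters": [], "Multiple cutters": []}
    labels = ["Non-cutters", "Single cutters", "Double cutters"]
    for name, site in enzymes.items():
        count = len(find_sites(sequence, site))
        groups[labels[count] if count < 3 else "Multiple cutters"].append(name)
    return groups


enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT",
           "SmaI": "CCCGGG", "EcoRV": "GATATC"}
seq = "GAATTCAAGCTTGGATCCAAGCTTCCCGGGAAGCTTGAATTC"
print(group_by_cut_count(seq, enzymes))
```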
218. …will be shown. If the graph is covering a set of aligned sequences with a main sequence, such as read mappings and BLAST results, the dialog shown in figure 6.15 will be displayed. These kinds of graphs are located under Alignment info in the Side Panel. In all other cases, a normal file dialog will be shown, letting you specify name and location for the file. In this dialog, select whether you wish to include positions where the main sequence … Figure 6.14: A graph displayed along the mapped reads. Right-click the graph to export the data points to a file.
219. to be unrelated, or to use heuristic corrections for shared ancestry. The second challenge is to find the optimal alignment given a scoring function. For pairs of sequences this can be done by dynamic programming algorithms, but for more than three sequences this approach demands too much computer time and memory to be feasible. A commonly used approach is therefore to do progressive alignment (Feng and Doolittle, 1987), where multiple alignments are built through the successive construction of pairwise alignments. These algorithms provide a good compromise between time spent and the quality of the resulting alignment.

Presently, the most exciting development in multiple alignment methodology is the construction of statistical alignment algorithms (Hein, 2001; Hein et al., 2000). These algorithms employ a scoring function which incorporates the underlying phylogeny and use an explicit stochastic model of molecular evolution, which makes it possible to compare different solutions in a statistically rigorous way. The optimization step, however, still relies on dynamic programming, and practical use of these algorithms thus awaits further developments.

Creative Commons License: All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its origi
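To make the pairwise dynamic programming step concrete, here is a minimal Needleman-Wunsch style sketch in Python that computes the best global alignment score of two sequences. The scoring values (`match=1`, `mismatch=-1`, linear `gap=-2`) are arbitrary assumptions chosen for illustration; they are not the scoring scheme of CLC Sequence Viewer or of the methods cited above, and the function returns only the score, without a traceback.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b under a linear gap penalty."""
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, len(b) + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    return score[len(a)][len(b)]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))  # prints 1
```

The table has (len(a) + 1) × (len(b) + 1) cells. Extending the same recursion directly to k sequences of length n needs (n + 1)^k cells, which is why exact dynamic programming quickly becomes impractical and progressive methods build the multiple alignment from pairwise steps instead.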
220. tral to working with CLC Sequence Viewer, because several operations can be performed by dragging the tab of a view, and extended right-click menus can be activated from the tabs. This chapter deals with the handling of views inside a View Area. Furthermore, it deals with rearranging the views. Section 3.3 deals with the zooming and selecting functions.

3.2.1 Open view

Opening a view can be done in a number of ways:
• double-click an element in the Navigation Area
or
• select an element in the Navigation Area | File | Show | Select the desired way to view the element
or
• select an element in the Navigation Area | Ctrl + O (⌘ + B on Mac)

Opening a view while another view is already open will show the new view in front of the other view. The view that was already open can be brought to the front by clicking its tab.

Note: If you right-click an open tab of any element, click Show, and then choose a different view of the same element, this new view is automatically opened in a split view, allowing you to see both views.

See section 3.1.5 for instructions on how to open a view using drag and drop.

3.2.2 Show element in another view

Each element can be shown in different ways. A sequence, for example, can be shown as linear, circular, text, etc.

In the following example, you want to see a sequence in a circular view. If the sequence is already open in a view, you can change the view to a circular view: Click Sho
221. ts 42; size of view 49
Restriction enzyme list 140
Restriction enzyme star activity 140
Restriction enzymes: filter 134, 136, 141; from certain suppliers 134, 136, 141; methylation 134, 136, 141; number of cut sites 133; overhang 134, 136, 141; sorting 133
Restriction sites 103: enzyme database Rebase 140; select fragment 95; number of 137; on sequence 92, 131; parameters 135; tutorial 32
Results handling 86
Reverse complement 126, 163
Reverse translation 163
Right click on Mac 19
RNA secondary structure 164
RNA translation 127
RNA-Seq analysis 161
rnaml file format 1 3
Safe mode 13
Save: changes in a view 46; sequence 29; style sheet 62; view preferences 62; workspace 53
Save enzyme list 134
SCF2 file format 1 2
SCF3 file format 1 2
Scroll wheel: to zoom in 50; to zoom out 50
Search: GenBank 110; GenBank file 104; handle results from GenBank 112; hits, number of 58; in a sequence 93; in annotations 93, 186; local data 161; options, GenBank 110; parameters 110
Secondary structure: predict RNA 164
Secondary structure prediction 163
Select: exact positions 93; in sequence 94; parts of a sequence 94; workspace 53
Select annotation 95
Selection mode in the toolbar 51
Selection, adjust 95
Selection, expand 95
Selection location on sequence 51
Sequence: alignment 143; analysis 114; display different information 41; extract from sequence list 107; find 93; inform
222. u are asked to fill in the Download dialog. In the dialog you must choose:
• Which operating system you use
• Whether you would like to receive information about future releases

Depending on your operating system and your Internet browser, you are taken through some download options. When the download of the installer (an application which facilitates the installation of the program) is complete, follow the platform-specific instructions below to complete the installation procedure.

1.2.2 Installation on Microsoft Windows

Starting the installation process is done in one of the following ways:

If you have downloaded an installer: Locate the downloaded installer and double-click the icon. The default location for downloaded files is your desktop.

If you are installing from a CD: Insert the CD into your CD-ROM drive. Choose Install CLC Sequence Viewer from the menu displayed.

Installing the program is done in the following steps:
• On the welcome screen, click Next.
• Read and accept the License agreement and click Next.
• Choose where you would like to install the application and click Next.
• Choose a name for the Start Menu folder used to launch CLC Sequence Viewer and click Next.
• Choose if CLC Sequence Viewer should be used to open CLC files and click Next.
• Choose where you would like to create shortcuts for launching CLC Sequence Viewer and click Next.
223. uence. If you chose to add the restriction sites as annotations to the sequence, the result will be similar to the sequence shown in figure 13.14. See section 9.3 for more information about viewing annotations.

[Figure 13.14: The result of the restriction analysis shown as annotations.]

13.2.5 Table of restriction sites

The restriction map can be shown as a table of restriction sites (see figure 13.15).

[Figure 13.15: The result of the restriction analysis shown as annotations. The Restriction sites table has the columns Name, Pattern, Overhang, Number of cut sites and Cut position(s).]

Each row in the table represents a restriction enzyme. The following information is available for each enzyme:
• Sequence. The name of the sequence, which is relevant if you have performed restriction map analysis on more than one sequence.
• Name. The name of the enzyme.
• Pattern. The recognition sequence of the enzyme.
• Overhang. The overhang produced by cutting with the enzyme (3', 5' or Blunt).
• Number of cut sites.
• Cut position(s). The position of each cut. If the enzyme cuts more than once, the positions are separated by commas.
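The columns of the restriction sites table can be mimicked with a short Python sketch. It assumes a single sequence (so the Sequence column is left out), palindromic recognition sites searched on the top strand only, and a hypothetical `cut_offset` field giving where the top strand is cut within the site; the reported cut position is taken to be the 1-based number of the residue just before the cut. These conventions are illustrative assumptions, not necessarily the ones CLC Sequence Viewer uses.

```python
from dataclasses import dataclass


@dataclass
class Enzyme:
    name: str
    pattern: str     # recognition sequence, 5'->3' on the top strand
    overhang: str    # "5'", "3'" or "Blunt", as shown in the table
    cut_offset: int  # hypothetical: top-strand cut lies after this many bases of the site


def restriction_table(sequence: str, enzymes: list[Enzyme]) -> list[tuple[str, str, str, int, str]]:
    """One row per enzyme: name, pattern, overhang, number of cut sites, cut positions."""
    seq = sequence.upper()
    rows = []
    for enzyme in enzymes:
        cuts = []
        start = seq.find(enzyme.pattern)
        while start != -1:
            # 1-based number of the last residue before the top-strand cut.
            cuts.append(start + enzyme.cut_offset)
            start = seq.find(enzyme.pattern, start + 1)
        rows.append((enzyme.name, enzyme.pattern, enzyme.overhang,
                     len(cuts), ", ".join(str(c) for c in cuts)))
    return rows


if __name__ == "__main__":
    panel = [Enzyme("EcoRI", "GAATTC", "5'", 1), Enzyme("BamHI", "GGATCC", "5'", 1)]
    for row in restriction_table("AAGAATTCAAAGAATTC", panel):
        print(row)
```

With EcoRI (G^AATTC) on the example sequence, the row reports 2 cut sites at positions "3, 12", while BamHI, which has no site, gets a count of 0 and an empty cut-position field.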
224. ues bold.

9.1.2 Restriction sites in the Side Panel

Please see section 13.1.

9.1.3 Selecting parts of the sequence

You can select parts of a sequence:
Click Selection in the Toolbar | Press and hold down the mouse button on the sequence where you want the selection to start | move the mouse to the end of the selection while holding the button | release the mouse button

Alternatively, you can search for a specific interval using the find function described above.

If you have made a selection and wish to adjust it:
• drag the edge of the selection (you can see the mouse cursor change to a horizontal arrow)
or
• press and hold the Shift key while using the right and left arrow keys to adjust the right side of the selection.

If you wish to select the entire sequence:
double-click the sequence name to the left

Selecting several parts at the same time (multiselect)

You can select several parts of a sequence by holding down the Ctrl button while making selections. Holding down the Shift button lets you extend or reduce an existing selection to the position you clicked.

To select a part of a sequence covered by an annotation:
right-click the annotation | Select annotation
or double-click the annotation

To select a fragment between two restriction sites that are shown on the sequence:
double-click the sequence between the two restriction sites

Read more about restriction sites in section 9.1.2.
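Double-clicking between two restriction sites selects the fragment they delimit. The Python sketch below shows one way such a lookup could work, assuming cut positions are stored as 1-based residue numbers after which the sequence is cut. This representation and the function `fragment_around` are hypothetical, not the program's internals.

```python
import bisect
from typing import Optional


def fragment_around(cut_positions: list[int], clicked: int) -> Optional[tuple[int, int]]:
    """Return the 1-based (first, last) residues of the fragment containing `clicked`.

    Each cut position k means the sequence is cut after residue k. Returns None
    unless there is a cut on both sides of the clicked residue.
    """
    cuts = sorted(cut_positions)
    # Index of the first cut at or after the clicked residue (the right boundary).
    i = bisect.bisect_left(cuts, clicked)
    if i == 0 or i == len(cuts):
        return None  # no flanking site on one side
    return cuts[i - 1] + 1, cuts[i]


if __name__ == "__main__":
    print(fragment_around([3, 12, 20], 7))  # prints (4, 12)
```

With cuts after residues 3, 12 and 20, a double-click on residue 7 selects residues 4 to 12, while a click before the first or after the last site returns `None`, because that region is flanked by a site on one side only. Circular sequences, where the last and first sites also enclose a fragment, are not handled in this sketch.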
225. ume processes, there are some extra options for some of the tools running from the Toolbox:
• Show results. If you have chosen to save the results (see section 8.1), you will be able to open the results directly from the process by clicking this option.
• Find results. If you have chosen to save the results (see section 8.1), you will be able to highlight the results in the Navigation Area.
• Show Log Information. This will display a log file showing progress of the process. The log file can also be shown by clicking Show Log in the "handle results" dialog where you choose between saving and opening the results.
• Show Messages. Some analyses will give you a message when processing your data. The messages are the black dialogs shown in the lower left corner of the Workbench that disappear after a few seconds. You can review the messages that have been shown by clicking this option.

The terminated processes can be removed by:
View | Remove Terminated Processes

If you close the program while there are running processes, a dialog will ask if you are sure that you want to close the program. Closing the program will stop the process, and it cannot be restarted when you open the program again.

3.4.2 Toolbox

The content of the Toolbox tab in the Toolbox corresponds to Toolbox in the Menu Bar.

The Toolbox can be hidden, so that the Navigation Area is enlarged and thereby displays more elements:
View | Show/Hide Toolbox
226. ur own information to sequences that do not derive from databases. Note that for other kinds of data, the Element info will only have Name and Description.

9.5 View as text

A sequence can be viewed as text without any layout and text formatting. This displays all the information about the sequence in the GenBank file format. To view a sequence as text:
select a sequence in the Navigation Area | Show in the Toolbar | As text

This way it is possible to see background information about, e.g., the authors and the origin of DNA and protein sequences. Selections or the entire text of the Sequence Text View can be copied and pasted into other programs.

Much of the information is also displayed in the Sequence info, where it is easier to get an overview (see section 9.4).

In the Side Panel, you find a search field for searching the text in the view.

9.6 Creating a new sequence

A sequence can either be imported, downloaded from an online database, or created in the CLC Sequence Viewer. This section explains how to create a new sequence:
New in the Toolbar

The Create Sequence dialog (figure 9.11) reflects the information needed in the GenBank format, but you are free to enter anything into the fields. The following description is a guideline for entering information about a sequence:
• Name. The name of the sequence. This is used for saving the sequence.
• Common name. A common name for the species.
227. via the Navigation Area to be included in an alignment. If you have sequences open in a View that you have not saved, you just need to select the view tab and press Ctrl + S (⌘ + S on Mac) to save them.

In this tutorial, six protein sequences from the Example data folder will be aligned (see figure 2.12).

[Figure 2.12: Six protein sequences in Sequences from the Protein orthologs folder of the Example data.]

To align the sequences:
select the sequences from the Protein folder under Sequences | Toolbox | Alignments and Trees | Create Alignment

2.5.1 The alignment dialog

This opens the dialog shown in figure 2.13. It is possible to add and remove sequences from the Selected Elements list. Since we had already selected the six proteins, just click Next to adjust parameters for the alignment.

Clicking Next opens the dialog shown in figure 2.14.

Leave the parameters at their default settings. An explanation of the parameters can be found by clicking the help button. Alternatively, a tooltip is displayed by holding the mouse cursor on the parameters.

Click Finish to start the alignment process, which is shown in the Toolbox under the Processes tab. When the program has finished calculating, it displays the alignment (see figure 2.15).

Note: The new alignment is not saved automatically. To save the alignment, drag the tab of
228. w As Circular at the lower left part of the view. The buttons used for switching views are shown in figure 3.7.

[Figure 3.7: The buttons shown at the bottom of a view of a nucleotide sequence. You can click the buttons to change the view to, e.g., a circular view or a history view.]

If the sequence is already open in a linear view and you wish to see both a circular and a linear view, you can split the views very easily:
Press Ctrl (⌘ on Mac) while you | Click Show As Circular at the lower left part of the view

This will open a split view with a linear view at the bottom and a circular view at the top (see 9.5).

You can also show a circular view of a sequence without opening the sequence first:
Select the sequence in the Navigation Area | Show | As Circular

3.2.3 Close views

When a view is closed, the View Area remains open as long as there is at least one open view.

A view is closed by:
right-click the tab of the View | Close
or select the view | Ctrl + W
or hold down the Ctrl button | Click the tab of the view while the button is pressed

By right-clicking a tab, the following close options exist (see figure 3.8):
• Close. See above.
• Close Tab Area. Closes all tabs in the tab area.
• Close All Views. Closes all tabs in all tab areas. Leaves an empty workspace.
• Close Other Tabs. Closes all other tabs in all tab areas, except the one that is selected.
229. wing annotations. However, they all have two groups in the Side Panel in common:
• Annotation Layout
• Annotation Types

[Figure 9.7: Changing the layout of annotations in the Side Panel. Side Panel groups: Sequence layout, Annotation layout, Annotation types, Restriction sites, Residue coloring, Nucleotide info, Find, Text Format.]

The two groups are shown in figure 9.7.

In the Annotation layout group, you can specify how the annotations should be displayed (notice that there are some minor differences between the different sequence views):
• Show annotations. Determines whether the annotations are shown.
• Position.
  - On sequence. The annotations are placed on the sequence. The residues are visible through the annotations if you have zoomed in to 100%.
  - Next to sequence. The annotations are placed above the sequence.
  - Separate layer. The annotations are placed above the sequence and above restriction sites (only applicable for nucleotide sequences).
• Offset. If several annotations cover the same part of a sequence, they can be spread out.
  - Piled. The annotations are piled on top of each other.
230. wn menu. Here you can choose between standard and topology layout. The topology layout can help to give an overview of the tree if some of the branches are very short.

When the sequences include the appropriate annotation, it is possible to choose between the accession number and the species names at the leaves of the tree. Sequences downloaded from GenBank, for example, have this information. The Labels preferences allow these different node annotations, as well as different annotations on the branches.

The branch annotation includes the bootstrap value, if this was selected when the tree was calculated. It is also possible to annotate the branches with their lengths.

2.7 Tutorial: Find restriction sites

This tutorial will show you how to find restriction sites and annotate them on a sequence.

There are two ways of finding and showing restriction sites. In many cases, the dynamic restriction sites found in the Side Panel of sequence views will be useful, since they are a quick and easy way of showing restriction sites.

In the Toolbox you will find the other way of doing restriction site analyses. This way provides more control of the analysis and gives you more output options, e.g., a table of restriction sites and a list of restriction enzymes that can be saved for later use.

In this tutorial, the first section describes how to use the Side Panel to show restriction sites, whereas the second section describes the restriction
231. ymes

[Figure 13.16: Choosing enzymes for the new enzyme list. The dialog shows two enzyme lists, each with the columns Name, Overhang, Methylation and Popularity.]

At the top you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 13.3 for more about creating and modifying enzyme lists.

3 You can customize the enzyme database for your installation, see section
4 You can customiz
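The Overhang column (5', 3' or Blunt) follows from where the two strands are cut relative to each other. The Python sketch below classifies an overhang from two cut offsets, both counted in top-strand coordinates from the 5' end of the recognition site; the function `overhang` is hypothetical, and the example offsets are the commonly cited cleavage positions for EcoRI, PstI and SmaI, given for illustration only.

```python
def overhang(site: str, top_cut: int, bottom_cut: int) -> tuple[str, str]:
    """Classify the ends left by a cut as 5', 3' or Blunt and return the single-stranded bases.

    Both offsets count bases from the 5' end of `site` on the top strand; the
    bottom-strand cut is expressed in the same top-strand coordinates.
    """
    if top_cut == bottom_cut:
        return "Blunt", ""
    left, right = sorted((top_cut, bottom_cut))
    # If the top strand is cut left of the bottom strand, the fragments carry
    # single-stranded 5' ends; otherwise they carry single-stranded 3' ends.
    kind = "5'" if top_cut < bottom_cut else "3'"
    return kind, site[left:right].lower()


if __name__ == "__main__":
    print(overhang("GAATTC", 1, 5))  # EcoRI, G^AATTC
    print(overhang("CTGCAG", 5, 1))  # PstI, CTGCA^G
    print(overhang("CCCGGG", 3, 3))  # SmaI, CCC^GGG
```

With these offsets, EcoRI gives a 5' overhang of aatt, PstI a 3' overhang of tgca, and SmaI cuts blunt.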
