Untitled - Lightwave Scientific

1. [Figure 2.20 residue: BLAST Layout and BLAST Summary views for query NP_058652 (19 hits). The tooltip for hit P09105 (HBAT_HUMAN, hemoglobin theta-1 subunit) reads: Score 90.9 bits (224), Expect 2E-19, Identities 53/145 (37%), Gaps 8/145 (6%), Strand Plus/Plus.] Figure 2.20: Output of a BLAST search. By holding the mouse pointer over the lines you can get information about the sequence. 2.9 Tutorial: Primer design. In this tutorial you will see how to use CLC Gene Workbench for finding primers for PCR amplification of a specific region. The pBR322 sequence from the Example data is used in this tutorial. On positions 1891..1892 and 1913..1914 there are two conflict annotations which mark the region that should be amplified. Firs
2. (Chapter 12, General sequence analyses) Z will match the characters A through Z (range). You can also put single characters between the brackets: the expression [AGT] matches the characters A, G or T. [A-D[M-P]] will match the characters A through D and M through P (union). You can also put single characters between the brackets: the expression [AG[M-P]] matches the characters A, G and M through P. [A-M&&[H-P]] will match the characters between A and M lying between H and P (intersection). You can also put single characters between the brackets: the expression [A-M&&[HGTDA]] matches the characters A through M that are also H, G, T, D or A. [^A-M] will match any character except those between A and M (excluding). You can also put single characters between the brackets: the expression [^AG] matches any character except A and G. [A-Z&&[^M-P]] will match any character A through Z except those between M and P (subtraction). You can also put single characters between the brackets: the expression [A-P&&[^CG]] matches any character between A and P except C and G. The symbol . matches any character. X{n} will match a repetition of an element indicated by following that element with a numerical value or a numerical range between the curly brackets. For example, (ACG){2} matches the string ACGACG. X{n,m} will match a certain number of repetitions of an element indicated by follow
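The union, intersection and subtraction classes described above use a nested-bracket and && notation that Python's `re` module does not accept. As a minimal sketch, assuming you want to try the same matches in Python, lookaheads can stand in for intersection and subtraction. The helper `matching_chars` is hypothetical and only lists which letters a single-character pattern accepts.

```python
import re

# Union: A through D or M through P.
union = re.compile(r"[A-DM-P]")

# Intersection: A through M that also lies in H through P (so H through M).
# Python's re has no && operator, so a lookahead stands in for it.
intersection = re.compile(r"(?=[H-P])[A-M]")

# Subtraction: A through Z except M through P, via a negative lookahead.
subtraction = re.compile(r"(?![M-P])[A-Z]")

# Repetition: the element ACG repeated exactly twice.
repeat = re.compile(r"(?:ACG){2}")


def matching_chars(pattern, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Return the letters of `alphabet` accepted by a single-character pattern."""
    return "".join(c for c in alphabet if pattern.fullmatch(c))


print(matching_chars(union))             # ABCDMNOP
print(matching_chars(intersection))      # HIJKLM
print(matching_chars(subtraction))       # ABCDEFGHIJKLQRSTUVWXYZ
print(bool(repeat.fullmatch("ACGACG")))  # True
```

The lookahead forms are equivalents for illustration only; they are not the notation the workbench's search field expects.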
3. (Chapter 15, Primers) C and G nucleotides in the primer within which primers must lie by setting a maximum and a minimum GC content. Melting temperature: Determines the temperature interval within which primers must lie. When the Nested PCR or TaqMan reaction type is chosen, the first pair of melting temperature interval settings relates to the outer primer pair, i.e. not the probe. The melting temperature group can also be unfolded to show parameters regarding the reaction mixture. Primer concentration: Specifies the concentration of primers and probes in nanomolar (nM). Salt concentration: Specifies the concentration of monovalent cations (Na+, K+ and equivalents) in millimolar (mM). Melting temperatures are calculated by a nearest-neighbor model which considers stacking interactions between neighboring bases in the primer-template complex. The model uses the thermodynamic parameters of SantaLucia (1998) and considers the important contribution from the dangling ends that are present when a short primer anneals to a template sequence (Bommarito et al., 2000). Inner melting temperature: This option is only activated when the Nested PCR or TaqMan mode is selected. In Nested PCR mode it determines the allowed melting temperature interval for the inner (nested) pair of primers, and in TaqMan mode it determines the allowed temperature interval for the TaqMan probe. Self annealing: Determines the maximu
4. (Chapter 11, Viewing and editing sequences) Annotation search: Searches the annotations on the sequence. The search is performed both on the labels of the annotations and on the text appearing in the tooltip that you see when you keep the mouse cursor fixed. If the search term is found, the part of the sequence corresponding to the matching annotation is selected. Position search: Finds a specific position on the sequence. In order to find an interval, e.g. from position 500 to 570, enter 500..570 in the search field. This will make a selection from position 500 to 570 (both included). Notice the two periods (..) between the start and end number. Include negative strand: When searching the sequence for nucleotides or amino acids, you can search on both strands. This concludes the description of the View Preferences. Next, the options for selecting and editing sequences are described. Text format: These preferences allow you to adjust the format of all the text in the view (residue letters, sequence label and translations, if relevant). Text size: Five different sizes. Font: Shows a list of fonts available on your computer. Bold residues: Makes the residues bold. 11.1.2 Selecting parts of the sequence. You can select parts of a sequence: click Selection in the Toolbar | press and hold down the mouse button on the sequence where you want the selection to start | move the mouse to the end of the selection wh
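The position search described above takes either a single position or an interval written with two periods, and selects both end positions. A minimal sketch of that parsing follows. It assumes positions are 1-based, and the function names `parse_position_query` and `select` are hypothetical, not part of the workbench.

```python
def parse_position_query(query):
    """Parse '570' or '500..570' into an inclusive (start, end) pair.

    Positions are assumed to be 1-based, as is usual for sequence coordinates.
    """
    text = query.strip()
    if ".." in text:
        start_text, end_text = text.split("..", 1)
        start, end = int(start_text), int(end_text)
    else:
        start = end = int(text)
    if start < 1 or end < start:
        raise ValueError(f"invalid position query: {query!r}")
    return start, end


def select(sequence, query):
    """Return the residues covered by the query, both end positions included."""
    start, end = parse_position_query(query)
    return sequence[start - 1:end]


print(parse_position_query("500..570"))  # (500, 570)
print(select("ACGTACGT", "2..4"))        # CGT
```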
5. (Chapter 13, Nucleotide analyses) CLC Gene Workbench 2.0 lets you convert a DNA sequence into RNA, replacing the T residues (thymine) with U residues (uracil): select a DNA sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Convert DNA to RNA, or right-click a sequence in Navigation Area | Toolbox | Nucleotide Analyses | Convert DNA to RNA. This opens the dialog displayed in figure 13.1. If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. Notice! You can select multiple DNA sequences and sequence lists at a time. If the sequence list contains RNA sequences as well, they will not be converted. [Figure 13.1 residue: the Convert DNA to RNA dialog, step 1 "Select DNA sequences", showing the Projects tree and the Selected Elements list.] Figure 13.1: Translating DNA to RNA. 13.2 Convert RNA to DNA. CLC Gene Workbench 2.0 lets you convert an RNA sequence into DNA, substituting the U residu
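The conversion described above amounts to replacing every T with U. The sketch below illustrates that. It assumes that RNA sequences in a list are recognised by containing U and are simply left out of the result; the source only says they "will not be converted", so that handling is a guess. All function names are hypothetical.

```python
def is_rna(sequence):
    """Treat a sequence as RNA if it contains U (an assumption for this sketch)."""
    return "U" in sequence.upper()


def dna_to_rna(sequence):
    """Replace thymine (T) with uracil (U), preserving letter case."""
    return sequence.translate(str.maketrans("Tt", "Uu"))


def convert_list(sequences):
    """Convert the DNA sequences of a list; RNA sequences are skipped."""
    return [dna_to_rna(seq) for seq in sequences if not is_rna(seq)]


print(dna_to_rna("GATTACA"))         # GAUUACA
print(convert_list(["ATG", "AUG"]))  # ['AUG']
```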
6. (Chapter 18, Sequence alignment) and Trees | Create Alignment. This opens the dialog shown in figure 18.1. [Figure 18.1 residue: Create Alignment dialog, step 1 "Select sequences or alignments of same type", with protein sequences such as P68046, P68053, CAA24102 and NP_058652 in the Projects tree.] Figure 18.1: Creating an alignment. If you have selected some elements before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences, sequence lists or alignments from the Project Tree. Click Next to adjust alignment algorithm parameters. Clicking Next opens the dialog shown in figure 18.2. [Figure 18.2 residue: step 2 "Set parameters" with the gap settings Gap open cost 10.0, Gap extension cost 1.0, End gap cost "As any other", and a Fast alignment option.] Figure 18.2: Adjusting alignment algorithm parameters. 18.1.1 Gap costs. The alignment algorithm has three parameters concerning gap costs: Gap open cost, Gap extension cost and End gap cost. These parameters are specified to one decimal place. Gap open cost: The price for introducing gaps in an alig
7. [Figure 17.16 residue: a view of the PERH3BC sequence with restriction sites marked, above a result table with the columns Name, Pattern, Overhang, Number of matches and Cut position(s). The rows list the enzymes CjePI (ccannnnnnntc), MboII (gaaga) and Tth111II (caarca); the remaining cell values are garbled in this extract.] Figure 17.16: The result of the restriction site detection is displayed as text, and in this example the View shares the View Area with a View of the PERH3BC sequence displaying the restriction sites (split-screen view). This list may be very long, and hence it might not be possible for CLC Gene Workbench to display all cut positions in one cell. If you want to see the entire list of cut positions: select the table line with the relevant enzyme | Ctrl + C (⌘ + C on Mac) | open a word processing program | Ctrl + V (⌘ + V on Mac). 17.4 Restriction enzyme lists. CLC Gene Workbench includes all the restriction enzymes available in the REBASE database. However, when performing restriction site analyses, it is often an advantage to use a customized list of enzymes. For this purpose the user can create special lists containing e.g. all enzymes available in the laboratory freezer, all enzymes used to create a given restriction map or all enzymes that are available from the preferred vendor. This section describes how you can create an enzyme list
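The result table mentioned above lists a recognition pattern per enzyme, in which n stands for any nucleotide. As a minimal sketch of how such a pattern could be located in a sequence, the code below turns a pattern into a regular expression and reports 1-based match start positions on the given strand only. It does not compute cut positions or overhangs, handles only the n wildcard, and `find_sites` is a hypothetical helper rather than the workbench's algorithm.

```python
import re


def pattern_to_regex(pattern):
    """Translate a recognition pattern into a regex; 'n' matches any of a, c, g, t."""
    return "".join("[acgt]" if c == "n" else re.escape(c) for c in pattern.lower())


def find_sites(sequence, pattern):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # The lookahead makes overlapping occurrences visible to finditer.
    regex = re.compile(f"(?={pattern_to_regex(pattern)})")
    return [match.start() + 1 for match in regex.finditer(sequence.lower())]


# MboII pattern 'gaaga' taken from the result table; the sequences are made up.
print(find_sites("ttgaagaccgaagat", "gaaga"))  # [3, 10]
print(find_sites("aagattcgg", "gantc"))        # [3]
```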
8. (Chapter 13, Nucleotide analyses) select a nucleotide sequence | Toolbox in the Menu Bar | Nucleotide Analyses | Annotate with SNPs from NCBI, or right-click a nucleotide sequence | Toolbox | Nucleotide Analyses | Annotate with SNPs from NCBI. If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. When you have selected the desired sequences, click Finish. Warning! Annotating a sequence with SNPs may take a while, depending on sequence size and the availability of the NCBI server. Furthermore, only sequences which have a GenBank accession number can be annotated with SNPs. After executing the action you will see the sequence annotated with the SNPs as in figure 13.6. [Figure 13.6 residue: sequence NM_000044 around position 1760.] Figure 13.6: A sequence annotated with SNPs. 13.6 Find open reading frames. CLC Gene Workbench 2.0 has a basic functionality for gene finding in the form of open reading frame (ORF) determination. The ORFs will be shown as annotations on the sequence. You have the option of choosing translation table, start codons, minimum length and other parameters for finding the ORFs. These parameters will be explained in this section. To find open reading frames: select a nucleotide sequence | Toolbox in the Menu Bar | Nucleotide Analyses | Find Ope
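Open reading frame determination, as introduced above, can be illustrated with a small scan for start and stop codons. This is a sketch under stated assumptions: it searches the three forward frames only, uses the standard stop codons, measures the minimum length in nucleotides including the stop codon, and reports 1-based inclusive positions. The function `find_orfs` and its parameters are hypothetical and do not mirror the workbench's options.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def find_orfs(dna, start_codons=("ATG",), min_length=9):
    """Return sorted (start, end) pairs of ORFs in the three forward frames."""
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_length:
                    orfs.append((start + 1, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))  # [(3, 14)]
```

Raising `min_length` filters out short frames, which is one reason a minimum length parameter is useful when many spurious start codons are present.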
9. [Figure 6.4 residue: file save dialog with the file type set to Portable Network Graphics (.png).] Figure 6.4: Exporting a phylogenetic tree from CLC Gene Workbench 2.0. To see the exported file, browse to the file on your computer and open it. In our case the .png file is opened in a browser; the result can be seen in figure 6.5. [Figure 6.5 residue: the exported tree image (1266x1296 pixels) in a Mozilla Firefox window.] Figure 6.5: The exported .png file opened in a browser. Due to the high resolution of the exported graphics, it is not possible to see the entire file in the browser window. The following file types are available for exporting graphics in CLC Gene Workbench 2.0. Bitmap images: In a bitmap image, each dot in the image has a specified color. This implies that if you zoom in on the image there will not be enough dots, and if you zoom out there will be too many. In these cases the image viewer has to interpolate the colors to fit what is actually looked at. This format is a good choice for storing images without large shapes (e.g. dot plots). Vector graphics: Vector graphics is a collection of shapes. Thus what is stored is e.g. information about where a line starts and ends, and the color of the line and its width. This en
10. CLC Gene Workbench User manual. User manual for CLC Gene Workbench 2.0 (Windows, Mac OS X and Linux). July 6, 2006. CLC bio, Gustav Wieds Vej 10, DK-8000 Aarhus C, Denmark. Contents. Introduction. 1 Introduction to CLC Gene Workbench: 1.1 Contact i[...]; 1.2 Download and installation; 1.3 System requirements; 1.4 Licenses; 1.5 About CLC Workbenches; 1.6 When the program is installed: Getting started; 1.7 Network configuration; 1.8 Adjusting the maximum amount of memory; 1.9 The format of the User Manual. 2 Tutorials (entries 2.1 to 2.11; the pairing of numbers and titles is garbled in this extract): Starting up the program; GenBank search and download; Align protein sequences; Create and modify a phylogenetic tree; [illegible]; Sequence information; BLAST search; Tips and [...]; Primer de
11. (Chapter 15, Primers) [Figure 15.8 residue: Mispriming parameters group with the option "Use mispriming as exclusion criteria", and the Calculate and Help buttons.] Figure 15.8: Calculation dialog for PCR primers. Again, the top part of this dialog shows the parameter settings chosen in the Primer parameters preference group which will be used by the design algorithm. The lower part again contains a menu where the user can choose to include mispriming of both primers as a criterion in the design process (see above). The central part of the dialog contains parameters pertaining to primer pairs. Here three parameters can be set. Maximum percentage point difference in G/C content: if this is set at e.g. 5 points, a pair of primers with 45% and 49% G/C nucleotides, respectively, will be allowed, whereas a pair of primers with 45% and 51% G/C nucleotides, respectively, will not be included. Maximal difference in melting temperature of primers in a pair: the number of degrees Celsius that primers in a pair are allowed to differ. Maximum pair annealing score: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in a primer pair. 15.5.2 Standard PCR output table. If only a single region is selected, the following columns of information are available. Sequence: the primer's sequence. Penalty: measures how much the properties of the primer (or primer pair) deviate from the optimal solution in terms of the chosen parameters. The lower the penalty
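The pair criteria above can be checked with simple arithmetic. The sketch below reproduces the 45%/49% versus 45%/51% example for a limit of 5 percentage points. Whether a difference exactly equal to the limit is accepted is an assumption here, the default melting temperature limit is made up for illustration, and the function names are hypothetical.

```python
def gc_percent(primer):
    """Percentage of G and C nucleotides in a primer sequence."""
    seq = primer.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def pair_allowed(gc_fwd, gc_rev, tm_fwd, tm_rev, max_gc_diff=5.0, max_tm_diff=5.0):
    """Check a primer pair against the two difference limits.

    gc_* are G/C percentages, tm_* are melting temperatures in degrees Celsius.
    A difference equal to the limit is treated as allowed (assumption).
    """
    gc_ok = abs(gc_fwd - gc_rev) <= max_gc_diff
    tm_ok = abs(tm_fwd - tm_rev) <= max_tm_diff
    return gc_ok and tm_ok


print(gc_percent("ATGC"))                    # 50.0
print(pair_allowed(45.0, 49.0, 60.0, 61.0))  # True: 4 points apart
print(pair_allowed(45.0, 51.0, 60.0, 61.0))  # False: 6 points apart
```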
12. (Chapter 12, General sequence analyses) Moving the slider from the right to the left lowers the thresholds, which can be directly seen in the dot plot, where more diagonal lines will emerge. You can also choose another color gradient by clicking on the gradient box and choosing from the list. Adjusting the sliders above the gradient box is also practical when producing an output for printing (too much background color might not be desirable). By crossing one slider over the other (the two sliders change sides), the colors are inverted, allowing for a white background if you choose a color gradient which includes white. See figure 12.3. [Figure 12.3 residue: dot plot of Q6WN21 vs. Q6WN20 with the Dot Plot Preferences panel (Lock axes, Modify Gradient min/max, Text format: Text size Medium, Font SansSerif, Bold).] Figure 12.3: A view is opened showing the dot plot. [Figure 12.4 residue: dot plot of P68053 vs. a second sequence whose name is garbled, with axes labeled Sequence 1 and Sequence 2.] Figure 12.4: Dot plot with inverted colors, practical for printing. 12.1.3 Bioinformatics explained: Dot plots. Realization of dot plots. Dot plots are two-dimensional plots where the x-axis and y-axis each represent a sequence, and the plot itself shows a comparison of these two sequences by a calculated score for each position of the sequence. If a window of fixed size on
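The effect of lowering the threshold can be illustrated with a toy dot plot. The sketch below assumes the score of a cell is the number of identical residues in two fixed-size windows, which is one common way to realise a dot plot and not necessarily the scoring the workbench uses. The names `dot_plot` and `count_dots` are hypothetical.

```python
def dot_plot(seq_x, seq_y, window=2, threshold=2):
    """Return a grid of booleans; cell [j][i] is True when the windows
    starting at seq_x[i] and seq_y[j] share at least `threshold` identities."""
    cols = len(seq_x) - window + 1
    rows = len(seq_y) - window + 1
    grid = []
    for j in range(rows):
        row = []
        for i in range(cols):
            pairs = zip(seq_x[i:i + window], seq_y[j:j + window])
            score = sum(a == b for a, b in pairs)
            row.append(score >= threshold)
        grid.append(row)
    return grid


def count_dots(grid):
    """Number of cells drawn as dots."""
    return sum(sum(row) for row in grid)


strict = dot_plot("AAGT", "ACGT", window=2, threshold=2)
loose = dot_plot("AAGT", "ACGT", window=2, threshold=1)
print(count_dots(strict), count_dots(loose))  # 1 4
```

With the lower threshold, partially matching windows also produce dots, which corresponds to the additional lines that appear when the slider is moved to the left.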
13. (Chapter 2, Tutorials) [Figure 2.23 residue: right-click menu with the entries Open Selection in New View, Set Numbers Relative to This Selection, Edit Selection, Add Annotation, Trim sequence left, Trim sequence right and Set Alignment Fixpoint Here.] Figure 2.23: Right-clicking a selection and choosing Forward primer region here. 2.9.3 Examining the primer suggestions. When you have specified this region, you will be able to see five lines of possible primers based on the Primer parameters to the left. Each line represents primers of a specific length, e.g. the first line represents primers of length 18 (see figure 2.24). [Figure 2.24 residue: rows of dots labeled Lgt 18 to Lgt 22 below the sequence around positions 1860 to 1880.] Figure 2.24: Five lines of dots representing primer suggestions. There is a line for each length. Each line consists of a number of dots, each representing a possible primer. E.g. the first dot on the first line (primers of length 18) represents a primer starting at the dot's position and with a length of 18 nucleotides (shown as the white area in figure 2.25). Position the mouse cursor upon a dot and you will see an information box providing data about this primer. Clicking the dot will select the region where the primer will anneal. See figure 2.26. Note that the dot is colored red and that there is an asterisk before the melting temperature
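The five lines of dots described above correspond to every possible primer of length 18 to 22 within the chosen region, one dot per start position. A minimal sketch of that enumeration follows. The function `candidate_primers` is hypothetical, positions are reported 1-based relative to the region, and no filtering on melting temperature or other parameters is attempted.

```python
def candidate_primers(region, lengths=range(18, 23)):
    """Map each primer length to a list of (start, sequence) candidates.

    `start` is the 1-based position of the primer's first nucleotide
    within `region`.
    """
    return {
        length: [
            (start + 1, region[start:start + length])
            for start in range(len(region) - length + 1)
        ]
        for length in lengths
    }


region = "ACGTACGTACGTACGTACGTACGTA"  # made-up 25-nucleotide region
candidates = candidate_primers(region)
for length, primers in candidates.items():
    print(length, len(primers))
# 18 8
# 19 7
# 20 6
# 21 5
# 22 4
```

Longer primers fit into fewer start positions, which matches the shorter rows of dots for the greater lengths in the figure.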
14. (Chapter 3, User interface) Paste, or: select the files to copy | Ctrl + C (⌘ + C on Mac) | select where to insert files | Ctrl + V (⌘ + V on Mac), or: select the files to copy | Edit in the Menu Bar | Copy | select where to insert files | Edit in the Menu Bar | Paste. If there is already an element of that name, the pasted element will be renamed by appending a number at the end of the name. Elements can also be moved instead of copied. This is done with the cut/paste function: select the files to cut | right-click one of the selected files | Cut | right-click the location to insert files into | Paste, or: select the files to cut | Ctrl + X (⌘ + X on Mac) | select where to insert files | Ctrl + V (⌘ + V on Mac). When you have cut the element, it disappears until you activate the paste function. Move using drag and drop. Using drag and drop in the Navigation Area, as well as in general, is a four-step process: click the element | click on the element again and hold left mouse button | drag the element to the desired location | let go of mouse button. This allows you to: move elements between different projects and folders in the Project Tree; drag from the Navigation Area to the View Area (a new View is opened in an existing View Area if the element is dragged from the Navigation Area and dropped next to the tab(s) in that View Area); drag from the View Area to the Navigation Area. The element e
15. (Chapter 6, Import/export of data and graphics) The CLC Gene Workbench 2.0 offers several ways to handle bioinformatic data. Read the next sections to get information on how to import different file formats or to import data from a Vector NTI database. Import of common bioinformatic data. Before importing a file, you must decide where you want to import it, i.e. which project or folder. The imported file ends up in the project or folder you selected in the Navigation Area: select project or folder | click Import in the Toolbar | browse to the relevant file | Select. The imported file is placed at the location which was selected when the import was initiated. E.g. if you right-click on a file in the Navigation Area and choose import, the imported file is placed immediately below the selected file. If you right-click a folder, the imported file is placed as the last file in that folder. If you right-click a project, the imported file is placed as the last file in that project and after existing folders. It is also possible to drag a file from e.g. the desktop into the Navigation Area of CLC Gene Workbench. If CLC Gene Workbench recognizes the file format, the file is automatically parsed (changed into CLC format) and stored in the Navigation Area. If the format is not recognized, the following dialog is displayed (see figure 6.1). [Figure 6.1 residue: Import File dialog stating that some of the formats for the chosen files could not be recognized; truncated.]
16. (Chapter 18, Sequence alignment) [Figure 18.6 residue: alignment of HBA_ACCGE, HBB_ANAPP, HBB_AQUCH and HBB_CALJA with the right-click menu entries Copy Selection, Realign selection, Expand Selection, Open Selection in New View, Set alignment fixpoint here and Edit Selection.] Figure 18.6: Adding a fixpoint to a sequence in an existing alignment. At the top you can see a fixpoint that has already been added. When you click Create alignment and go to Step 2, check Use fixpoints in order to force the alignment algorithm to align the fixpoints in the selected sequences to each other. In figure 18.7 the result of an alignment using fixpoints is illustrated. [Figure 18.7 residue: two alignment views of HBA_ANAPE, HBA_ANSSE, HBA_ACCGE, HBB_ANAPP, HBB_AQUCH and HBB_CALJA, before and after realignment.] Figure 18.7: Realigning using fixpoints. In the top view, fixpoints have been added to two of the sequences. In the view below, the alignment has been realigned using the fixpoints. The three top sequences are very similar, and therefore they follow the one sequence (number two from the top) that has a fixpoint. You can add multiple fixpoints, e.g. adding two fixpoints to the sequences that are aligned will force their first fi
17. 17.2 Graphical display of in silico cloning: 17.2.1 Introduction to the cloning view; 17.2.2 View preferences for cloning view; 17.2.3 How to navigate the cloning view; 17.2.4 Manipulate sequences; 17.2.5 Insert one sequence into another; 17.2.6 Show in a circular view; 17.2.7 Real cloning example. 17.3 Restriction site analysis: 17.3.1 Restriction site parameters. 17.4 Restriction enzyme lists: 17.4.1 Create enzyme list; 17.4.2 Modify enzyme list. 17.5 Gel electrophoresis: 17.5.1 Separate sequences on gel; 17.5.2 Separate fragments of sequences using restriction enzymes. CLC Gene Workbench offers graphically advanced in silico cloning and design of vectors for various purposes, together with restriction enzyme analysis and functionalities for managing lists of restriction enzymes. First, after a brief introduction, the cloning and vector design is explained. Next, the restriction site analyses are described. 17.1 Molecular cloning: an introduction. Molecular cloning is a very important tool in the quest to understand gene
18. (Chapter 2, Tutorials) [Figure 2.37 residue: annotation table for HUMHBB (exon rows) above a sequence view around positions 19500 to 21000 with the HBE1 annotation selected.] Figure 2.37: Clicking the HBE1 coding region in the top view selects the annotation on the sequence in the bottom view. For sequences with many annotations, it is easier to navigate using these links than to scroll in the ordinary view of the sequence. 2.11.4 Split sequences into several lines. Producing graphics of long sequences can be a strenuous task, especially if you have not discovered the Wrap sequence option. If you just export graphics of a long sequence without wrapping, you will get an extremely wide graphics file which probably has to be edited in a graphics program before use. Wrapping the sequence allows you to control the width and height of the graphics file (see figure 2.38). [Figure 2.38 residue: Sequence layout preferences with the options Spaces every 10 residues, No wrap, Auto wrap, Fixed wrap and Double stranded.] Figure 2.38: Wrapping the sequence automatically. 2.11.5 Make a new sequence of a coding region. If you have a genomic sequence containing a coding region, you can easily make a new sequence which only consists of the coding region. See figure 2.39: right-click the coding region's annotation | Open Annotation in New View. This will open a new sequence which only consists of the residues covered by the annotation. [Figure 2.39 residue: right-click menu with the entries Select Annotation and Open Annotation in New View; truncated.]
19. (Chapter 2, Tutorials) At this point you can either settle on a specific primer pair or save the table for later. If you want to use e.g. the first primer pair for your experiment, right-click this primer pair in the table and save the primers. You can also mark the position of the primers on the sequence by selecting Mark primer annotation on sequence in the right-click menu (see figure 2.30). You have now reached the end of this tutorial, which has shown some of the many options of the primer design functionalities of CLC Gene Workbench. You can read much more in the program's Help function or in the user manual on http://www.clcbio.com/download. [Figure residue: table "Standard primers for sequence pBR322" with primer pair rows (Penalty, Sequence F1, Sequence R1) and a View Preferences column list including Self annealing, Self end annealing, GC content, Melting temperature and Secondary structure score for F1 and R1; truncated.]
20. (Chapter 3, User interface) OK. The name of the selected Workspace is shown after "CLC Gene Workbench 2.0" at the top left corner of the main window, in this case (default). 3.5.3 Delete Workspace. Deleting a Workspace can be done in the following way: Workspace in the Menu Bar | Delete Workspace | choose which Workspace to delete | OK. [Figure 3.15 residue: main window titled "CLC Gene Workbench 2.0 (Default)" with the menus File, Edit, Search, View, Toolbox, Workspace and Help, the Toolbar, a collapsed Project Tree and the Toolbox categories.] Figure 3.15: An empty Workspace. Notice! Be careful to select the right Workspace when deleting. The delete action cannot be undone. However, no data is lost, because a workspace is only a representation of data. It is not possible to delete the default workspace. 3.6 List of shortcuts. The keyboard shortcuts in CLC Gene Workbench 2.0 are listed below (Action: Windows/Linux; Mac OS X). Adjust selection: Shift + arrow keys; Shift + arrow keys. Change between tabs: Ctrl + tab; ⌘ + tab. Close: C
21. The Navigation Area always contains the same data across Workspaces. It is, however, possible to open different folders in the different Workspaces. Consequently, the program allows you to display different clusters of the data in separate Workspaces. All Workspaces are automatically saved when closing down CLC Gene Workbench 2.0. The next time you run the program, the Workspaces are reopened exactly as you left them. Notice! It is not possible to run more than one version of CLC Gene Workbench 2.0 at a time. Use two or more Workspaces instead. 3.5.1 Create Workspace. When working with large amounts of data, it might be a good idea to split the work into two or more Workspaces. By default, the CLC Gene Workbench opens one Workspace, the largest window in the right side of the workbench (see 3.1). Additional Workspaces are created in the following way: Workspace in the Menu Bar | Create Workspace | enter name of Workspace | OK. When the new Workspace is created, the heading of the program frame displays the name of the new Workspace. Initially, the Project Tree in the Navigation Area is collapsed and the View Area is empty and ready to work with (see figure 3.15). 3.5.2 Select Workspace. When there is more than one Workspace in the workbench, there are two ways to switch between them: Workspace in the Toolbar | Select the Workspace to activate, or: Workspace in the Menu Bar | Select Workspace | choose which Workspace to activate
22. (Chapter 12, General sequence analyses) You are free to copy, distribute, display and use the work for educational purposes, under the following conditions: You must attribute the work in its original form, and CLC bio has to be clearly labelled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform or build upon this work. SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more about how you may use the contents. 12.5 Join sequences. CLC Gene Workbench can join several nucleotide or protein sequences into one sequence. This feature can for example be used to construct "supergenes" for phylogenetic inference by joining several disjoint genes into one. Note that when sequences are joined, all their annotations are carried over to the new spliced sequence. Two or more sequences can be joined by: select sequences to join | Toolbox in the Menu Bar | General Sequence Analyses | Join sequences, or: select sequences to join | right-click either selected sequence | Toolbox | General Sequence Analyses | Join sequences. This opens the dialog shown in figure 12.16. [Figure 12.16 residue: Join Sequences dialog, step 1 "Select Sequences of Same Type", with the Projects tree and the Selected Elements list; truncated.]
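Carrying annotations over when sequences are joined, as noted above, means shifting each annotation by the combined length of the sequences that precede it. The sketch below shows that bookkeeping. The `Annotation` and `Sequence` classes and the `join_sequences` function are hypothetical stand-ins with 1-based inclusive coordinates, not the workbench's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    label: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


@dataclass
class Sequence:
    name: str
    residues: str
    annotations: list = field(default_factory=list)


def join_sequences(sequences, name="joined"):
    """Concatenate sequences in order and shift their annotations accordingly."""
    residues = []
    annotations = []
    offset = 0
    for seq in sequences:
        residues.append(seq.residues)
        for ann in seq.annotations:
            annotations.append(Annotation(ann.label, ann.start + offset, ann.end + offset))
        offset += len(seq.residues)
    return Sequence(name, "".join(residues), annotations)


gene_a = Sequence("a", "ACGT", [Annotation("x", 2, 3)])
gene_b = Sequence("b", "GGCC", [Annotation("y", 1, 2)])
joined = join_sequences([gene_a, gene_b])
print(joined.residues)        # ACGTGGCC
print(joined.annotations[1])  # Annotation(label='y', start=5, end=6)
```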
23. (Chapter 10, BLAST search) [Figure 10.2 residue: Program set to "blastp: Protein sequence against Protein database", Database set to "pdb: Sequences derived from 3-dimensional struct...", and a Genetic code field.] Figure 10.2: Choose a BLAST program and a database for the search. BLAST search for DNA sequences. BLASTn: DNA sequence against DNA database. This BLAST method is used to identify DNA sequences homologous to your query sequence. BLASTx: Translated DNA sequence against Protein database. If you want to search in protein databases, this BLAST method allows for automated translation of the DNA input sequence and searching in various protein databases. tBLASTx: Translated DNA sequence against Translated DNA database. Here both the input DNA sequence and the searched DNA database are automatically translated. BLAST search for protein sequences. BLASTp: Protein sequence against Protein database. This is the most common BLAST method used when searching for homologous protein sequences having a protein sequence as search input. tBLASTn: Protein sequence against Translated DNA database. Here the protein sequence is searched against an automatically translated DNA database. Depending on whether you choose a protein or a DNA sequence, a number of different databases can be searched. A complete list of these databases can be found in Appendix B. When nr appears in the Database parameter drop-down menu, the search will inclu
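The five BLAST methods listed above differ only in the query type and in whether the query or the database is translated. A small lookup table makes that explicit. This is an illustrative sketch: the dictionary layout and the function `programs_for` are hypothetical, and only the program names and their descriptions come from the text.

```python
# program name -> (query type, what is compared)
BLAST_PROGRAMS = {
    "blastn": ("dna", "DNA sequence against DNA database"),
    "blastx": ("dna", "Translated DNA sequence against Protein database"),
    "tblastx": ("dna", "Translated DNA sequence against Translated DNA database"),
    "blastp": ("protein", "Protein sequence against Protein database"),
    "tblastn": ("protein", "Protein sequence against Translated DNA database"),
}


def programs_for(query_type):
    """List the BLAST programs that accept the given query type."""
    return [
        name
        for name, (accepted, _description) in BLAST_PROGRAMS.items()
        if accepted == query_type
    ]


print(programs_for("dna"))      # ['blastn', 'blastx', 'tblastx']
print(programs_for("protein"))  # ['blastp', 'tblastn']
```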
24. (Chapter 10, BLAST search) files must be in FASTA (.fsa, .fa, .fasta) format. To create a local BLAST database from the file system or from the Navigation Area: BLAST search in Toolbox | Create Local BLAST Database. [Figure 10.8 residue: "Select a BLAST Protein Database" dialog with a blast database element under Example data.] Figure 10.8: Select your local BLAST database. [Figure 10.9 residue: BLAST Against Local Database dialog, step 3 "Set input parameters", with the filters Low Complexity, Human Repeats, Mask For Lookup and Mask Lower Case, and the values Expect 10, Word Size 3, Matrix BLOSUM62, Gap Cost Existence 11 Extension 1, No. of processors 1, No. of output alignments 250.] Figure 10.9: Examples of different limitations which can be set before submitting a BLAST search. This opens the dialog seen in figure 10.10. Select Input Source: Lets you choose whether to include sequences from the Navigation Area or from the computer's file system (External FASTA file). Sequence type: If you choose to import sequences from an external FASTA file into the database, you must choose whether
25. Figure 13.3: Creating a reverse complement sequence.
If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. This will open a new view in the View Area displaying the reverse complement of the selected sequence. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog.
[CHAPTER 13. NUCLEOTIDE ANALYSES, p. 166]
13.4 Translation of DNA or RNA to protein
In CLC Gene Workbench 2.0 you can translate a nucleotide sequence into a protein sequence using the Toolbox tools. Usually you use the +1 reading frame, which means that the translation starts from the first nucleotide. Stop codons result in an asterisk being inserted in the protein sequence at the corresponding position. It is possible to translate in any combination of the six reading frames in one analysis. To translate:
select a nucleotide sequence | Toolbox in the Menu Bar | Nucleotide Analyses | Translate to Protein
or right-click a nucleotide sequence | Toolbox | Nucleotide Analyses | Translate to Protein
This opens the dialog displayed in figure 13.4.
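The reverse complement and six-frame translation described in snippet 25 can be sketched in a few lines. This is a minimal illustration, not the workbench's implementation: it assumes the standard genetic code, treats RNA input by converting U to T, writes stop codons as an asterisk, and marks codons containing other characters as X. The function names are chosen for this example only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA (or RNA, read as DNA) sequence."""
    dna = seq.upper().replace("U", "T")
    return dna.translate(COMPLEMENT)[::-1]


def translate(seq, frame=1):
    """Translate in reading frame +1, +2, +3, -1, -2 or -3.

    Negative frames are read from the reverse complement.
    A trailing incomplete codon is ignored; stop codons become '*'.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    dna = seq.upper().replace("U", "T")
    if frame < 0:
        dna = reverse_complement(dna)
    offset = abs(frame) - 1
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(reverse_complement("ATGC"))
print(translate("ATGGCCTAA", frame=1))
print(translate("ATGGCCTAA", frame=-1))
```

Calling `translate` once per frame gives the "any combination of the six reading frames" behaviour mentioned above.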
26. repeat-rich areas, one or more No primers here regions can be defined. It is required that the Forward primer region is located upstream of the Forward inner primer region, that the Forward inner primer region is located upstream of the Reverse inner primer region, and that the Reverse inner primer region is located upstream of the Reverse primer region. In Nested PCR mode the Inner melting temperature menu in the Primer parameters panel is activated, allowing the user to set a separate melting temperature interval for the inner and outer primer pairs. After exploring the available primers (see section 15.3) and setting the desired parameter values in the Primer parameters preference group, the Calculate button will activate the primer design algorithm. After pressing the Calculate button, a dialog will appear (see figure 15.9). The top and bottom parts of this dialog are identical to the Standard PCR dialog for designing primer pairs described above. The central part of the dialog contains parameters pertaining to primer pairs and the comparison between the outer and the inner pair. Here five options can be set:
[CHAPTER 15. PRIMERS, p. 196]
[Figure residue: Calculation parameters dialog. Chosen parameters: Maximum primer length; Minimum primer length; Maximum G/C content; Minimum G/C content; Maximum melting temperature; Minimum melting temperature; Maximum self annealing; Maximum self end annealing; Maximum secondary structure; 3' end must meet G/C re
27. screen shot of the parameter settings can be seen in figure 12.19.
• Motif types. You can choose literal string (simple motif) or Java regular expression as your motif type. For proteins, you can also choose to search with a Prosite regular expression.
• Motif. If you choose to search with a simple motif, you should enter a literal string as your motif. Ambiguous amino acids and nucleotides are allowed. Example: ATGATGNNATG. If your motif type is Java regular expression, you should enter a regular expression according to the syntax rules above. Press the F1 key for options. For proteins, you can search with a Prosite regular expression, in which case you should enter a protein pattern from the PROSITE database.
• Accuracy. If you search with a simple motif, you can adjust how accurately the search string must match the sequence.
• Table output. Opens the motifs or patterns found in a table view. You can get one table per sequence or one combined table for multiple sequences.
• Add motif to sequence as annotation. Check this box to add the search strings found as annotations on the sequence.
Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. This will open a view showing the motifs or patterns found as annotations on the original
[CHAPTER 12. GENERAL SEQUENCE ANALYSES, p. 160]
sequence (see figure 12.20). If you have selected several sequences, a corresponding number of views will be opened.
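To illustrate how a simple motif with ambiguous nucleotides, such as the ATGATGNNATG example in snippet 27, relates to a regular expression search, here is a hedged sketch. It expands the standard IUPAC nucleotide ambiguity codes into character classes and reports 1-based match positions. It does not model the Accuracy setting, and the function names are assumptions for this example, not the workbench's own API.

```python
import re

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Expand a simple nucleotide motif into a regular expression."""
    return "".join(IUPAC[symbol] for symbol in motif.upper())


def find_motif(sequence, motif):
    """Return (start, end) positions, 1-based and inclusive, of all matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(1) + 1, m.end(1))
            for m in pattern.finditer(sequence.upper())]


print(motif_to_regex("ATGNN"))
print(find_motif("CCATGATGCAATGTT", "ATGATGNNATG"))
```

Each reported interval corresponds to what the dialog would add as an annotation when "Add motif to sequence as annotation" is checked.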
28. yeast and E. coli.
• pH 6.5
• 6.0 M guanidinium hydrochloride
• 0.02 M phosphate buffer
The extinction coefficient values of the three important amino acids at different wavelengths are found in [Gill and von Hippel, 1989]. Knowing the extinction coefficient, the absorbance (optical density) can be calculated using the following formula:
\[ \mathrm{Absorbance(Protein)} = \frac{\mathrm{Ext(Protein)}}{\mathrm{Molecular\ weight}} \tag{12.3} \]
Two values are reported. The first value is computed assuming that all cysteine residues appear as half cystines, meaning they form disulfide bridges to other cysteines. The second value assumes that no disulfide bonds are formed.
Atomic composition
Amino acids are simple compounds. All 20 amino acids consist of combinations of only five different atoms: Carbon, Nitrogen, Hydrogen, Sulfur and Oxygen. The atomic composition of a protein can, for example, be used to calculate the precise molecular weight of the entire protein.
[CHAPTER 12. GENERAL SEQUENCE ANALYSES, p. 155]
Total number of negatively charged residues (Asp + Glu)
At neutral pH, the fraction of negatively charged residues provides information about the location of the protein. Intracellular proteins tend to have a higher fraction of negatively charged residues than extracellular proteins.
Total number of positively charged residues (Arg + Lys)
At neutral pH, nuclear proteins have a high relative percentage of positively char
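The two charge statistics named in snippet 28 are simple residue counts. The sketch below counts Asp + Glu and Arg + Lys in a one-letter protein sequence and reports each as a fraction of the sequence length; the function name and the returned dictionary keys are assumptions made for this illustration.

```python
def charge_summary(protein):
    """Count negatively (Asp + Glu) and positively (Arg + Lys) charged residues.

    The protein is given in one-letter code; fractions are relative to
    the full sequence length.
    """
    seq = protein.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    length = len(seq)
    return {
        "negative": negative,
        "positive": positive,
        "negative_fraction": negative / length if length else 0.0,
        "positive_fraction": positive / length if length else 0.0,
    }


print(charge_summary("DEKRAAD"))
```

For the toy sequence DEKRAAD this gives three negatively and two positively charged residues out of seven.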
29. sequences where none of the aligned sequences share less than 62% identity. This resulted in a scoring matrix called BLOSUM62. In contrast to the PAM matrices, the BLOSUM matrices are calculated from alignments without gaps, emerging from the BLOCKS database (http://blocks.fhcrc.org). Sean Eddy wrote a paper reviewing the BLOSUM62 substitution matrix and how to calculate the scores [Eddy, 2004].
Use of scoring matrices
Deciding which scoring matrix to use in order to obtain the best alignment results is a difficult task. If you have no prior knowledge of the sequence, BLOSUM62 is probably the best choice. This matrix has become the de facto standard for scoring matrices and is also used as the default matrix in BLAST searches. Selecting the wrong scoring matrix will most probably have a strong influence on the outcome of the analysis. In general, a few rules apply to the selection of scoring matrices:
• For closely related sequences, choose BLOSUM matrices created for highly similar alignments, like BLOSUM80. You can also select low PAM matrices such as PAM1.
• For distantly related sequences, select low BLOSUM matrices (for example BLOSUM45) or high PAM matrices such as PAM250.
The BLOSUM matrices with low numbers correspond to PAM matrices with high numbers. (See figure 12.11 for correlations between the PAM and BLOSUM matrices.) To summarize, if you
[CHAPTER 12. GENERAL SEQUENCE ANALYSES, p. 145]
Low compla
30. 17.5.3 Gel view
In figure 17.22 you can see a simulation of a gel with its Side Panel to the right. This view will be explained in this section.
[CHAPTER 17. CLONING AND CUTTING, p. 238]
Figure 17.21: Gel electrophoresis of three sequences (HUMDINUC, pBR322, HUMHBB). The left side shows the sequences together in one lane, each represented by a band. The right side shows a lane for each sequence.
[Figure 17.22 residue: lanes PERH1BA, PERH1BB, PERH2BB, PERH3BA; Side Panel with Gel options (Gel background, Scale band spread, Show marker ladder: 200, 50, 20, 10, 5, 3) and Text format.]
Figure 17.22: Five lanes showing fragments of five sequences cut with restriction enzymes.
Information on bands and fragment size
You can get information about the individual bands by hovering the mouse cursor over the band of interest. This will display a tool tip with information about the fragment size, and for lanes comparing whole sequences you will also see the sequence name.
Notice! You have to be in Selection or Pan mode in order to get this information.
[CHAPTER 17. CLONING AND CUTTING, p. 239]
It can be useful to add markers to the gel, which enables you to compare the sizes of the bands. This is done by clicking Show marker ladder in the Side Panel. You enter the markers by writing them in the text field, separated by commas. M
31. 2.11.1 Open and arrange views using drag and drop
2.11.2 Find element in the Navigation Area
2.11.3 Find specific annotations on a sequence
[CHAPTER 2. TUTORIALS]
2.11.4 Split sequences into several lines
2.11.5 Make a new sequence of a coding region
2.11.6 Translate a coding region
2.11.7 Copy annotations from one sequence to another
2.11.8 Get overview and detail of a sequence at the same time
2.11.9 Smart selecting in sequences and alignments
2.11.10 Check for updates and additional information about sequences
2.11.11 Quickly import sequences using copy paste
2.11.12 Perform analyses on many elements
2.11.13 Drag elements to the Toolbox
2.11.14 Export elements while preserving history
2.11.15 Avoid the mouse trap: use keyboard shortcuts
This chapter contains tutorials representing some of the features of CLC Gene Workbench 2.0. The first tutorials are meant as a short introduction to operating the program. The last tutorials give examples of how to use some of the main features of CLC Gene Workbench 2.0. The tutorials are also available as interactive Flash tutorials on http www
32. [Figure 9.1 residue: table of GenBank hits (accession, description, date) for human and mouse hemoglobin mRNA clones, with Download and Open and Download and Save buttons; 50 of 236 hits shown.]
Figure 9.1: The GenBank search dialog.
Notice! The search is an "and" search, meaning that when you add search parameters, you search for both (or all) text strings rather than any of the text strings. You can append a wildcard character by checking the checkbox at the bottom. This means that you only have to enter the first part of the search text, e.g. searching for genom will find both genomic and genome. The following parameters can be added to the search:
• All fields. Text, searches in all parameters in the NCBI database at th
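The "and" search with an optional trailing wildcard described in snippet 32 can be pictured with a small local filter. This sketch only illustrates the matching semantics on plain text; it is not how the NCBI query is actually executed, and the function name `matches_all` is hypothetical.

```python
import re


def matches_all(text, terms, wildcard=False):
    """Return True if every search term matches the text ("and" search).

    Without wildcard a term must equal a whole word; with wildcard it only
    has to match the beginning of a word, so "genom" finds "genomic".
    Matching is case-insensitive.
    """
    words = re.findall(r"\w+", text.lower())

    def term_matches(term):
        term = term.lower()
        if wildcard:
            return any(word.startswith(term) for word in words)
        return term in words

    return all(term_matches(term) for term in terms)


description = "Homo sapiens hemoglobin, genomic DNA"
print(matches_all(description, ["genom", "hemoglobin"], wildcard=True))
print(matches_all(description, ["genom", "hemoglobin"], wildcard=False))
print(matches_all(description, ["genom", "mouse"], wildcard=True))
```

Adding more terms can only narrow the result, since every term has to match.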
33. [Index excerpt]
Conflicts, overview in assembly, 219
Consensus sequence, 246, 265; open, 246
Conservation, 246; graphs, 265
Contact information, 11
Contig, 265; ambiguities, 219; create, 212; view and edit, 216
Convert old data, 87
Copy, 93; annotations in alignments, 250; elements in Navigation Area, 60; into sequence, 124; search results, GenBank, 104; sequence, 128, 130; sequence selection, 165; text selection, 128
.cpf, file format, 78
Create: a project (tutorial), 26; alignment, 241; dot plots, 137; enzyme list, 234; local BLAST database, 114; new folder, 59; new project, 59; workspace, 73
Data formats: bioinformatic, 270; graphics, 271
Data structure, 58
Database: GenBank, 101; local, 58; nucleotide, 268; peptide, 268
Delete: element, 62; residues and gaps in alignment, 250; workspace, 73
Demo license, 15
Dipeptide distribution, 155
DNA translation, 166
DNAstrider, file format, 29, 86, 270
Dot plots, 265; Bioinformatics explained, 139; create, 137; print, 138
[INDEX, p. 278]
Double stranded DNA, 118
Download and open, search results, GenBank, 104
Download and save, search results, GenBank, 104
Download of CLC Gene Workbench, 11
Drag and drop, 48; Navigation Area, 60; search results, GenBank, 104
Edit: alignments, 249, 265; annotations, 124, 265; enzymes, 120, 133, 224; sequence, 124; sequences, 265
Element, 58; delete, 62; rename, 62
.embl, file format, 86
Embl, file format, 29, 86, 270
Encapsulated PostScript, export, 92
34. [Table 12.1 residue: the last rows (K, M, F, P, S, T, W, Y, V) of the BLOSUM62 matrix; minus signs and column alignment were lost in extraction.]
Table 12.1: The BLOSUM62 matrix. A tabular view of the BLOSUM62 matrix containing all possible substitution scores [Henikoff and Henikoff, 1992].
Creative Commons License
All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form, and "CLC bio" has to be clearly labelled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, or build upon this work.
SOME RIGHTS RESERVED
See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more about how you may use the contents.
12.2 Shuffle sequence
In some cases it is beneficial to shuffle a sequence. This is an option in the Toolbox menu under General Sequence Analyses. It is normally used for statistical analyses
35. 7.2 Pattern search output
If the analysis is performed on several sequences at a time, the method will search for patterns in the sequences and open a new view for each of the sequences in which a pattern was discovered. Each novel pattern will be represented as an annotation of the type Region. More information on each found pattern is available through the tooltip, including detailed information on the position of the pattern and quality scores. It is also possible to get a tabular view of all found patterns in one combined table. Each found pattern will then be listed with information on the obtained scores, the quality of the pattern and its position in the sequence.
Chapter 13. Nucleotide analyses
Contents
13.1 Convert DNA to RNA
13.2 Convert RNA to DNA
13.3 Reverse complements of sequences
13.4 Translation of DNA or RNA to protein
13.4.1 Translate part of a nucleotide sequence
13.5 Annotate with SNPs
13.6 Find open reading frames
13.6.1 Open reading frame parameters
CLC Gene Workbench 2.0 offers different kinds of sequence analyses which only apply to DNA and RNA.
13.1 Convert DNA to RNA
36. A, T or U. For amino acids only B, Z, and X are colored as non-standard residues.
  - Foreground color. Sets the color of the letter. Click the color box to change the color.
  - Background color. Sets the background color of the residues. Click the color box to change the color.
• Rasmol colors. Colors the residues according to the Rasmol color scheme. See http://www.openrasmol.org/doc/rasmol.html
  - Foreground color. Sets the color of the letter. Click the color box to change the color.
  - Background color. Sets the background color of the residues. Click the color box to change the color.
• Polarity colors (only protein). Colors the residues according to the polarity of amino acids.
  - Foreground color. Sets the color of the letter. Click the color box to change the color.
  - Background color. Sets the background color of the residues. Click the color box to change the color.
Nucleotide info
These preferences only apply to nucleotide sequences.
• Translation. Displays a translation into protein just below the nucleotide sequence. Depending on the zoom level, the amino acids are displayed with three letters or one letter.
  - Frame. Determines where to start the translation.
    * +1 to -1. Select one of the six reading frames.
    * Selection. This option will only take effect when you make a selection on the sequence. The translation will start from the first nucleotide selected. Making a new selection will automatically display the
37. [Figure 2.19 dialog: Choose Parameters. Limit by entrez query; Choose filter (Low Complexity, Human Repeats, Mask For Lookup, Mask Lower Case); Expect 10; Word Size 3; Matrix BLOSUM62; Gap Cost: Existence 11, Extension 1.]
Figure 2.19: The BLAST search is limited to homo sapiens[ORGN]. The remaining parameters are left as default.
The output is shown in figure 2.20 and consists of a list of potential homologs that are sorted by their BLAST match score and shown in descending order below the query sequence. Try placing your mouse pointer over a potential homologous sequence. You will see that a context box appears containing information about the sequence and the match scores obtained from the BLAST algorithm. For now, we will focus our attention on sequence P02042, the BLAST hit that is second from the top of the list. To open sequence P02042:
right-click the line representing sequence P02042 | Open Sequence in New View
This opens the sequence. However, the sequence is not saved yet. Drag and drop the sequence into the Navigation Area to save it. This homologous sequence is now part of your project, and you can use it to gain information about the query sequence by using the various tools of the workbench, e.g. by studying its textual information, by studying its annotation, or by aligning it to the query sequences.
[CHAPTER 2. TUTORIALS, p. 41]
38. F on Windows or F on Mac. You can search for annotations, residues, or positions. The result of the search is a selection, as shown in figure 2.43. Remember to separate the start and end numbers with two periods (for example 20..29). No matter how you make your selection, you can see the start and end positions in the right part of the status bar below the View Area.
2.11.10 Check for updates and additional information about sequences
If you have downloaded a sequence from NCBI or UniProt, you can easily check if the online information about the sequence has been updated and get additional information about the sequence:
[CHAPTER 2. TUTORIALS, p. 53]
Figure 2.43: Making a selection from position 20 to 29 (both included) using the Search function.
right-click the sequence | Web info | NCBI or UniProt
This will open your default web browser showing the information about the sequence at either NCBI or UniProt. Clicking PubMed instead of NCBI/UniProt gives you a direct link to the sequence's PubMed references.
2.11.11 Quickly import sequences using copy paste
Instead of using the Import function to import a sequence, you can use copy paste. If you have copied the sequence from a source outside the program, e.g. a webpage or text document, you can paste it into the text field in the Create new sequence dialog shown in figure 2 4
39. [... FOR IMPORT AND EXPORT, p. 271]
mentioned formats are the types which can be read by CLC Gene Workbench.
C.2 List of graphics data formats
Below is a list of formats for exporting graphics. All data displayed in a graphical format can be exported using these formats. Data represented in lists and tables can only be exported in .pdf format (see section 6.3 for further details).

Format                      Suffix   Type
Portable Network Graphics   .png     bitmap
JPEG                        .jpg     bitmap
Tagged Image File           .tif     bitmap
PostScript                  .ps      vector graphics
Encapsulated PostScript     .eps     vector graphics
Portable Document Format    .pdf     vector graphics
Scalable Vector Graphics    .svg     vector graphics

Bibliography
[Andrade et al., 1998] Andrade, M. A., O'Donoghue, S. I., and Rost, B. (1998). Adaptation of protein surfaces to subcellular location. J Mol Biol, 276(2):517-525.
[Bachmair et al., 1986] Bachmair, A., Finley, D., and Varshavsky, A. (1986). In vivo half-life of a protein is a function of its amino-terminal residue. Science, 234(4773):179-186.
[Bommarito et al., 2000] Bommarito, S., Peyret, N., and SantaLucia, J. (2000). Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res, 28(9):1929-1934.
[Cornette et al., 1987] Cornette, J. L., Cease, K. B., Margalit, H., Spouge, J. L., Berzofsky, J. A., and DeLisi, C. (1987). Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J Mol Biol, 195(3):659-685.
C
40. Figure 3.8: Save dialog.
In the dialog you select the folder or project in which you want to save the element. After naming the element, press OK.
3.2.4 Undo/Redo
If you make a change in a view, e.g. remove an annotation in a sequence or modify a tree, you can undo the action. In general, Undo applies to all changes you can make when right-clicking in
[CHAPTER 3. USER INTERFACE, p. 67]
a view. Undo is done by:
Click undo in the Toolbar, or Edit | Undo, or Ctrl + Z
If you want to undo several actions, just repeat the steps above. To reverse the undo action:
Click the redo icon in the Toolbar, or Edit | Redo, or Ctrl + Y
Notice! Actions in the Navigation Area, e.g. renaming and moving elements, cannot be undone. However, you can restore deleted elements (see section 3.1.6). You can set the number of possible undo actions in the Preferences dialog (see section 4).
3.2.5 Arrange Views in View Area
Views are arranged in the View Area by their tabs. The order of the Views can be changed using drag and drop. E.g. drag the tab of one View onto the tab of another. The tab of the first View is now placed at the right side of the other tab. If a tab is dragged into a View, an area of the View is made gray (see fig. 3.9), illustrating that the view will be placed in this part of the View Area. The result of this action is illustrated in figure 3.10. You can also split a View Area horizontally or vertically using the menus. Split
41. [Figure 10.4 residue: tooltip for hit P09105 (HBAT_HUMAN, Hemoglobin theta-1 subunit): Score 90.9 bits (224), Expect 2E-19, Identities 53/145 (37%), Gaps 8/145 (6%); below it the table "BLAST Summary of hits from query NP_058652, Number of hits: 19" with columns Query, Hit, Description, E-value, Score, Hit start, Hit end, Query start, Query end, Identity, and the buttons Download and Open, Download and Save, Open at NCBI.]
Figure 10.4: Display of the output of a BLAST search. At the top there is a graphical representation of BLAST hits, with tool tips showing additional information on individual hits. Below is a tabular form of the BLAST results.
The BLAST Graphics and the BLAST table are described in the following sections.
BLAST Graphics
The BLAST editor shows the sequence hits which were found in the BLAST search. The hit sequences are represented by colored horizontal lines, and when hovering the mouse pointer over a BLAST hit sequence, a tooltip appears listing the characteristics of the sequence. There are several View preferences available in the BLAST Gr
42. [Figure 8.2 residue: Navigation Area tree (sequence list, Assembly, Cloning project, Primer design, Restriction analysis, Protein, Extra, Performed analyses, README, CLC bio Home, Alignments) with a Create new folder button.]
Figure 8.2: Specify a folder for the results of the analysis.
• Open. This will open each of the selected sequences in a view.
• Save. This will not open the sequences but just add the annotations.
• Copy and save in new folder. This option does not add annotations to the existing sequences but saves a copy of the selected sequences. Choosing this option means that there will be an extra step for selecting a folder where the copies of the sequences can be saved.
8.1.2 Batch log
For some analyses, there is an extra option in the final step to create a log of the batch process. This log is created at the beginning of the process and continually updated with information about the results. See an example of a log in figure 8.4. In this example, the log displays information about how many open reading frames were found. The log will either be saved with the results of the analysis or opened in a view with the results, depending on how you chose to handle the results.
[CHAPTER 8. HANDLING OF RESULTS, p. 99]
[Figure 8.3 dialog: Find Open Reading Frames. Steps: 1. Select nucleotide sequences; 2. Set parameters; 3. Result handling.]
Figure 8.3: The final step when the analysis does not create new
43. Sbjct = subject. This is the sequence found in the database.
The numbers of the query and subject sequences refer to the sequence positions in the submitted and found sequences. If the subject sequence has number 59 in front of the sequence, this means that 58 residues are found upstream of this position, but these are not included in the alignment.
In addition to the output described above, it is possible to view the BLAST results in a tabular view, which gives a quick overview of the results. In the tabular view it is possible to select multiple sequences and, for example, download all of these in one single step. Moreover, it is possible to look up additional information on each single hit in the BLAST result on the NCBI homepage. These possibilities are available either through a right-click with the mouse or by using the buttons at the end of the table.
10.1.2 BLAST table
If the BLAST table view was not selected in Step 4 of the BLAST search, the table can be generated in the following way:
Right-click the tab of the initial BLAST result view | Show | BLAST Table
Figure 10.5 is an example of a BLAST Table.
[Figure 10.5 residue: "BLAST Summary of hits from query CAA26204, Number of hits: 103", with columns Query, Hit, Description, E-value, Score, Hit start, Hit end, Query start, Query end, Identity.]
44. Figure 15.2: Right-click menu allowing you to specify regions for the primer design.
mode, but the user can change to a more detailed mode in the Primer information preference group. The number of information lines reflects the chosen length interval for primers and probes. In the compact information mode, one line is shown for every possible primer length, and each of these lines contains information regarding all possible primers of the given length. At each potential primer starting position, a circular information point is shown which indicates whether the primer fulfills the requirements set in the primer parameters preference group. A green circle indicates a primer which fulfills all criteria, and a red circle indicates a primer which fails to meet one or more of the set criteria. For more detailed information, place the mouse cursor over the circle representing the primer of interest. A tool tip will then appear on screen, displaying detailed information about the primer in relation to the set criteria. To locate the primer on the sequence, simply left-click the circle using the mouse. The various primer parameters can now be varied to explore their effect, and the view area will dynamically update to reflect this, allowing for a high degree of interactivity in the primer design process. After having explored the potential primers, the user may have found a satisfactory primer and choose to export this directly from the view area using a mo
45. The remaining options relate to the output of the analysis:
• Create output as annotations on sequence
• Create text output
• Create new enzyme list from selected enzymes which fulfill match number criteria
• Separate restriction fragments on gel
If you select the last output option (Separate restriction fragments on gel), there will be one more step: click Next to see the dialog shown in figure 17.15. Here you have four different ways of simulating a gel electrophoresis using the selected restriction enzymes:
• Cut with selected enzymes and run in one lane. This will display one lane with a number of bands corresponding to the number of fragments from cutting with the selected enzymes.
• Cut with selected enzymes and run in one lane per enzyme. For each of the enzymes selected, there will be a lane displaying the bands of the fragments from cutting just with this enzyme.
[CHAPTER 17. CLONING AND CUTTING, p. 233]
[Figure 17.15 dialog: Find Restriction Sites. Steps: 1. Select DNA sequences; 2. Filter enzymes; 3. Set exclusion criteria and output options; 4. Choose gel parameters. Specify lanes on the gel: Cut with selected enzymes and run in one lane; one lane per enzyme; one lane per sequence; one lane per sequence and per enzyme.]
Figure 17.15: Choosing from fo
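The difference between the first two gel options in snippet 45 (all selected enzymes in one lane versus one lane per enzyme) can be sketched as follows. This is a simplified illustration under stated assumptions: the sequence is linear, only the top strand is scanned (sufficient for the palindromic sites used here), and the small `ENZYMES` table holds just two well-known enzymes, EcoRI (G^AATTC) and BamHI (G^GATCC). The function names are hypothetical.

```python
# Recognition site and cut offset (bases from the start of the site).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "BamHI": ("GGATCC", 1),   # G^GATCC
}


def cut_positions(seq, site, offset):
    """Return the cut positions of one recognition site in the sequence."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzyme_names):
    """Fragment lengths (largest first) after cutting with all given enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        cuts.update(cut_positions(seq, site, offset))
    bounds = [0] + sorted(cuts) + [len(seq)]
    fragments = [end - begin for begin, end in zip(bounds, bounds[1:])
                 if end > begin]
    return sorted(fragments, reverse=True)


def lanes_per_enzyme(seq, enzyme_names):
    """One lane per enzyme: each enzyme cuts the sequence on its own."""
    return {name: digest(seq, [name]) for name in enzyme_names}


sequence = "AAGGATCCAAGAATTCAA"
print(digest(sequence, ["EcoRI", "BamHI"]))            # one lane, both enzymes
print(lanes_per_enzyme(sequence, ["EcoRI", "BamHI"]))  # one lane per enzyme
```

Each list of fragment lengths corresponds to the bands that would appear in one lane of the simulated gel.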
46. U (⌘ + U on Mac) to see the preferences panels of an open view.
The View Format allows you to change the way the elements appear in the Navigation Area. The following text can be used to describe the element:
• Name (this is the default information to be shown).
• Accession (sequences downloaded from databases like GenBank have an accession number).
• Species
[CHAPTER 4. USER PREFERENCES, p. 78]
• Species (accession)
• Common Species
• Common Species (accession)
The User Defined View Settings gives you an overview of the different style sheets for your View preferences. See section 4.5 for more about how to create and save style sheets. The first time you use the program, only the CLC Standard Settings are available. The tab allowing you to choose the style sheet for a viewer (e.g. a Sequence viewer) only appears after you have launched the viewer for the first time.
4.3 Advanced preferences
The Advanced settings include the possibility to set up a proxy server. This is described in section 1.7.
4.4 Export/import of preferences
The user preferences of CLC Gene Workbench 2.0 can be exported to other users of the program, allowing them to display data with the same preferences as yours. You can also use the export/import preferences function to back up your preferences. To export preferences, open the Preferences dialog (Ctrl + K, or the Mac equivalent) and do the following:
Export | Select the relevant preferenc
47. Workbench. It shares some of the advanced product features of CLC Protein Workbench, and it has additional advanced features. CLC Combined Workbench holds all basic and advanced features of the CLC Workbenches. For an overview of which features the four workbenches include, see http://www.clcbio.com/features.
[CHAPTER 1. INTRODUCTION TO CLC GENE WORKBENCH, p. 19]
All workbenches will be improved continuously. If you have a CLC Free Workbench or a commercial workbench and you are interested in receiving news about updates, you should register your e-mail and contact data on http://www.clcbio.com, if you haven't already registered when you downloaded the program.
1.5.1 New program feature request
The CLC team is continuously improving the program with our users' interests in mind. Therefore, we welcome all requests from users, and they can be submitted from our homepage http://www.clcbio.com. Likewise, you are more than welcome to suggest new features or more general improvements to the program on support@clcbio.com.
1.5.2 Report program errors
CLC bio is doing everything possible to eliminate program errors. Nevertheless, some errors might have escaped our attention. If you discover an error in the program, you can use the Report a Program Error function in the Help menu of the program to report it. In the Report a Program Error dialog, you are asked to write your e-mail address. This is because we would like to be able to contact you for
48. alignment is the substitution matrix, which assigns a score for aligning any possible pair of residues. The matrix used in a BLAST search can be changed depending on the type of sequences you are searching with (see the BLAST Frequently Asked Questions).
• Gap Cost. The pull-down menu shows the Gap Costs (penalty to open a gap and penalty to extend a gap). Only a limited number of options are available for these parameters. Increasing the Gap Costs and Lambda ratio will result in alignments with fewer gaps.
The more limitations are added to the search parameters, the faster the search will be conducted. If no limitations are set, the BLAST search may take several minutes. When the Advanced parameters of Step 3 are adjusted, click Next to choose whether you want to open the BLAST output in an editor and/or in a table (see section 10.1.1). Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish.
10.1.1 Output from BLAST search
The two different outputs from a BLAST search are shown in figure 10.4.
[CHAPTER 10. BLAST SEARCH, p. 111]
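As a worked example of the Gap Cost parameter in snippet 48, the sketch below computes an affine gap penalty from the two values shown in the dialogs (Existence 11, Extension 1). It assumes the convention commonly used for BLAST, in which a gap of length k costs existence + k × extension; this convention and the function name are assumptions of the example, not statements from the manual.

```python
def gap_penalty(length, existence=11, extension=1):
    """Affine gap penalty, assuming cost = existence + length * extension."""
    if length < 0:
        raise ValueError("gap length must not be negative")
    if length == 0:
        return 0
    return existence + length * extension


for k in (1, 3, 10):
    print(k, gap_penalty(k))
```

Under this assumption, opening a gap is expensive and lengthening it is cheap, so raising either value tends to produce alignments with fewer gaps, in line with the description above.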
49. and select object icon in order to select a sequence to use as reference.
• Minimum aligned read length: The minimum number of nucleotides in a read which must be successfully aligned to the contig. If this criterion is not met by a read, the read is excluded from the assembly.
• Alignment stringency: Specifies the stringency of the scoring function used by the alignment step in the contig assembly algorithm. A higher stringency level will tend to produce contigs with fewer ambiguities, but will also tend to omit more sequencing reads and to generate more and shorter contigs. Three stringency levels can be set: Low, Medium and High.
• Use existing trim information: When using a reference sequence, trimming is generally not necessary, but if you wish to use trimming, you can check this box. It requires that the sequence reads have been trimmed beforehand (see section 16.2 for more information about trimming).
When the parameters have been adjusted, click Next to see the dialog shown in figure 16.6. [Assemble Sequences to Reference dialog, output options: Make contig(s) with the reference sequence; Only keep part of the reference sequence; Make new contig(s) based on the reads.] Figure 16.6: Different options for the output of the as
50. assumes that you have used the program for a while, since the basic usages are not explained. 2.11.1 Open and arrange views using drag and drop. Instead of opening views using double-click or Show, you can use drag and drop both to open and arrange views. Drag and drop is supported within the Navigation Area, within the View Area, and between the two areas:
1. Drag and drop an element within the Navigation Area: Moves the element to the drop location.
2. Drag an element from the Navigation Area to the View Area: Opens the element in a new view. The view will be opened in the part of the View Area where the element is dropped.
3. Drag the tab of a view within the View Area: If there are other views open, this will split the View Area and make it possible to see several views at the same time.
4. Drag the tab of a view into the Navigation Area: If the view is new and has not been saved to a project before, this will save the view at the drop location. If the view is already represented in the Navigation Area, this will save a copy of the view at the drop location.
2.11.2 Find element in the Navigation Area. If you have a view of e.g. a sequence and you wish to know in which project this sequence is saved, use the Find in Project function: right-click the tab of the view | View | Find in Project. This will select the sequence in the Navigation Area (see figure 2.36).
51. been used to identify related proteins. In order to search for a known motif: Select DNA or protein sequence(s) | Toolbox in the Menu Bar | General Sequence Analyses | Motif Search, or Right-click DNA or protein sequence(s) | Toolbox | General Sequence Analyses | Motif Search. If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. [Motif Search dialog, step 2 (Set parameters): motif type (Simple motif, Java regular expression, Prosite regular expression), Accuracy, and output options (Add hits to sequence as annotations; Table output).] Figure 12.19: Setting parameters for the motif search. See text for details. You can perform the analysis on several DNA or several protein sequences at a time. If the analysis is performed on several sequences at a time, the method will search for patterns in the sequences and open a new view for each of the sequences. Click Next to adjust parameters (see figure 12.19). 12.6.1 Motif search parameter settings. Various parameters can be set prior to the motif search. The parameters are listed below and a
52. before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. Clicking Next generates the dialog shown in figure 17.13. [Find Restriction Sites dialog, step 2 (Filter enzymes): enzyme set selector, filter options (minimum recognition sequence length, blunt ends, 3' overhang, 5' overhang), and a table of the enzymes that comply with the criteria.] Figure 17.13: Selecting enzymes. In Step 2 you can adjust which enzymes to use. Choose from enzyme set allows you to select an enzyme list which is stored in the Navigation Area. See section 17.4 for more about creating and modifying enzyme lists. Only include enzymes which have: In this part of the dialog, you can limit the number of enzymes included in the list belo
53. corresponding translation. Read more about selecting in section 11.1.2. All: Select all reading frames at once. The translations will be displayed on top of each other. Table: The translation table to use in the translation. For more about translation tables, see section 13.4. Only AUG start codons: For most genetic codes, a number of codons can be start codons. Selecting this option colors only the AUG codons green.
• Trace data: See section 16.1.
• G/C content: Calculates the G/C content of a part of the sequence and shows it as a gradient of colors or as a graph below the sequence. Window length: Determines the length of the part of the sequence to calculate. A window length of 9 will calculate the G/C content for the nucleotide in question plus the 4 nucleotides to the left and the 4 nucleotides to the right. A narrow window will focus on small fluctuations in the G/C content level, whereas a wider window will show fluctuations between larger parts of the sequence. Foreground color: Colors the letter using a gradient, where the left side color is used for low levels of G/C content and the right side color is used for high levels of G/C content. The sliders just above the gradient color box can be dragged to highlight relevant levels of G/C content. The colors can be changed by clicking the box. This will show a list of gradients to choose from. Background color
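The window-based G/C calculation described above can be sketched in a few lines of Python. This is an illustration only, not the workbench's implementation: the function name `gc_content_profile` is hypothetical, and the way the window is clipped at the ends of the sequence is an assumption, since the manual does not say how edges are handled.

```python
def gc_content_profile(sequence, window=9):
    """Return the G+C fraction around each position, using a centred window.

    With window=9, each value covers the nucleotide in question plus
    the 4 nucleotides to the left and the 4 to the right.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    seq = sequence.upper()
    half = window // 2
    profile = []
    for i in range(len(seq)):
        # Assumption: near the ends, the window is clipped to the sequence.
        start = max(0, i - half)
        end = min(len(seq), i + half + 1)
        chunk = seq[start:end]
        gc = sum(1 for base in chunk if base in "GC")
        profile.append(gc / len(chunk))
    return profile
```

A narrow window (for example 3) follows local fluctuations closely, while a wide window averages over a larger part of the sequence, which matches the behaviour described for the Window length setting.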
55. default, the undo limit is set to 500. By writing a higher number in this field, more actions can be undone. Undo applies to all changes made on sequences, alignments or trees. See section 3.2.4 for more on this topic.
• Number of hits: The number of hits shown in CLC Gene Workbench 2.0 when e.g. searching NCBI. (The sequences shown in the program are not downloaded until they are dragged or saved into the Navigation Area.)
• Locale Setting: i.e. in which country you are located. This determines the punctuation to be used.
4.2 Default View preferences. There are five groups of default View settings:
1. Toolbar
2. Side Panel Location
3. New View
4. View Format
5. Default view settings sheet
In general, these are default settings for the user interface. The Toolbar preferences let you choose the size of the toolbar icons, and you can choose whether to display names below the icons. The Side Panel Location setting lets you choose between Dock in views and Float in window. When docked in view, view preferences will be located in the right side of the view of e.g. an alignment. When floating in window, the side panel can be placed anywhere on your screen, also outside the workspace, e.g. on a different screen. See section 4.5 for more about floating side panels. The New view setting allows you to choose whether the View preferences are to be shown automatically when opening a new view. If this option is not chosen, you can press Ctrl
56. down the Ctrl button | Click the tab of the view while the button is pressed. By right-clicking a tab, the following close options exist (see figure 3.7):
• Close: See above.
• Close Tab Area: Closes all tabs in the tab area.
• Close All Views: Closes all tabs in all tab areas. Leaves an empty workspace.
• Close Other Tabs: Closes all other tabs in the particular tab area.
Figure 3.7: By right-clicking a tab, several close options are available. 3.2.3 Save changes in a View. When changes are made in a view, the text on the tab appears bold and italic. This indicates that the changes are not saved. The Save function may be activated in two ways: Click the tab of the View you want to save | Save in the toolbar, or Click the tab of the View you want to save | Ctrl+S (⌘+S on Mac). If you close a View containing an element that has been changed since you opened it, you are asked if you want to save. When saving a new view that has not been opened from the Navigation Area (e.g. when opening a sequence from a list of search hits), a save dialog appears (figure 3.8).
57. e.g.
• Choose a codon randomly.
• Select the most frequent codon in a given organism.
• Randomize a codon, but with respect to its frequency in the organism.
As an example, we want to translate an alanine to the corresponding codon. Four different codons can be used for this reverse translation: GCU, GCC, GCA or GCG. By picking any one of them at random, we will get an alanine. The most frequent codon coding for an alanine in E. coli is GCG, encoding 33.7% of all alanines. Then comes GCC (25.5%), GCA (20.3%) and finally GCU (15.3%). The data are retrieved from the Codon usage database (see below). Always picking the most frequent codon does not necessarily give the best answer. By selecting codons from a distribution of calculated codon frequencies, the DNA sequence obtained after the reverse translation holds the correct (or nearly correct) codon distribution. It should be kept in mind that the obtained DNA sequence is not necessarily identical to the original one encoding the protein in the first place, due to the degeneracy of the genetic code. In order to obtain the best possible result of the reverse translation, one should use the codon frequency table from the correct organism or a closely related species. The codon usage of the mitochondrial chromosome is often different from that of the native chromosome(s); thus mitochondrial codon frequency tables should only be used when working specifically with mitochondria.
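The three codon selection strategies listed above can be illustrated with a small Python sketch. It uses only the alanine frequencies quoted in the text; the function names and the `ALANINE_CODONS` table are hypothetical and are not part of CLC Gene Workbench.

```python
import random

# Alanine codon usage in E. coli as quoted in the text (percent of all alanines).
ALANINE_CODONS = {"GCG": 33.7, "GCC": 25.5, "GCA": 20.3, "GCU": 15.3}


def random_codon(usage, rng=random):
    """Strategy 1: pick any synonymous codon with equal probability."""
    return rng.choice(list(usage))


def most_frequent_codon(usage):
    """Strategy 2: always pick the codon with the highest frequency."""
    return max(usage, key=usage.get)


def frequency_weighted_codon(usage, rng=random):
    """Strategy 3: pick a codon at random, weighted by its frequency.

    random.choices treats the weights as relative, so they do not
    need to sum to 100.
    """
    codons = list(usage)
    weights = [usage[codon] for codon in codons]
    return rng.choices(codons, weights=weights, k=1)[0]
```

Over many draws, the third strategy reproduces the organism's codon distribution approximately, whereas the second produces a sequence that uses GCG for every alanine.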
58. e.g. when comparing an alignment score with the distribution of scores of shuffled sequences. The shuffling is done without replacement, resulting in exactly the same number of the different residues as before the shuffling. Shuffling a sequence removes all annotations that relate to the residues. select sequence | Toolbox in the Menu Bar | General Sequence Analyses | Shuffle Sequence, or right-click a sequence | Toolbox | General Sequence Analyses | Shuffle Sequence. This opens the dialog displayed in figure 12.12. Figure 12.12: Choosing sequence for shuffling. If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. This will open a new view in the View Area displaying the shuffled sequence. The new sequence is not saved automatically. To save the
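Shuffling without replacement, as described above, amounts to a random permutation of the residues. The following Python sketch shows the idea; the function name `shuffle_sequence` is hypothetical, and the sketch handles plain strings only (it has no notion of annotations).

```python
import random


def shuffle_sequence(sequence, rng=random):
    """Return a random permutation of the residues in `sequence`.

    Every residue is used exactly once, so the composition of the
    shuffled sequence is identical to that of the input.
    """
    residues = list(sequence)
    rng.shuffle(residues)  # in-place Fisher-Yates shuffle
    return "".join(residues)
```

Because the composition is preserved, alignment scores obtained against many shuffled copies give a background distribution for sequences with the same residue content.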
59. function and regulation. Through molecular cloning it is possible to study individual genes in a controlled environment. Using molecular cloning, it is possible to build complete libraries of fragments of DNA inserted into appropriate cloning vectors. We offer a significantly different approach for visual cloning than other software tools. In CLC Gene Workbench the user is in total control of the cloning process. 17.2 Graphical display of in silico cloning. The in silico cloning process in CLC Gene Workbench begins with the selection of sequences to be used (typically a vector sequence and an insert): select the sequences in the Navigation Area | Toolbox in the Menu Bar | Cloning and Restriction Sites | Cloning. This will open a view of the selected sequences similar to figure 17.1.
60. function should ideally take this into account. Doing so is, however, not straightforward, as it increases the number of model parameters considerably. It is therefore commonplace to either ignore this complication and assume sequences to be unrelated, or to use heuristic corrections for shared ancestry. The second challenge is to find the optimal alignment given a scoring function. For pairs of sequences this can be done by dynamic programming algorithms, but for more than three sequences this approach demands too much computer time and memory to be feasible. A commonly used approach is therefore to do progressive alignment (Feng and Doolittle, 1987), where multiple alignments are built through the successive construction of pairwise alignments. These algorithms provide a good compromise between time spent and the quality of the resulting alignment. Presently, the most exciting development in multiple alignment methodology is the construction of statistical alignment algorithms (Hein, 2001; Hein et al., 2000). These algorithms employ a scoring function which incorporates the underlying phylogeny, and use an explicit stochastic model of molecular evolution, which makes it possible to compare different solutions in a statistically rigorous way. The optimization step, however, still relies on dynamic programming, and practical use of these algorithms thus awaits further developments. Creative Commons License. All CLC bio's scientific articles are license
61. generated by the inner primer pair, and this is also the PCR fragment which can be exported. 15.7 TaqMan. CLC Gene Workbench allows the user to design primers and probes for TaqMan PCR applications. TaqMan probes are oligonucleotides that contain a fluorescent reporter dye at the 5' end and a quenching dye at the 3' end. Fluorescent molecules become excited when they are irradiated and usually emit light. However, in a TaqMan probe the energy from the fluorescent dye is transferred to the quencher dye by fluorescence resonance energy transfer as long as the quencher and the dye are located in close proximity, i.e. when the probe is intact. TaqMan probes are designed to anneal within a PCR product amplified by a standard PCR primer pair. If a TaqMan probe is bound to a product template, the replication of this will cause the Taq polymerase to encounter the probe. Upon doing so, the 5' exonuclease activity of the polymerase will cleave the probe. This cleavage separates the quencher and the dye, and as a result the reporter dye starts to emit fluorescence. The TaqMan technology is used in Real-Time quantitative PCR. Since the accumulation of fluorescence mirrors the accumulation of PCR products, it can be monitored in real time and used to quantify the amount of template initially present in the buffer. The technology is also used to detect genetic variation such as SNPs. By designing a TaqMan probe which will specifically bind to one of tw
62. give a more general introduction to the concept of phylogeny and the associated bioinformatics methods. 19.1 Inferring phylogenetic trees. For a given set of aligned sequences (see chapter 18) it is possible to infer their evolutionary relationships. In CLC Gene Workbench 2.0 this is done by creating a phylogenetic tree: Toolbox in the Menu Bar | Alignments and Trees | Create Tree, or right-click alignment in Navigation Area | Toolbox | Alignments and Trees | Create Tree. This opens the dialog displayed in figure 19.1. If an alignment was selected before choosing the Toolbox action, this alignment is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove elements from the Navigation Area. Click Next to adjust parameters. 19.1.1 Phylogenetic tree parameters. Figure 19.2 shows the parameters that can be set. Figure 19.1: Creating a Tree. [Create Tree dialog, step 2 (Set parameters): Algorithm (Neighbor Joining) and Bootstrapping.]
63. import and export (see chapter 6). Furthermore, an element can be added to a project by dragging it into the Navigation Area. Elements on lists, e.g. search hits or sequence lists, can also be dragged to the Navigation Area. When dragging from the View Area to the Navigation Area, the element (e.g. a sequence, an alignment or a search report) is selected by clicking on the tab and dragging it into the Navigation Area. If the element already exists, you are asked whether you want to save a copy. If a piece of data is dropped on a folder or a project, the data is placed at the bottom of the list of elements in the folder or project in question. If a piece of data is dropped on an element which is not a folder or a project, the data is added just after that element. 3.1.2 Create new projects and folders. In the Navigation Area, all files and folders are stored in one or more projects. Creating a new project can be done in two ways: right-click an element in the Navigation Area | New | New Project, or File | New | New Project. Regardless of which element is selected when you create a new project, the new project is placed at the bottom of the Project Tree. You can move the project manually by selecting it and dragging it to the desired location. Projects are always placed at the uppermost level in the Project Tree. In order to organize your files, they can be placed in folders. Creating a new fol
64. Figure 2.6: NCBI search view. 2.3.1 Saving the search. If you click Save search parameters, the program does not save the search results, but rather the search criteria. This allows you to perform exactly the same search later on. In this tutorial we are not certain of the quality of our search criteria, and therefore we choose not to save them. Consequently, click Start search to perform the search. 2.3.2 Searching for matching objects. When the search is complete, the list of hits is shown. If the desired complete human hemoglobin DNA sequence is found, the sequence can be viewed by double-clicking it in the list of hits from the search. If the desired sequence is not shown, you can click the More button below the list to see more hits. 2.3.3 Saving the sequence. The sequences which are found during the search can be displayed by double-clicking in the list of hits. However, this does not save the sequence. It is necessary to save the sequences before any analysis can be conducted. A sequence is saved like this: click the tab with the name of the sequence | Save in the toolbar, or click the tab with the name of the sequence | Ctrl+S (⌘+S on Mac). When you close the view of the sequence you a
65. multiple elements Ctrl Select multiple elements Shift Shift Shift Double click the tab of the View Double click the View title Click in view Click elements Click elements Chapter 4 User preferences Contents 4 1 General preferences 2 eee eee ee ee 77 4 2 Default View preferences 1 00 0 eee 2 77 4 3 Advanced preferences 0 00 se ee eee ee eee ee ee 78 4 4 Export import of preferences 0 00 e eee eee ee ee 78 4 5 View preference style sheet 0 0 0 eee eee ee ee 78 4541 Floating Side Panel ss ius Ponera a rec ok Pet a ae at a hs 79 The Preferences dialog offers opportunities for changing the default settings for different features of the program For example if you adjust Number of hits under General Preferences to 40 instead of 50 you see the first 40 hits each time you conduct a search e g NCBI search The Preferences dialog is opened in one of the following ways and can be seen in figure 4 1 Edit Preferences 73 or Ctrl K on Mac 9 Preferences Undo Limit 500 Number of hits 50 Style English United States wl Advanced of OK X Close Jf Export import Figure 4 1 Preferences include General preferences View preferences Colors preferences and Advanced settings 76 CHAPTER 4 USER PREFERENCES TT 4 1 General preferences The General preferences include e Undo Limit As
66. next time you conduct an NCBI search. Notice: When conducting a search, no files are downloaded. Instead, the program produces a list of links to the files in the NCBI database. This ensures a much faster search. The search process runs in the Toolbox under the Processes tab. It is possible to stop the search process by clicking stop. Because the process runs in the Processes tab, it is possible to perform other tasks while the search is running. 9.1.2 Handling of GenBank search results. The search result is presented as a list of links to the files in the NCBI database. The View displays 50 hits at a time (this can be changed in the Preferences, see chapter 4). More hits can be displayed by clicking the More button at the bottom right of the View. Each sequence hit is represented by text in three columns:
• Accession
• Definition
• Modification date
It is possible to exclude one or more of these columns by adjusting the View preferences for the database search view. Furthermore, your changes in the View preferences can be saved. See section 4.5. Several sequences can be selected, and by clicking the buttons at the bottom of the search view, you can do the following:
• Download and open: doesn't save the sequence.
• Download and save: lets you choose a location for saving the sequence.
• Open at NCBI: searches for the sequence at NCBI's web page.
Double-clicking a hit will download and open the sequence. The hits can also be copied into
67. Figure 1.7: Read the License Agreement carefully. Read the License Agreement carefully before clicking the I accept button. In the next step, shown in figure 1.8, click the Activate license on-line button. Your computer must be connected to the internet in order to activate the license. Once the license is activated, you may work off-line. It will take a little time to activate the license key. When the license key is activated, CLC Gene Workbench 2.0 will start. [Activate license dialog: the license must be activated on-line before the application can be used; if you are unable to activate the license on-line, contact support@clcbio.com and include your License Number and Activation Key.] Figure 1.8: Activate the license key online. A license is related to a specific computer, and therefore it can be used by anyone using that computer. If at some time you want to transfer the license to another computer, please contact license@clcbio.com. Problems with online activation. If you have problems activating th
68. of bioinformatic data. Here follows a short list of the formats which CLC Gene Workbench 2.0 handles, and a description of which type of data the different formats support.

File type            Suffix          File format used for
Phylip Alignment     .phy            alignments
GCG Alignment        .msf            alignments
Clustal Alignment    .aln            alignments
Newick               .nwk            trees
FASTA                .fsa/.fasta     sequences
GenBank              .gbk/.gb/.gp    sequences
GCG sequence         .gcg            sequences (only import)
PIR (NBRF)           .pir            sequences (only import)
Staden               .sdn            sequences (only import)
VectorNTI                            sequences (only import)
DNAstrider           .str/.strider   sequences
Swiss-Prot           .swp            protein sequences
Lasergene sequence   .pro            protein sequence (only import)
Lasergene sequence   .seq            nucleotide sequence (only import)
Embl                 .embl           nucleotide sequences
Nexus                .nxs/.nexus     sequences, trees, alignments, and sequence lists
CLC                  .clc            sequences, trees, alignments, reports, etc.
Text                 .txt            all data in a textual format
ABI                                  Trace files (only import)
AB1                                  Trace files (only import)
SCF2                                 Trace files (only import)
SCF3                                 Trace files (only import)
Phred                                Trace files (only import)
mmCIF                .cif            structure (only import)
PDB                  .pdb            structure (only import)
Preferences          .cpf            CLC workbench preferences

Notice that CLC Gene Workbench can import external files, too. This means that CLC Gene Workbench can import all files and display them in the Navigation Area, while the above-mentioned formats are the types which can be read by CLC Gene Workbench
69. on your computer. Local BLAST and the creation of a database for local BLAST search are described later in this chapter. 10.1 BLAST Against NCBI Database. To conduct a BLAST search: right-click the tab of an open sequence | Toolbox | BLAST Search | BLAST Against NCBI Databases, or click an element in the Navigation Area | Toolbox | BLAST Search | BLAST Against NCBI Databases. Alternatively, use the keyboard shortcut Ctrl+Shift+B for Windows and ⌘+Shift+B on Mac OS. This opens the BLAST dialog seen in figure 10.1. You cannot use sequences longer than 8190 for BLAST search. Click Next. In Step 2 you can choose which type of BLAST search you want to conduct, and you can limit your search to a particular database (see section B in the appendix for a list of available databases). Step 2 can be seen in figure 10.2. Figure 10.1: Choose one or more sequences to conduct a BLAST search.
70. one sequence (one axis) match the other sequence, a dot is drawn at the plot. Dot plots are one of the oldest methods for comparing two sequences (Maizel and Lenk, 1981). The scores that are drawn on the plot are affected by several issues.
• Scoring matrix for distance correction: Scoring matrices (BLOSUM and PAM) contain substitution scores for every combination of two amino acids. Thus, these matrices can only be used for dot plots of protein sequences.
• Window size: The single-residue comparison (bit-by-bit comparison, window size = 1) in dot plots will undoubtedly result in a noisy background of the plot. You can imagine that there are many successes in the comparison if you only have four possible residues, as in nucleotide sequences. Therefore you can set a window size, which smooths the dot plot. Instead of comparing single residues, it compares subsequences of the length set as window size. The score is now calculated with respect to aligning the subsequences.
• Threshold: The dot plot shows the calculated scores with a colored threshold. Hence you can better recognize the most important similarities.
Examples and interpretations of dot plots. Contrary to simple sequence alignments, dot plots can be a very useful tool for spotting various evolutionary events which may have happened to the sequences of interest. Below are shown some examples of dot plots where sequence insertions l
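The effect of the window size and threshold settings can be seen in a minimal Python sketch of a dot plot. It is a simplification under stated assumptions: it scores each window by counting identical residues rather than using a BLOSUM or PAM matrix, it returns a set of coordinates instead of drawing anything, and the function name `dot_plot` is hypothetical.

```python
def dot_plot(seq_a, seq_b, window=1, threshold=None):
    """Return the (i, j) window start positions that would receive a dot.

    Each window of `window` residues from seq_a is compared with each
    window from seq_b. The score is the number of identical positions
    (an assumption; a substitution matrix could be used instead).
    A dot is recorded when the score reaches `threshold`.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if threshold is None:
        threshold = window  # by default, every position must match
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            score = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if score >= threshold:
                dots.add((i, j))
    return dots
```

With window size 1, every chance identity between two nucleotide sequences yields a dot, which is the noisy background mentioned above. Increasing the window size (and keeping the threshold close to it) removes most of these isolated dots and leaves the diagonals that correspond to longer similar stretches.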
71. primer on the sequence, simply left-click the circle using the mouse. The various primer parameters can now be varied to explore their effect, and the view area will dynamically update to reflect this. If e.g. the allowed melting temperature interval is widened, more green circles will appear, indicating that more primers now fulfill the set requirements, and if e.g. a requirement for 3' G/C content is selected, red circles will appear at the starting points of the primers which fail to meet this requirement. 15.3.2 Detailed information mode. In this mode a very detailed account is given of the properties of all the available primers. When a region is chosen, primer information will appear in groups of lines beneath it (see figure 15.5). Figure 15.5: Detailed information mode. The number of information line groups reflects the chosen length interval for primers and probes. One group is shown for every possible primer length. Within each group, a line is shown for every primer property that is selected from the checkboxes in the primer information preference group. Primer properties are shown at each potential primer
72. procedure for searching is identical for all four search options (see also figure 9.3): Right-click a sequence in the Navigation Area | Web Info | select the desired search function. Figure 9.3: By right-clicking a search result, it is possible to choose how to handle the relevant sequence. This will open your computer's default browser, searching for the sequence that you selected. 9.2.1 Google sequence. The Google search function uses the accession number of the sequence as search term on http://www.google.com. The resulting web page is equivalent to typing the accession number of the sequence into the search field on http://www.google.com. 9.2.2 NCBI. The NCBI search function searches in GenBank at NCBI (http://www.ncbi.nlm.nih.gov) using an identification number (when you view the sequence as text, it is the GI number). Therefore, the sequence file must contain this number in order to look it up in NCBI. All sequences downloaded from NCBI have this number. 9.2.3 PubMed References. The PubMed references search option lets you look up PubMed articles based on references contai
73. protein sequence, drag it into the Navigation Area or press Ctrl+S (⌘+S on Mac) to activate a save dialog. 12.3 Local complexity plot. In CLC Gene Workbench it is possible to calculate local complexity for both DNA and protein sequences. The local complexity is a measure of the diversity in the composition of amino acids within a given range (window) of the sequence. The K2 algorithm is used for calculating local complexity (Wootton and Federhen, 1993). To conduct a complexity calculation, do the following: Select sequences in Navigation Area | Toolbox in Menu Bar | General Sequence Analyses | Create Complexity Plot. This opens a dialog. In Step 1 you can change, remove and add DNA and protein sequences. When the relevant sequences are selected, clicking Next takes you to Step 2. This step allows you to adjust the window size from which the complexity plot is calculated. Default is set to 11 amino acids, and the number should always be odd. The higher the number, the less volatile the graph. Figure 12.13 shows an example of a local complexity plot. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. The values of the complexity plot approach 1.0 as the distribution of amino acids becomes more complex.
74. required for a match: how many nucleotides of the primer must base-pair to the sequence in order to cause mispriming.

• Number of consecutive base pairs required in 3' end: how many consecutive 3' end base pairs in the primer MUST be present for mispriming to occur. This option is included since 3' terminal base pairs are known to be essential for priming to occur.

Notice that including a search for potential mispriming sites will prolong the search time substantially if long sequences are used as template and if the minimum number of base pairs required for a match is low. If the region to be amplified is part of a very long molecule and mispriming is a concern, consider extracting part of the sequence prior to designing primers.

If both a forward and a reverse region are defined, primer pairs will be suggested by the algorithm. After pressing the Calculate button, a dialogue will appear (see figure 15.8).

[Figure 15.8: the Calculation parameters dialogue. Chosen parameters: maximum and minimum primer length, maximum and minimum G/C content, maximum and minimum melting temperature, maximum self annealing, maximum self end annealing, maximum secondary structure, 3' end must meet G/C requirements, 5' end must meet G/C requirements. Primer combination parameters: max percentage point difference in G/C content, max difference in melting temperatures within a primer pair, max pair annealing score.]
75. restriction sites from the Toolbox or when viewing another sequence.

11.6.2 Using split views to see details of the circular molecule. In order to see the nucleotides of a circular molecule, you can open a new view displaying a linear view of the molecule: right-click the tab of the circular view of the sequence | Show | Sequence. This will open a linear view of the sequence below the circular view. When you zoom in on the linear view, you can see the residues as shown in figure 11.12.

Figure 11.12: Two views showing the same sequence. The bottom view is zoomed in.

Notice: If you make a selection in one of the views, the other view will also make the corresponding selection, providing an easy way for you to focus on the same region in both views.

11.6.3 Mark molecule as circular and specify starting point. You can mark a DNA molecule as circular by right-clicking its label in either the sequence view or the circular view. In the right-click menu you can also make a circular molecule linear. A circular molecule displayed in the normal sequence view will have the sequence ends marked. The starting point of a circular sequence can be changed by: make a selection starting at the position that you want to be the new starting point | right-click the select
76. sequence
    11.1.3 Editing the sequence
    11.1.4 Adding and modifying annotations
    11.1.5 Removing annotations
    11.1.6 Sequence region types
  11.2 Sequence information
    11.2.1 Annotation map
  11.3 View as text
  11.4 Creating a new sequence
  11.5 Sequence Lists
    11.5.1 Graphical view of sequence lists
    11.5.2 Sequence list table
    11.5.3 Extract sequences
  11.6 Circular DNA
    11.6.1 Show restriction sites for circular DNA
    11.6.2 Using split views to see details of the circular molecule
    11.6.3 Mark molecule as circular and specify starting point

CLC Gene Workbench 2.0 offers three different ways of viewing and editing sequences, as described in this chapter. Furthermore, this chapter also explains how to create a new sequence and how to assemble several sequences in a sequence list.

11.1 View sequence. When you double-click a sequence in the Navigation Area, the sequence will open automatically, and you will see the nucleotides or ami
77. starting position and are of two types:

Properties with numerical values are represented by bar plots. A green bar represents the starting point of a primer that meets the set requirement, and a red bar represents the starting point of a primer that fails to meet the set requirement:
• G/C content
• Melting temperature
• Self annealing score
• Self end annealing score
• Secondary structure score

Properties with Yes/No values. If a primer meets the set requirement, a green circle will be shown at its starting position, and if it fails to meet the requirement, a red dot is shown at its starting position:
• C/G at 3' end
• C/G at 5' end

Common to both sorts of properties is that mouse-clicking an information point (filled circle or bar) will cause the region covered by the associated primer to be selected on the sequence.

15.4 Output from primer design. The output generated by the primer design algorithm is a table of proposed primers or primer pairs with the accompanying information (see figure 15.6).

[Figure 15.6: table titled "Standard primers for sequence PERH3BC", 41 rows, with columns including Penalty, Pair annealing, Sequence F1 and Sequence R1]
78. Figure 2.29: A list of primers. To the right is the Side Panel showing the available choices of information to display.

2.10 Tutorial: Assembly. In this tutorial you will see how to assemble data from automated sequencers into a contig and how to find and inspect any inconsistencies that may exist between different reads. First, select the five trace files (the reads) in the Assembly folder in the Nucleotide folder of the Example data. To assemble the files: Toolbox in the Menu Bar | Assembly | Assemble Sequences. Click Next to go to the second step of the assembly, where you choose to trim the sequences. In the next step you will be able to specify how this trimming should be performed (see figure 2.31). Leave these settings at their default and click Finish.

2.10.1 Getting an overview of the contig. The result of the assembly is a Contig, which is an alignment of the five reads. Click Fit width to see an overview of the contig. To help you determine the coverage, display a coverage graph (see figure 2.32): Alignment info in Side Panel | Coverage Graph. This overview can be an aid in determining whether coverage is satisfactory and, if not, which regions a new sequencing effort should focus on. Next, we go into the details of the contig.

2.10.2 Finding and editing inconsistencies. Click Zoom to 100% to zoom in on the residues at the beginning of the contig. Click the Find
79. that CLC bio may provide to You or make available to You after the date You obtain Your initial copy of the Software Product, to the extent that such items are not accompanied by a separate license agreement or terms of use.

Figure 1.4: License Agreement.

Please read the License agreement carefully before clicking I accept. In the next step, shown in figure 1.5, select Activate license on-line. Again, you might have to wait for a short while because the license key is being activated on our server. A license is related to a specific computer, and therefore it can be used by anyone using that computer. As in figure 1.3, you can specify a proxy server if needed.

[Figure 1.5 dialog text: The license must be activated before the application can be used. The activation has to be done on-line, and therefore you need to be connected to the internet during the activation. If you are unable to activate the license on-line, please contact support@clcbio.com and include the following information in your email: License Number, Activation Key.]

Figure 1.5: Activate the license key online.

Now the license key is activated on your computer, and CLC Gene Workbench 2.0 starts.

Problems with online activation
80. the better the solution.
• Region: the interval of the template sequence covered by the primer.
• Self annealing: the maximum self annealing score of the primer, in units of hydrogen bonds.
• Self annealing alignment: a visualization of the highest-scoring self annealing alignment.
• Self end annealing: the maximum score of consecutive end base pairings allowed between the ends of two copies of the same molecule, in units of hydrogen bonds.
• GC content: the fraction of G and C nucleotides in the primer.
• Melting temperature of the primer-template complex.
• Secondary structure score: the score of the optimal secondary DNA structure found for the primer. Secondary structures are scored by adding the number of hydrogen bonds in the structure, and 2 extra hydrogen bonds are added for each stacking base pair in the structure.
• Secondary structure: a visualization of the optimal DNA structure found for the primer.

If both a forward and a reverse region are selected, a table of primer pairs is shown, where the above columns (excluding the penalty) are represented twice: once for the forward primer (designated by the letter F) and once for the reverse primer (designated by the letter R). Before these, and following the penalty of the primer pair, are the following columns pertaining to primer pair information:
• Pair annealing: the number of hydrogen bonds found in the optimal alignment of the forward and the re
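Two of the columns above lend themselves to a short illustration: GC content, and an annealing score counted in hydrogen bonds. The Python sketch below is a simplification under stated assumptions, not the workbench's scoring method. It assumes Watson-Crick pairing with 2 hydrogen bonds per A-T pair and 3 per G-C pair, considers only ungapped alignments, and sums every complementary pair in the overlap. The names `gc_fraction` and `annealing_score` are hypothetical.

```python
# Hydrogen bonds per Watson-Crick base pair (assumed scoring).
H_BONDS = {("A", "T"): 2, ("T", "A"): 2, ("G", "C"): 3, ("C", "G"): 3}


def gc_fraction(primer):
    """Fraction of G and C nucleotides in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def annealing_score(top, bottom):
    """Best ungapped annealing score between two 5'->3' strands.

    The second strand is reversed so that the strands run antiparallel,
    then slid along the first; each offset is scored by summing the
    hydrogen bonds of all complementary pairs in the overlap.
    """
    top = top.upper()
    rev = bottom.upper()[::-1]
    best = 0
    for offset in range(-(len(rev) - 1), len(top)):
        score = 0
        for i, base in enumerate(rev):
            j = offset + i
            if 0 <= j < len(top):
                score += H_BONDS.get((top[j], base), 0)
        best = max(best, score)
    return best


if __name__ == "__main__":
    forward = "CCATGGTTTCCTTCCTCT"   # primer sequence shown in the manual
    reverse = "CAAACTCTTGTCAGCACT"   # primer sequence shown in the manual
    print("GC fraction:", gc_fraction(forward))
    print("self annealing:", annealing_score(forward, forward))
    print("pair annealing:", annealing_score(forward, reverse))
```

Self annealing is the special case where a primer is scored against a second copy of itself; pair annealing scores the forward primer against the reverse primer.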
81. the following ways:

If you have downloaded an installer: Locate the downloaded installer and double-click the icon. The default location for downloaded files is your desktop.

If you are installing from a CD: Insert the CD into your CD-ROM drive and open it by double-clicking on the CD icon on your desktop. Launch the installer by double-clicking on the CLC Gene Workbench icon.

Installing the program is done in the following steps:
• On the welcome screen, click Next.
• Read and accept the License agreement and click Next.
• Choose where you would like to install the application and click Next.
• Choose whether you would like to create a desktop icon for launching CLC Gene Workbench and click Next.
• Wait for the installation process to complete, choose whether you would like to launch CLC Gene Workbench right away, and click Finish.

When the installation is complete, the program can be launched from your Applications folder or from the desktop shortcut you chose to create. If you like, you can drag the application icon to the dock for easy access.

1.2.4 Installation on Linux with an installer. Navigate to the directory containing the installer and execute it. This can be done by running a command similar to:

sh CLCGeneWorkbench_1_0_2_JRE.sh

If you are installing from a CD, the installers are located in the linux directory. Installing the program is done in the following steps:
82. while being connected to the internet or by sending an email to license@clcbio.com. If you experience any problems, please contact support@clcbio.com.

Figure 1.6: Select Import a license key file.

Choose the option Import a license key file in order to specify where your license key is located. Select the license key file provided by CLC bio. When you have selected this file, the License Agreement is shown (see figure 1.7). If you want to use another license key instead, click the Import a license key file button.

[Figure 1.7 dialog text: END USER LICENSE AGREEMENT FOR CLC BIO SOFTWARE, CLC Gene Workbench 2.0.3. 1 Recitals. 1.1 This End User License Agreement ("EULA") is a legal agreement between you (either an individual person or a single legal entity, who will be referred to in this EULA as "You") and CLC bio A/S, CVR no. 28 30 50 87, for the software products that accompanies this EULA, including any associated media, printed materials and electronic documentation (the "Software Product"). 1.2 The Software Product also includes any software updates, add-on components, web services and/or supplements that CLC bio may provide to You or make available to You after the date You obtain Your initial copy of the Software Product, to the extent that such items are not accompanied by a separate license agreement or terms of
83. Figure 12.15: Comparative sequence statistics.

• Weight
• Isoelectric point
• Aliphatic index
• Half-life
• Extinction coefficient
• Counts of atoms
• Frequency of atoms
• Count of hydrophobic and hydrophilic residues
• Frequencies of hydrophobic and hydrophilic residues
• Count of charged residues
• Frequencies of charged residues
• Amino acid distribution
• Histogram of amino acid distribution
• Annotation table
• Counts of di-peptides
• Frequency of di-peptides

The output of nucleotide sequence statistics includes:
• General statistics: sequence type, length, organism, locus, description, modification date, weight
• Atomic composition
• Nucleotide distribution table
• Nucleotide distribution histogram
• Annotation table
• Counts of di-nucleotides
• Frequency of di-nucleotides

A short description of the different areas of the statistical output is given in section 12.4.2.

12.4.1 Sequence statistics output. The entire statistical output can be printed. To do so, click the Print icon.

12.4.2 Bioinformatics explained: Protein statistics. Every protein holds specific and individual features which are unique to that particular protein. Features such as isoelectric point or amino acid composition can reveal important information about a novel protein. Many of the feature
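Several entries in the nucleotide statistics list are simple counts, and a small Python sketch shows what they contain. This is an illustration only: the manual does not say how the workbench counts di-nucleotides, so the sketch assumes overlapping pairs (every position paired with its successor), and the function name and the keys of the returned dictionary are hypothetical.

```python
from collections import Counter


def nucleotide_statistics(seq):
    """Counts and frequencies of nucleotides and di-nucleotides.

    Assumption: di-nucleotides are counted with overlap, so a sequence
    of length n contributes n - 1 di-nucleotides.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total_di = sum(di.values())
    return {
        "length": len(seq),
        "counts": dict(mono),
        "frequencies": {b: c / len(seq) for b, c in mono.items()},
        "dinucleotide_counts": dict(di),
        "dinucleotide_frequencies": {d: c / total_di for d, c in di.items()},
    }


if __name__ == "__main__":
    stats = nucleotide_statistics("CCATGGTTTCCTTCCTCT")
    for key, value in stats.items():
        print(key, value)
```

The same approach applies to the di-peptide counts listed for proteins, since `Counter` makes no assumption about the alphabet.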
84. 7 History
  7.1 Element history
8 Handling of results
  8.1 How to handle results of analyses

III Bioinformatics

9 Database search
  9.1 GenBank search
  9.2 Sequence web info
10 BLAST Search
  10.1 BLAST Against NCBI Database
  10.2 BLAST Against Local Database
  10.3 Create Local BLAST Database
11 Viewing and editing sequences
  11.1 View sequence
  11.2 Sequence information
  11.3 View as text
  11.4 Creating a new sequence
  11.5 Sequence Lists
  11.6 Circular DNA
12 General sequence analyses
  12.1 Dot plots
  12.2 Shuffle sequence
  12.3 Local complexity plot
  12.4 Sequence statistics
  12.5 Join sequences
  12.6 Motif Search
  12.7 Pattern Discovery
85. [Figure 13.5: the Translate to Protein dialog, with options for translation of the whole sequence in any of six reading frames, translation of coding regions only, and the genetic code translation table (1 Standard)]

Figure 13.5: Choosing 1 and 3 reading frames and the standard translation table.

13.4.1 Translate part of a nucleotide sequence. If you want to make separate translations of all the coding regions of a nucleotide sequence, you can check the option Translate CDS and ORF in the translation dialog (see figure 13.5). If you want to translate a specific coding region which is annotated on the sequence, use the following procedure: open the nucleotide sequence | right-click the ORF or CDS annotation | Translate CDS/ORF | choose a translation table | OK. If the annotation contains information about the translation, this information will be used, and you do not have to specify a translation table. The CDS and ORF annotations are colored yellow by default.

13.5 Annotate with SNPs. CLC Gene Workbench 2.0 can annotate sequences with Single Nucleotide Polymorphisms (SNPs) as found in the NCBI online dbSNP database. A SNP is a mutation of a single nucleotide in a DNA sequence. To annotate with SNPs
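What translation in a chosen reading frame means can be sketched in Python. The example below is a minimal illustration, not the workbench's implementation. It hard-codes the standard genetic code (the "1 Standard" table named in the dialog) and assumes that negative frame numbers refer to the reverse complement, which is a common convention but is not stated in this excerpt. Codons containing characters other than A, C, G and T are rendered as X, which is also a choice made for this sketch.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq, frame=1):
    """Translate a DNA sequence in reading frame 1, 2, 3, -1, -2 or -3.

    Assumption: negative frames are read on the reverse complement.
    Incomplete trailing codons are ignored; "*" marks a stop codon.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of 1, 2, 3, -1, -2, -3")
    seq = seq.upper()
    if frame < 0:
        seq = seq.translate(COMPLEMENT)[::-1]
    start = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    dna = "ATGGCCTAA"
    for frame in (1, 2, 3, -1, -2, -3):
        print(frame, translate(dna, frame))
```

Translating only a CDS or ORF annotation, as section 13.4.1 describes, would amount to slicing the annotated region out of the sequence before calling `translate` on it.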
86. Figure 18.3: The first 50 positions of two different alignments of seven calpastatin sequences. The top alignment is made with cheap end gaps, while the bottom alignment is made with end gaps having the same price as any other gaps. In this case it seems that the latter scoring scheme gives the best result.

Figure 18.4: The alignment of the coding sequence of bovine myoglobin with the full mRNA of human gamma globin. The top alignment is made with free end gaps, while the bottom alignment is made with end gaps treated as any other. The yellow annotation is the coding sequence in both sequences. It is evident that free end gaps are ideal in this situation, as the start codons are aligned correctly in the top alignment. Treating end gaps as any other gaps in the case of aligning distant homologs where one sequence is partial leads to a spreading out of the short sequence, as in the bottom alignment.

For a comprehensive explanation of the alignment algorithms, see section 18.5.

18.1.3 Aligning alignments. If you have selected an existing alignment in the first
87. [Figure 2.44: the new-sequence dialog, with fields for Name, Common name, Species, Type (DNA, RNA or Protein), Circular, Description, Keywords, Comments and Sequence (required)]

Figure 2.44: Pasting a sequence into the text field at the bottom is a quick way of importing sequence data.

This dialog lets you paste all kinds of characters into the text field, including numbers and spaces. If you have pasted e.g. numbers into the field, just press and hold the space key on your keyboard until the numbers have been deleted. Spaces are not included in the new sequence.

2.11.12 Perform analyses on many elements. If you have a folder with a lot of mixed elements (e.g. both nucleotide and protein sequences, alignments, reports), you can often select the whole folder for an analysis even if the analysis should only be performed on a special type of element (e.g. nucleotide sequences). In the example below (figure 2.45), the dialog says Select nucleotide sequences, but the project contains both protein and nucleotide sequences. Instead of carefully pinpointing the nucleotide sequences, you can just press Ctrl+A (Cmd+A on Mac), selecting all the visible elements. When you add these elements, the protein sequences are filtered out.

[Figure 2.45: selection dialog listing Projects on the left and Selected Elements on the right]
88. Figure 15.1: The initial view of the sequence used for primer design.

15.1.1 General concept. The concept of the primer view is that the user first chooses the desired reaction type for the session in the Primer Parameters preference group, e.g. Standard PCR. Reflecting the choice of reaction type, it is now possible to select one or more regions on the sequence and to use the right-click mouse menu to designate these as primer or probe regions (see figure 15.2). When a region is chosen, graphical information about the properties of all possible primers in this region will appear in lines beneath it. By default, information is shown using a compact

[Figure 15.2: right-click menu offering Forward primer region here, Reverse primer region here and No primers here, followed by the standard selection commands (Copy Selection, Expand Selection, Open Selection in New View, Edit Selection, Delete Selection, Add Annotation, Add Enzymes Cutting The Selection To Panel, Insert Restriction Site After Selection, Insert Restriction Site Before Selection, Trim Sequence Left, Trim Sequence Right, Set Alignment Fixpoint Here, Set Numbers Relative to This
89. [Figure 15.12: the Primer parameters panel for an alignment, with the modes Standard PCR and TaqMan and the primer solution options Perfect match, Allow degeneracy and Allow mismatches]

Figure 15.12: The initial view of an alignment used for primer design.

The work flow when designing alignment-based primers and probes is as follows:
• Use selection boxes to specify groups of included and excluded sequences.
• Mark either a single forward primer region, a single reverse primer region, or both on the sequence (and perhaps also a TaqMan region). Selections must cover all sequences in the included group.
• Adjust parameters regarding single primers in the preference panel.
• Push the Calculate button.

15.9.2 Alignment-based design of PCR primers. In this mode, a single primer or a pair of PCR primers is designed. CLC Gene Workbench allows the user to design primers which will specifically amplify a group of included sequences but not amplify the remainder of the sequences, the excluded sequences. The selection boxes are used to indicate the status of a sequence: if the box is checked, the sequence belongs to the included sequences; if not, it belongs to the excluded sequences. To design primers that are general for all sequences in an alignment, simply add them all to the set of included sequ
90. Figure 6.3: A dialog asking which version of the file you want to keep.

6.3 Export graphics to files. CLC Gene Workbench 2.0 supports export of graphics into a number of formats. This way, the visible output of your work can easily be saved and used in presentations, reports etc. The Export Graphics function is found in the Toolbar. CLC Gene Workbench 2.0 exports graphics exactly the way they are shown in the View Area. Thus, all settings made in the Side Panel will be reflected in the exported file. To show you how to export graphics, we choose to export the phylogenetic tree of the example data set in .png format (see figure 6.4). When the relevant file is opened and shown in the View Area, do the following: select tab of View | Graphics on Toolbar | select location on disc | name file and select type | Save.

After clicking Save, you are prompted for whether to Export visible area or Export whole view. The first option exports what you see, and the latter also exports the part of the view that is not visible. Hence, choosing Export whole view will generate a larger file. Furthermore, when saving in .png, .jpg and .tif formats, you are prompted for which quality to save the graphics in.

[Figure 6.4: the Export Graphics dialog]
91. Figure 15.6: Proposed primers.

In the preference panel of the table, it is possible to customize which columns are shown in the table. See the sections below on the different reaction types for a description of the available information. The columns in the output table can be sorted by the present information. For example, the user can choose to sort the available primers by their penalty value (default) or by their self annealing score, simply by right-clicking the column header. The output table interacts with the accompanying primer editor such that when a proposed combination of primers and probes is selected in the table, the primers and probes in this solution are highlighted on the sequence.

15.4.1 Saving primers. Primer solutions in a table row can be saved by selecting the row and using the right-click mouse menu. This opens a dialogue that allows the user to save the primers to the desired location. Primers and probes are saved as DNA sequences in the program. This means that all available DNA analyses can be performed on the
92. Figure 6.2: Project Vector NTI Data containing all imported sequences of the Vector NTI Database.

6.1.2 Export of bioinformatic data. CLC Gene Workbench 2.0 can export bioinformatic data in most of the formats that can be imported. There are a few exceptions; see section 6.1.1. To export a file: select the element to export | Export | choose where to export to | select File of type | enter name of file | Save.

Notice: The Export dialog decides which types of files you are allowed to export into, depending on what type of data you want to export. E.g. protein sequences can be exported into GenBank, Fasta, Swiss-Prot and CLC formats.

Export of projects, folders and multiple files. The .clc file type can be used to export all kinds of files and is therefore especially useful in these situations:
• Export of one or more file folders, including all underlying files and folders.
• Export of one or more project folders, including all underlying files and folders.
• If you want to export two or more files into one .clc file, you have to copy them into a folder or project, which can be exported as described below.

Export of projects and folders is similar to export of single files. Exporting multiple files of different formats is done in .clc format. This is how you export a proje
93. Figure 2.42: Using the split views and follow selection functionalities.

2.11.9 Smart selecting in sequences and alignments. There are a number of ways to select residues in sequences and alignments:

• Using the mouse. This is the most basic way of selecting. Place the mouse cursor where you want the selection to start, press and hold the mouse button, move the mouse to the location where the selection should end, and release the mouse button.
• Using the mouse in combination with the Shift key. If you have made a selection and want to extend or reduce the selection, hold the Shift key while clicking the location where you want the boundary of the selection.
• Using the arrow keys in combination with the Shift key. If you have made a selection and want to extend or reduce the selection, hold the Shift key while pressing the left and right arrow keys.
• Using the mouse in combination with the Ctrl (for Windows) or Cmd (for Mac) key. By holding this key, you can make multiple selections that are not contiguous.
• Selecting an annotation. Double-click an annotation in order to select the residues that the annotation covers. This is especially helpful if the annotation is not contiguous, as the CDS region in figure 2.39.
• Using the Search function. At the bottom of the Side Panel to the right, there is a search field which can be used for selections (use Ctrl
94. Figure 18.8: Ungapped sequence alignment of eleven E. coli sequences defining a start codon. The start codons start at position 1. Below the alignment is shown the corresponding sequence logo. As seen, a GTG start codon and the usual ATG start codons are present in the alignment. This can also be visualized in the logo at position 1.

Calculation of sequence logos. A comprehensive walk-through of the calculation of the information content in sequence logos is beyond the scope of this document but can be found in the original paper by Schneider and Stephens (1990). Nevertheless, the conservation of every position is defined as $R_{seq}$, which is the difference between the maximal entropy ($S_{max}$) and the observed entropy for the residue distribution ($S_{obs}$):

$R_{seq} = S_{max} - S_{obs} = \log_2 N - \left( -\sum_{n=1}^{N} p_n \log_2 p_n \right)$

$p_n$ is the observed frequency of an amino acid residue or nucleotide of symbol $n$ at a particular position, and $N$ is the number of distinct symbols for the sequence alphabet, either 20 for proteins or four for DNA/RNA. This means that the maximal sequence information content per position is $\log_2 4 = 2$ bits for DNA/RNA and $\log_2 20 \approx 4.32$ bits for proteins. The o
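The definition of the information content per position translates directly into code. The Python sketch below computes the difference between the maximal and the observed entropy for a single alignment column, using the observed symbol frequencies as $p_n$. It is a minimal illustration of the formula as given here: it applies no small-sample correction, it assumes the column contains no gap characters, and the function name `information_content` is hypothetical.

```python
import math
from collections import Counter


def information_content(column, alphabet_size):
    """Information content in bits of one ungapped alignment column.

    column        -- the symbols observed at one position, e.g. "AAAG"
    alphabet_size -- N: 4 for DNA/RNA, 20 for proteins
    """
    counts = Counter(column)
    total = len(column)
    # Observed entropy; symbols that never occur contribute nothing.
    s_obs = -sum((c / total) * math.log2(c / total) for c in counts.values())
    s_max = math.log2(alphabet_size)
    return s_max - s_obs


if __name__ == "__main__":
    # A fully conserved DNA column reaches the 2-bit maximum.
    print(information_content("AAAAAAAAAAA", 4))
    # Position 1 of a start codon where ten sequences have A and one has G.
    print(information_content("AAAAAAAAAAG", 4))
    # A column with all four nucleotides equally frequent carries no information.
    print(information_content("ACGT", 4))
```

The height of the letter stack at each position of a sequence logo corresponds to this value, so a nearly conserved column such as the ATG/GTG start position still scores close to the maximum.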
95. Figure 2.25: The first dot on line one represents the starting point of a primer that will anneal to the highlighted region.

[Information box: Primer covering positions 1851 to 1868. Fraction of G and C: 0.56. Self annealing: 16. Self end annealing: 6. Secondary structure: 13. Requirement not met: Melting temperature 58.45.]

Figure 2.26: Clicking the dot will select the corresponding region, and placing the cursor upon the dot will reveal an information box. This indicates that the primer represented by this dot does not meet the requirements set in the Primer parameters (see figure 2.27).

[Figure 2.27 panel: Primer parameters, with maximum and minimum values for Length, GC content, Melt. temp. and Inner Melt. temp., maximum values for Self annealing, Self end annealing and Secondary structure, 3' end and 5' end G/C restrictions, the modes Standard PCR, Nested PCR, Sequencing and TaqMan, and the Calculate button]

Figure 2.27: The Primer parameters.

Note that the maximum annealing temperature is by default set to 58, and this is the reason why the primer in figure 2.26, with an annealing temperature of 58.45, does not me
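The pass/fail logic behind the coloured dots can be sketched as a range check. The Python example below is illustrative only. The primer values are those shown in the information box of figure 2.26, and the maximum melting temperature of 58 is stated in the text. The remaining limits were read from a garbled copy of figure 2.27 and should be treated as assumptions, as should the names `LIMITS` and `failed_requirements`.

```python
# (minimum, maximum) per property; None means no limit on that side.
# Only the maximum melting temperature (58) is confirmed by the text;
# the other limits are assumptions read from a damaged figure.
LIMITS = {
    "length": (18, 28),
    "gc_fraction": (0.40, 0.60),
    "melting_temperature": (48.0, 58.0),
    "self_annealing": (None, 18),
    "secondary_structure": (None, 16),
}


def failed_requirements(primer, limits=LIMITS):
    """Return the names of all properties that fall outside their limits."""
    failed = []
    for name, (low, high) in limits.items():
        value = primer[name]
        too_low = low is not None and value < low
        too_high = high is not None and value > high
        if too_low or too_high:
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Values from the information box: primer covering positions 1851 to 1868.
    primer = {
        "length": 1868 - 1851 + 1,
        "gc_fraction": 0.56,
        "melting_temperature": 58.45,
        "self_annealing": 16,
        "secondary_structure": 13,
    }
    print(failed_requirements(primer))
```

With these limits the only failing property is the melting temperature, which matches the explanation in the text: 58.45 lies just above the default maximum of 58.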
96. Figure 10.5: Display of the output of a BLAST search in the tabular view. The hits can be sorted by the different columns, simply by clicking the column heading.

The BLAST Table includes the following information:
• Query sequence. The sequence which was used for the search.
• Hit. The name of the sequences found in the BLAST search.
• Description. Text from NCBI describing the sequence.
• E-value. Measure of quality of the match. Higher E-values indicate that BLAST found a less homologous sequence.
• Score. This shows the bit score of the local alignment generated through the BLAST search.
• Hit start. Shows the start position in the hit sequence.
• Hit end. Shows the end position in the hit sequence.
• Query start. Shows the start position in the query sequence.
• Query end. Shows the end position in the query sequence.
• Identity. Shows the number of identical residues in the query and hit sequence.

In the BLAST table view you can handle the hit sequences. Select one or more sequences from the table and apply one o
97. Figure 15.18: A primer order for 4 primers.

Chapter 16 Assembly

Contents
  16.1 Importing and viewing trace data
  16.2 Trim sequences
    16.2.1 Manual trimming
    16.2.2 Automatic trimming
  16.3 Assemble sequences
  16.4 Assemble to reference sequence
  16.5 Assemble to an existing contig
  16.6 View and edit contigs
    16.6.1 Editing and zooming the contig
    16.6.2 Output from the contig
    16.6.3 Assembly variance table

CLC Gene Workbench 2.0 lets you import, trim and assemble DNA sequence reads from automated sequencing machines. A number of different formats are supported (see section 6.1.1). This chapter first explains how to trim sequence reads. Next follows a description of how to assemble reads into contigs, both with and without a reference sequence. In the final section, the options for viewing and editing contigs are explained.

16.1 Importing and viewing trace data. A number of different binary trace data formats can be imported into the program, including Standard Chr
98. D the rpm packages are located in the RPMS directory. Installation of RPM packages usually requires root privileges. When the installation process is finished, the program can be executed by running the command clcgenewb.
1.3 System requirements
The system requirements of CLC Gene Workbench 2.0 are:
• Windows 2000 or Windows XP
• Mac OS X 10.3 or newer
• Linux: Redhat or SuSE
• 256 MB RAM required
• 512 MB RAM recommended
• 1024 x 768 display recommended
1.4 Licenses
The license system of CLC Gene Workbench 2.0 is based on a license key, which is unique for the computer rather than for the user of the workbench.
1.4.1 Demo license description
We offer a fully functional demo version of CLC Gene Workbench 2.0 to all users, free of charge. Each user is entitled to a four-week demo of CLC Gene Workbench 2.0. To make your demo time as valuable as possible, the four weeks can be split up. You can, e.g., try two weeks of the demo in January and the next two weeks in March. To prevent unauthorized use of the program, you must be connected to the Internet while starting up a demo version of CLC Gene Workbench. An additional online check will be conducted 24 hours after the launch of the workbench. After running CLC Gene Workbench 2.0 for 24 hours, if you are not connected to the Internet, you will be met with the dialog shown in figure 1.2. On line ve
99. [Figure 17.1: Two sequences in the cloning view.]
If you find during the virtual cloning that you need additional sequences, you can easily add more sequences to the view. Just:
right-click anywhere on the empty white area | Add Sequences
17.2.1 Introduction to the cloning view
The cloning view operates with a linear representation of the sequences, even though they might be circular. Circular sequences are represented with a small << and >> at the ends of each sequence. When you have finished designing your cloning sequence, you can open it in a circular view (see section 17.2.6). In the cloning view, most of the basic options for viewing, selecting and zooming the sequences are the same as for the standard sequence viewer. See section 11.1 for an explanation of these options. This means that features such as known SNPs, exons and other annotations can be displayed on the sequences to guide the choice of regions to clone. However, the cloning view has many additional interaction possibilities compared to the normal sequence view, and there are several extra visual aids to help you manipulate the sequences. All of this is described in the following.
17.2.2 View preferences for cloning view
Two additional subgroups are shown in the Side Panel of the cloning view (see fi
100.
• On the welcome screen, click Next.
• Read and accept the License agreement and click Next.
• Choose where you would like to install the application and click Next. For a system-wide installation you can choose, for example, /opt or /usr/local. If you do not have root privileges, you can choose to install in your home directory.
• Choose where you would like to create symbolic links to the program. DO NOT create symbolic links in the same location as the application. Symbolic links should be installed in a location which is included in your environment PATH. For a system-wide installation you can choose, for example, /usr/local/bin. If you do not have root privileges, you can create a bin directory in your home directory and install symbolic links there. You can also choose not to create symbolic links.
• Wait for the installation process to complete and click Finish.
If you chose to create symbolic links in a location which is included in your PATH, the program can be executed by running the command clcgenewb. Otherwise, you start the application by navigating to the location where you chose to install it and running the command ./clcgenewb.
1.2.5 Installation on Linux with an RPM package
Navigate to the directory containing the rpm package and install it using the rpm tool by running a command similar to: rpm -ivh CLCGeneWorkbench_1_0_2_JRE sh rpm
If you are installing from a C
101. Other useful resources
The Genetic Code at NCBI: http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
Codon usage database: http://www.kazusa.or.jp/codon/
Wikipedia on the genetic code: http://en.wikipedia.org/wiki/Genetic_code
Creative Commons License
All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display and use the work for educational purposes, under the following conditions: You must attribute the work in its original form, and "CLC bio" has to be clearly labelled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform or build upon this work. SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more about how you may use the contents.
Chapter 15 Primers
Contents
15.1 Primer design: an introduction
15.1.1 General concept
15.2 Setting parameters for primers and probes
15.2.1 Primer Parameters
15.3 Graphical display of primer information
15.3.1 Compact information mode
15.3.2 Detailed information mode
15.4 Output from primer de
102. End gap cost 242 End gap costs cheap end caps 242 free end gaps 242 Enzyme list create 234 modify 235 eps format export 92 Error reports 19 Evolutionary relationship 256 Example data import 22 Expect BLAST search 111 Export bioinformatic data 88 dependent objects 89 folder 88 graphics 91 history 89 list of formats 270 multiple files 88 preferences 78 project 88 External files import and export 90 Extinction coefficient 153 Extract sequences 132 FASTA file format 29 86 270 Feature request 19 Feature table 155 Features see Annotation File system local BLAST database 114 Find open reading frames 168 Fit Width 71 Fixpoints for alignments 244 Floating Side Panel 79 Format of the manual 24 FormatDB 114 Fragments separate on gel 237 Free end gaps 242 fsa file format 86 G C content 122 265 G C restrictions 3 end of primer 188 5 end of primer 189 End length 188 Max G C 188 Gap delete 250 extension cost 242 fraction 247 265 insert 249 open cost 242 gbk file format 86 GC content 187 GCG Alignment file format 29 86 270 GCG Sequence file format 29 86 270 Gel electrophoresis 235 marker 238 view 237 view preferences 237 when finding restriction sites 232 GenBank file format 29 86 270 search 101 265 search sequence in 105 tutorial 31 Gene finding 168 General preferences 77 General Sequence Analyses 136 G
103. [Figure 11.5: The initial display of sequence info for the HUMHBB DNA sequence from the Example data.]
11.2.1 Annotation map
The Annotation map displays the various types of annotations that are attached to the sequence. Clicking on the name of a type of annotation will list the annotations of this type. If there are more annotations of the same kind, the blue arrows can be used to move up and down in the annotations of that type. In order to use the links, you have to open a second view of the sequence (double-click the sequence in the Navigation Area). If you have this view open, clicking one of the annotations in the Annotation map will make a selection in the other view corresponding to the annotation (see fig. 11.6). Annotations cannot be added or modified using the Sequence info. For adding and modifying annotations, see section 11.1.4.
11.3 View as text
A sequence can be viewed as text without any layout and text formatting. This displays all the information about the sequence in the GenBank file format. To view a sequence as text:
select a sequence in the Navigation Area | Show in the Toolbar | As text
This way it is possible to see background information about, e.g., the authors and the origin of DNA and protein sequences. Selections or the entire text of the Sequence Text Viewer can be copied and pasted into other programs. Much of the inform
104. Here is a short list of basic concepts of how to use CLC Gene Workbench:
• All data for use in the CLC Gene Workbench should be stored inside the program in the Navigation Area. This means that you have to either import some of your own data or use, e.g., the GenBank search function.
• The data can be viewed in a number of ways. First click the element (e.g. a sequence) in the Navigation Area and then click Show to find a proper way to view the data (see figure 1.10 for an example).
• When a view is opened, there are three basic ways of interacting:
1. Using the Side Panel to the right to specify how the data should be displayed (these settings are not associated with your data, but they can be saved by clicking the icon in the upper right corner of the Side Panel).
2. Using right-click menus, e.g. to edit a sequence (in this case you have to make a selection first using the selection mode).
3. Using the Zoom tools.
• In the Toolbox you find all the tools for analyzing and working on your data. In order to use these tools, your data must be stored in a project in the Navigation Area.
[Figure 1.10: The different ways of viewing DNA sequences.]
1.6.2 Quick start
When the program opens for the first time, the background o
105. If you have problems activating the license online, CLC Gene Workbench also lets you activate your license key manually. Step 3 of the license activation dialog provides a License number and an Activation Key. By clicking Copy this information to the clipboard, you can open an email editor and paste these two numbers into the mail. If you email this content and a short explanation to support@clcbio.com, we will send back a pre-activated license key. Also, in all steps of the license dialog you have the option of resetting the license. This will allow you to start over, importing another license. However, information about which licenses were used on the computer is stored externally to prevent unauthorized use of demo licenses.
1.4.3 Commercial license
Unlike the demo version, the commercial version is fully functional offline. When you buy a license for CLC Gene Workbench, we will provide you with a license key, which is activated as described here. Start the program and the dialog shown in figure 1.6 will appear.
[Dialog text: Get license, Accept agreement, Activate license. A license is required. In order to use this application you will need a valid license key file. If you already have a key file containing a valid license, you can import it by clicking the import button below. If you do not have a license, you can request an evaluation license on-line by clicking the request button below.]
106. [Feature comparison table residue. Rows: Importing and viewing trace data; Trim sequences; Assemble without use of reference sequence; Assemble to reference sequence; Viewing and edit contigs; Molecular cloning; Advanced molecular cloning; Graphical display of in silico cloning; Advanced sequence manipulation. Columns: Free, Protein, Gene, Combined.]
Appendix B BLAST databases
Several databases are available at NCBI, which can be selected to narrow down the possible BLAST hits.
B.1 Peptide sequence databases
• nr. Non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding those in env_nr.
• refseq. Protein sequences from NCBI Reference Sequence project, http://www.ncbi.nlm.nih.gov/RefSeq/.
• swissprot. Last major release of the SWISS-PROT protein sequence database (no incremental updates).
• pat. Proteins from the Patent division of GenBank.
• pdb. Sequences derived from the 3-dimensional structure records from the Protein Data Bank, http://www.rcsb.org/pdb/.
• env_nr. Non-redundant CDS translations from env_nt entries.
• month. All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days.
B.2 Nucleotide sequence databases
• nr. All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant" due to computational cost.
• refseq_rna. mRNA sequences from NCBI Reference Sequence Project.
• refseq_genomic. Genomic sequences from NCBI Reference Sequence P
107. Inconsistency button at the top of the Side Panel, or press the Space key, to find the first position where there is disagreement between the reads (see figure 2.33). In this example, the first and the third reads have a T, whereas the second line has a C, marked with a light pink background color.
[Figure 2.30: The options available in the right-click menu. Here, Mark primer annotation on sequence has been chosen, resulting in two annotations on the sequence above, labeled F1 and R1.]
The gray color of the residues in the fourth line indicates that this region has been trimmed (based on the criteria in figure 2.31) and that this information is not included in the creation of the contig. Since the majority of the reads show a T in this position, we settle on this in the consensus. In order to show that there has been a disagreement in this position, type a lower-case t (see figure 2.34). Clicking the Find Inconsistency button again will find the next inconsiste
108. [Figure 2.10: The resulting alignment.]
To save the alignment, drag the tab of the alignment view into the Navigation Area.
2.5 Tutorial: Create and modify a phylogenetic tree
You can make a phylogenetic tree from an existing alignment. (See how to create an alignment in Tutorial: Align protein sequence.) We use the P04443_alignment located in Performed Analyses Protein Workbench in the Example data. To create a phylogenetic tree:
right-click the P04443_alignment in the Navigation Area | Toolbox | Alignments and Trees | Create Tree
A dialog opens where you can confirm your selection of the alignment. Moving to the next step in the dialog, you can choose between the neighbor joining and the UPGMA algorithms for making trees. You also have the option of including a bootstrap analysis of the result. Click Finish to start the calculation, which can be seen in the Toolbox under the Processes tab, and after a shor
109. L and GenBank. The following are examples of how to use the syntax (based on http://www.ncbi.nlm.nih.gov/collab/FT/):
• 467 Points to a single residue in the presented sequence.
• 340..565 Points to a continuous range of residues bounded by and including the starting and ending residues.
• <345..500 Indicates that the exact lower boundary point of a region is unknown. The location begins at some residue previous to the first residue specified (which is not necessarily contained in the presented sequence) and continues up to and including the ending residue.
• <1..888 The region starts before the first sequenced residue and continues up to and including residue 888.
• 1..>888 The region starts at the first sequenced residue and continues beyond residue 888.
• 102.110 Indicates that the exact location is unknown, but that it is one of the residues between residues 102 and 110, inclusive.
• 123^124 Points to a site between residues 123 and 124.
• join(12..78,134..202) Regions 12 to 78 and 134 to 202 should be joined to form one contiguous sequence.
• complement(34..126) Start at the residue complementary to 126 and finish at the residue complementary to residue 34 (the region is on the strand complementary to the presented strand).
• complement(join(2691..4571,4918..5163)) Joins regions 2691 to 4571 and 4918 to 5163, then complements the joined segments (the reg
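The location syntax listed in this excerpt can be read programmatically. The following is a minimal sketch, assuming the feature-table notation with `..`, `join(...)` and `complement(...)`. The function names are hypothetical, the boundary markers `<` and `>` are accepted but not recorded, and the `102.110` and `123^124` forms are deliberately left unsupported.

```python
import re


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def parse_location(text):
    """Return a list of (start, end, strand) tuples, 1-based and inclusive."""
    text = text.strip()
    if text.startswith("complement(") and text.endswith(")"):
        inner = parse_location(text[len("complement("):-1])
        # Complementing reverses the segment order and flips the strand.
        return [(start, end, -strand) for start, end, strand in reversed(inner)]
    if text.startswith("join(") and text.endswith(")"):
        segments = []
        for part in split_top_level(text[len("join("):-1]):
            segments.extend(parse_location(part))
        return segments
    # A single residue or a range; "<" and ">" are matched but discarded.
    match = re.fullmatch(r"<?(\d+)(?:\.\.>?(\d+))?", text)
    if match is None:
        raise ValueError(f"unsupported location: {text!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return [(start, end, 1)]


print(parse_location("complement(join(2691..4571,4918..5163))"))
```

Returning segments in reading order on the given strand keeps later steps, such as extracting the subsequence, straightforward.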
110. [Figure 14.10: The Standard Code for translation. The three base rows of the figure read:
Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG]
Challenge of reverse translation
A particular protein follows from the translation of a DNA sequence, whereas the reverse translation need not have a specific solution according to the Genetic Code. The Genetic Code is degenerate, which means that a particular amino acid can be translated into more than one codon. Hence there are ambiguities of the reverse translation.
[Figure 14.11: The standard genetic code showing amino acids for all 64 possible codons.]
Solving the ambiguities of reverse translation
In order to solve these ambiguities of reverse translation, you can define how to prioritize the codon selection
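The degeneracy described in this excerpt can be made concrete with a short sketch. It assumes the standard genetic code in the compact NCBI layout (bases ordered T, C, A, G, with the third base varying fastest). The variable and function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest, matching the Base1/Base2/Base3 rows.
codon_table = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: each amino acid maps to every codon that encodes it.
codons_for = defaultdict(list)
for codon, amino_acid in codon_table.items():
    codons_for[amino_acid].append(codon)


def count_reverse_translations(protein):
    """Number of distinct DNA sequences that translate to the given protein."""
    return prod(len(codons_for[amino_acid]) for amino_acid in protein)


print(codon_table["ATG"], codons_for["L"], count_reverse_translations("MLW"))
```

Even a three-residue peptide containing one leucine has six possible coding sequences, which is why a rule for choosing codons is needed.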
111. Navigation Area | Sequence Representation | select format
This will only affect sequence elements, and the display of other types of elements, e.g. alignments, trees and external files, will not be changed. If a sequence does not have this information, there will be no text next to the sequence icon.
Rename element
Renaming a project, folder, piece of data etc. can be done in three different ways:
right-click the element | Rename
or select the element | Edit in the Menu Bar | Rename
or select the element | F2
When the editing of the name has finished, press Enter or select another element in the Navigation Area. If you want to discard the changes instead, press the Esc key.
3.1.6 Delete elements
Deleting a project, folder, piece of data etc. can be done in two ways:
right-click the element | Delete
or select the element | press Delete key
This will cause the element to be moved to a Recycle Bin, where it is kept as a precaution.
Restore Deleted Elements
The elements in the Recycle Bin can be restored and saved in the Navigation Area again. This is done by:
Edit in the Menu Bar | Restore Deleted Elements
This opens the dialog shown in fig. 3.3. The dialog shows a list of all the deleted elements. Select the elements you want to restore and click Next. This opens the dialog shown in fig. 3.4. Choose where to restore the deleted elements. Click Finish.
Notice: Only files which were saved in the Navigation Area and the
112. [Figure 14.8: Choosing a protein sequence for reverse translation.]
[Figure 14.9: Choosing parameters for the reverse translation. The dialog offers: Select codon randomly; Select only the most frequently used codon; Select codon based on frequency distribution; Map annotations to reverse translated sequence; Codon frequency tables.]
following two options:
• Uniform distribution. This parameter option will randomly back-translate an amino acid to a codon without using the translation tables. Every time you perform the analysis you will get a different result.
• Distribution according to frequency. This option is a mix of the other two options. The selected translation table is used to attach weights to each codon based on its frequency. The codons are assigned randomly with a probability given by the weights. A more frequent codon has a higher probability of being selected. Every time you perform the analysis you will get a different result. This option yields a result that is closer to the translation behavior of the organism (assuming you chose an appropriate codon frequency table).
• Map annotations to reverse translated sequence. If this checkbox is checked, then all ann
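The three codon selection strategies named in the dialog can be sketched as follows. This is an illustration only: the frequency values are made up for the example, the table covers just four amino acids, and the function and strategy names are hypothetical rather than the Workbench's own.

```python
import random

# Hypothetical codon frequencies (made-up numbers, not from a real table).
CODON_FREQUENCIES = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "F": {"TTT": 0.45, "TTC": 0.55},
    "W": {"TGG": 1.0},
}


def reverse_translate(protein, strategy, rng=random):
    """Back-translate a protein using one of three codon selection rules."""
    codons = []
    for amino_acid in protein:
        table = CODON_FREQUENCIES[amino_acid]
        names = sorted(table)
        if strategy == "random":
            # Uniform choice: every synonymous codon is equally likely.
            codons.append(rng.choice(names))
        elif strategy == "most_frequent":
            # Deterministic: always the codon with the highest frequency.
            codons.append(max(names, key=table.get))
        elif strategy == "frequency":
            # Weighted choice: probability proportional to codon frequency.
            weights = [table[name] for name in names]
            codons.append(rng.choices(names, weights=weights)[0])
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return "".join(codons)


print(reverse_translate("MKFW", "most_frequent"))
print(reverse_translate("MKFW", "frequency", random.Random(1)))
```

Only the most-frequent rule is repeatable; the two random rules can give a different sequence on each run unless a seeded generator is passed in.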
113. [Figure 11.10: A molecule shown in a circular view.]
In the Sequence Layout preferences, only the following options are available in the circular view: Numbers on plus strand, Numbers on sequence and Sequence label. The circular view can display restriction sites using the Restriction Sites preference group described below. You cannot zoom in to see the residues in the circular molecule. If you wish to see these details, split the view with a linear view of the sequence (described below).
11.6.1 Show restriction sites for circular DNA
These preferences allow you to display restriction sites on the sequence. There is a list of enzymes, which are represented by different colors. By selecting or deselecting the enzymes in the list, you can specify which enzymes' restriction sites should be displayed (see figure 17.4).
[Figure 11.11: Showing restriction sites of two restriction enzymes.]
The color of the flag of the restriction site can be changed by clicking the colored box next to the enzyme's name. The list of restriction enzymes contains per default ten of the most popular enzymes, but you can easily modify this list and add more enzymes. You have four ways of modifying the list:
• E
114. [Figure 19.2: Adjusting parameters.]
• Algorithms. The UPGMA method assumes that evolution has occurred at a constant rate in the different lineages. This means that a root of the tree is also estimated. The neighbor joining method builds a tree where the evolutionary rates are free to differ in different lineages. CLC Gene Workbench 2.0 always draws trees with roots for practical reasons, but with the neighbor joining method, no particular biological hypothesis is postulated by the placement of the root. Figure 19.3 shows the difference between the two methods.
• To evaluate the reliability of the inferred trees, CLC Gene Workbench 2.0 allows the option of doing a bootstrap analysis. A bootstrap value will be attached to each branch, and this value is a measure of the confidence in this branch. The number of replicates in the bootstrap analysis can be adjusted in the wizard. The default value is 100.
For a more detailed explanation, see Bioinformatics explained in section 19.2.
[Tree figure residue; leaf labels: Peromyscus maniculatus (deer mouse) and Homo sapiens (human).]
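The resampling step behind a bootstrap analysis can be sketched as below. This is a minimal illustration of drawing alignment columns with replacement, assuming the usual bootstrap procedure. The toy alignment and the function name are hypothetical, and the tree building and branch counting that would follow each replicate are not shown.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(alignment[0])
    picked = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[i] for i in picked) for row in alignment]


# Hypothetical toy alignment; real input would be the aligned sequences.
alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
rng = random.Random(0)

# 100 replicates, matching the default number mentioned in the text.
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
print(replicates[0])
```

A tree would then be inferred from every replicate, and the bootstrap value of a branch is the share of replicate trees that contain it.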
115. [Figure 2.8: The alignment dialog displaying the 8 chosen protein sequences.]
It is possible to add and remove sequences from the Selected Elements list. When the relevant proteins are selected, there are two options. Click Next to adjust parameters for the alignment. Clicking Next opens the dialog shown in fig. 2.9. Leave the parameters at their default settings. An explanation of the parameters can be found in the program's Help function or in the user manual on http://www.clcbio.com/download. Click Finish to start the alignment process, which is shown in the Toolbox under the Processes tab. When the program is finished calculating, it displays the alignment (see fig. 2.10).
Notice: The new alignment is not saved automatically. The text on the tab is bold and italic to illustrate this.
[Figure 2.9: The alignment dialog displaying the available parameters which can be adjusted. Values shown: Gap open cost 10.0; Gap extension cost 1.0; End gap cost: As any other; Fast alignment checked.]
[Alignment view of P68046_alignment.]
116. [Figure 13.4: Choosing sequences for translation.]
If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. Click Next to set reading frames, select if you want to translate all coding regions of the sequence, and choose translation tables. Clicking Next generates the dialog seen in figure 13.5. The translation tables in CLC Gene Workbench are updated regularly from NCBI. Therefore the tables are not available in this printable version of the user manual. Instead, the tables are included in the Help menu in the Menu Bar (under Background Information). Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. The newly created protein is shown, but is not saved automatically. There are also new views of proteins for every CDS or ORF annotation if you have selected to translate all coding regions. To save a protein sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog.
117. Select HUMHBB in the Navigation Area | Show in Menu Bar | Sequence Info
This opens a new view, shown in figure 2.16.
[Figure 2.16: The initial view of sequence info of HUMHBB.]
The sequence is originally downloaded from GenBank, and it is the information from the GenBank file which is shown as a list of headings. Click the heading Modification Date to see when the sequence was modified in GenBank. At the bottom there is an Annotation Map providing an overview of the annotations on the sequence. The annotations are divided into types. We are interested in the coding sequences of HUMHBB:
Click Annotation Map | Click CDS
The seven coding sequences are displayed with the corresponding positions in GenBank syntax. In order to make full use of the Annotation Map, open a normal view of the HUMHBB sequence below the Sequence Info:
Select the HUMHBB in the Navigation Area | Drag it to the bottom of the View Area until a gray shadow appears
Now, clicking a coding sequence in the Annotation Map will make a selection representing the coding sequence in the view below. You can see that the selection matches the CDS annotation (the yellow boxes in figure 2.17).
2.8 Tutorial: BLAST search
This tutorial shows you how to perform a BLAST search using CLC Gene W
118. Re dsd
Bibliography
V Index
Part I Introduction
Chapter 1 Introduction to CLC Gene Workbench
Contents
1.1 Contact information
1.2 Download and installation
1.2.1 Program download
1.2.2 Installation on Microsoft Windows
1.2.3 Installation on Mac OS X
1.2.4 Installation on Linux with an installer
1.2.5 Installation on Linux with an RPM package
1.3 System requirements
1.4 Licenses
1.4.1 Demo license description
1.4.2 Getting and activating the demo license
1.4.3 Commercial license
1.4.4 Upgrading from a demo license to a commercial license
1.5 About CLC Workbenches
1.5.1 New program feature request
1.5.2 Report program errors
1.5.3 Free vs. commercial workbenches
1.6 When the program is installed: Getting started
1.6.1 Basic concepts of using CLC Workbenches
1.6.2 Quick start
1.6.3 Import of example data
119. Sets a background color of the residues using a gradient, in the same way as described above.
Graph. The G/C content level is displayed on a graph.
• Height. Specifies the height of the graph.
• Type. The graph can be displayed as Line plot, Bar plot or as a Color bar.
• Color box. For Line and Bar plots, the color of the plot can be set by clicking the color box. For Colors, the color box is replaced by a gradient color box as described under Foreground color.
Hydrophobicity info
These preferences only apply to proteins and are described in section 14.2.2.
Search
The Search group is not a preferences group, but can be used for searching the sequence. Clicking the search button will search for the first occurrence of the search string. Clicking the search button again will find the next occurrence, and so on. If the search string is found, the corresponding part of the sequence will be selected.
• Search term. Enter the text to search for. The search function does not discriminate between lower and upper case characters.
• Sequence search. Search the nucleotides or amino acids. For nucleotides, all the standard IUPAC codes can be used, e.g. RT will find both GT and AT (RT will also find e.g. AN). The IUPAC codes are available from the Help menu under Background Information. For amino acids, the single-letter abbreviations should be used for searching. Accordingly, N for nucleotides and X for proteins can be used as a wildcard character
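The IUPAC-based sequence search described in this excerpt can be sketched with a regular expression. This is a simplified illustration, assuming the standard IUPAC nucleotide ambiguity codes. It expands ambiguity codes in the search term only, so unlike the Workbench it would not match ambiguous residues in the sequence itself (such as the AN case mentioned in the text). The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(term):
    """Translate a search term with IUPAC codes into a regex pattern string."""
    return "".join(f"[{IUPAC[code]}]" for code in term.upper())


def find_all(term, sequence):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # A lookahead lets matches overlap; IGNORECASE mirrors the
    # case-insensitive search described in the text.
    pattern = re.compile(f"(?={iupac_to_regex(term)})", re.IGNORECASE)
    return [match.start() for match in pattern.finditer(sequence)]


print(find_all("RT", "GTCATTT"))
```

Here RT expands to [AG][T], so both GT and AT are found, as in the example given in the text.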
120. [Figure 2.5: The resulting two views, which are split horizontally.]
the tab of the view down next to the tab of the bottom view.
2.3 Tutorial: GenBank search and download
The CLC Gene Workbench allows you to search the NCBI GenBank database directly from the program, giving you the opportunity to open, view, analyze and save the search results without using any other applications. To conduct a search in NCBI GenBank from CLC Gene Workbench, you must be connected to the Internet. This tutorial shows how to find a complete human hemoglobin DNA sequence in a situation where you do not know the accession number of the sequence. To start the search:
Search | Search NCBI Entrez
This opens the search view. We are searching for a DNA sequence, hence:
Nucleotide
Now we are going to Adjust Parameters for the search. By clicking More Choices, you activate an additional set of fields where you can enter search criteria. Each search criterion consists of a drop-down menu and a text field. In the drop-down menu you choose which part of the NCBI database to search, and in the text field you enter what to search for:
Click More Choices until three search criteria are available | choose Organism in the first drop-down menu | write human in the adjoining text field | choose All Fields in the second drop-down menu | write hemoglobin in the adjoining text field | choose A
121. [Figure 2.1: The user interface as it looks when you start the program for the first time. (Windows version of CLC Gene Workbench. The interface is similar for Mac and Linux.)]
Right-click the Test project in the Navigation Area | New Folder or Ctrl + F (⌘ + F on Mac)
Name the folder Subfolder and press Enter.
2.1.2 Import data
Next, we want to import a sequence called HUMDINUC.fsa (FASTA format) from our own Desktop into the new Subfolder. (This file is chosen for demonstration purposes only; you may have another file on your desktop which you can use to follow this tutorial. You can import all kinds of files.) In order to import the HUMDINUC.fsa file:
Import in the Toolbar | select FASTA (.fsa/.fasta) in the Files of type drop-down menu | navigate to HUMDINUC.fsa on the desktop | Select
For files of FASTA or PIR format, you are asked to state which type of sequence you are importing. This will
122. This opens the dialog shown in figure 12.1. Figure 12.1: Selecting sequences for the dot plot. If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove elements from the Project Tree. Click Next to adjust dot plot parameters. Clicking Next opens the dialog shown in figure 12.2. Notice that calculating dot plots takes up a considerable amount of memory. Therefore, you see a warning if the sum of the number of nucleotides/amino acids in the sequences is higher than 8000. If you insist on calculating a dot plot with more residues, the Workbench may shut down, allowing you to save your work first. To avoid the Workbench shutting down, you may choose to adjust the memory allocation to CLC Gene Workbench (see section 1.8). Adjust dot plot parameters. There are two parameters for calculating the dot plot: • Distance correction (only valid for protein sequences): In order to treat evolutionary transitions of amino acids a di
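The memory warning above is easier to see with a small sketch. A dot plot compares every position of one sequence with every position of the other, so the matrix grows with the product of the two lengths. The Python below is only an illustration under stated assumptions: the function names are hypothetical, the comparison is a plain exact-match window, and the Workbench's own algorithm (including its distance correction) is not reproduced. Only the 8000-residue threshold is taken from the text.

```python
def exceeds_residue_limit(seq_a, seq_b, limit=8000):
    """Return True when the summed sequence lengths are higher than the limit.

    The default of 8000 is the warning threshold quoted in the text;
    the function itself is a hypothetical helper.
    """
    return len(seq_a) + len(seq_b) > limit


def dot_plot(seq_a, seq_b, window=1):
    """Build a boolean dot matrix using exact matches of `window` residues.

    matrix[i][j] is True when seq_a[i:i+window] equals seq_b[j:j+window].
    This is a generic identity dot plot, assumed for illustration only.
    """
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    if rows <= 0 or cols <= 0:
        return []
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]
```

Two sequences of 4000 residues each already produce a matrix of 16 million cells in this naive form, which is why a limit on the summed length is a reasonable safeguard.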
123. Title: The action that the user performed. Figure 7.1: An element's history. Date and time: Date and time for the operation. The date and time are displayed according to your locale settings (see section 4.1). User: The user who performed the operation. If you import data created by another person in a CLC Workbench, that person's name will be shown. Parameters: Details about the action performed. This could be the parameters that were chosen for an analysis. Origins from: This information is usually shown at the bottom of an element's history. Here you can see which elements the current element originates from. If you have, for example, created an alignment of three sequences, the three sequences are shown here. Clicking the element selects it in the Navigation Area, and clicking the history link opens the element's own history. 7.1.1 Sharing data with history. The history of an element is attached to that element, which means that exporting an ele
124. a database directory | Import | confirm the information. Notice: The default installation of the Vector NTI program places the database home in C:\VNTI Database on Windows machines and in Library/Application Support/VNTI Database on Mac OS X (for Panther). Therefore, CLC Gene Workbench 2.0 will check if there is a default installation and will ask whether you want to use the default database directory or another directory. Notice: Make sure that the Vector NTI database directory (default or backup) contains folders like ProData and MolData. These folders are necessary when the data is imported into CLC Gene Workbench 2.0. To import all DNA, RNA and protein sequences when a default database directory is installed: select File in the Menu Bar | Import VectorNTI Data | select Yes if you want to import the default database | confirm the information. Alternatively: select File in the Menu Bar | Import VectorNTI Data | select No to choose a database | select a database directory | Import | confirm the information. After the import there is a new Project called Vector NTI Data in the Navigation Area. In Vector NTI Data you can see two folders: DNA/RNA, containing the DNA and RNA sequences, and Protein, containing all protein sequences (see figure 6.2). The project, folders and all sequences are automatically saved.
125. enables a given viewer to decide how to draw the line, no matter what the zoom factor is, thereby always giving a correct image. This format is good for e.g. graphs and reports, but less usable for e.g. dot plots. Supported formats (format, suffix, type): Portable Network Graphics, .png, bitmap; JPEG, .jpg, bitmap; Tagged Image File, .tif, bitmap; PostScript, .ps, vector graphics; Encapsulated PostScript, .eps, vector graphics; Portable Document Format, .pdf, vector graphics; Scalable Vector Graphics, .svg, vector graphics. Graphics files can also be imported into the Navigation Area. However, graphics files cannot be displayed in CLC Gene Workbench 2.0. See section 6.2.1 for more about importing external files into CLC Gene Workbench 2.0. 6.3.1 Exporting protein reports. Protein reports cannot be exported in the same way as other data. Instead, they can be exported from the Navigation Area: click the report in the Navigation Area | Export in the Toolbar | select pdf. When the report is exported, the file can be opened with Adobe Reader. Opening and printing in Adobe Reader is also the only way to print the report. 6.4 Copy/paste view output. The content of tables, e.g. in reports, folder lists and sequence lists, can be copy-pasted into different programs, where it can be edited. CLC Gene Workbench 2.0 pastes the data in tabulator-separated format, which is useful if you use programs like Microsoft Word and E
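Because pasted table data is tabulator separated, it can also be read by a script rather than a word processor or spreadsheet. The following Python sketch shows one way to split such text into rows and cells. The function name and the sample content are made up for illustration and are not output from the Workbench.

```python
import csv
import io


def parse_pasted_table(text):
    """Split tab-separated text (e.g. pasted from the clipboard) into rows.

    Each row is returned as a list of cell strings.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row for row in reader]


# Hypothetical pasted content: a header row and one data row.
sample = "Name\tLength\nExample sequence\t100\n"
for row in parse_pasted_table(sample):
    print(row)
```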
126. age setup • A Print preview window. These three kinds of dialogs are described in the two following sections. 5.1 Selecting which part of the view to print. Views that are printed exactly as they look on the screen have an option for selecting which part of the view to print (see figure 5.1). Figure 5.1: When printing graphics you get the options of printing the visible area or printing the whole view. Printing the whole view is useful if you have zoomed in on an area of the view and you want to print the whole view, including the part of e.g. a sequence which is not visible. On the other hand, if you want to print some details of an area of the view, you can use the zoom and navigate functions first and then print the visible area. This will result in a print of only part of the sequence. 5.2 Page setup. Whether you have chosen to print the visible area or the whole view, you can adjust the page setup of the print. An example of this can be seen in figure 5.2. Figure 5.2: In this dialog, the default settings Portrait and A4 apply to the print of an alignment. By checking Fit to pages, it is possible to adjust Horizontal pages to 2. This is done to allow a long sequence to stretch the width of t
127. ail, please write an e-mail to support@clcbio.com including your postal address. 1.2.1 Program download. The program is available for download on http://www.clcbio.com/download. Before you download the program, you are asked to fill in the Download dialog. In the dialog you must choose: • Which operating system you use • Whether you want to include Java or not (this is necessary if you haven't already installed Java) • Whether you would like to receive information about future releases. Depending on your operating system and your Internet browser, you are taken through some download options. When the download of the installer (an application which facilitates the installation of the program) is complete, follow the platform-specific instructions below to complete the installation procedure. (Download dialog, with download options for Mac OS X 10.3 or later, Windows 2000 or Windows XP, and Linux (RedHat, SuSE), an option to include Java, and a field for e-mail notifications.)
128. al files; 6.2.2 Export external files; 6.2.3 Technical details; 6.3 Export graphics to files; 6.3.1 Exporting protein reports; 6.4 Copy/paste view output. CLC Gene Workbench 2.0 handles a large number of different data formats. All data stored in the Workbench is available in the Navigation Area of the program. The data of the Navigation Area can be divided into two groups: it is either in one of the different bioinformatic data formats, or it is an external file. Bioinformatic data formats are those formats which the program can work with, e.g. sequences, alignments and phylogenetic trees. External files are files or links which are stored in CLC Gene Workbench 2.0 but are opened by other applications, e.g. pdf files, Microsoft Word files and Open Office spreadsheet files, or links to programs and web pages. Furthermore, this chapter deals with the export of graphics. 6.1 Bioinformatic data formats. The different bioinformatic data formats are imported in the same way. The following description of data import is therefore an example which illustrates the general steps to be followed, regardless of which format you are handling. 6.1.1 Import
129. alog shown in figure 10.8. Select a | Click Next. Figure 10.6: Choose one or more sequences to conduct a local BLAST search. Figure 10.7: Choose a BLAST program and a local database to conduct the BLAST search. This opens the dialog seen in figure 10.9. See section 10.1 for information about these limitations. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. 10.3 Create Local BLAST Database. In CLC Gene Workbench you can create a local database which you can use for local BLAST. DNA, RNA and protein sequences can all be used. It is not necessary to import the sequences into CLC Gene Workbench before creating the database. The local database can be created from sequences which are stored in the Navigation Area, or the sequences can be browsed from the computer's file system. In the latter case the
130. an navigate and zoom a view of a sequence or an alignment using the keyboard: • Navigate the view using the four arrow keys. This is equivalent to scrolling with the mouse using the scroll bars. • Use the + and - keys to zoom in and out. This is equivalent to using the zoom modes in the toolbar. Note that you have to click once inside the view with the mouse first in order to use this functionality. There are many other shortcuts in CLC Gene Workbench which may save you a lot of time when performing repetitive tasks. See section 3.6 for a list of available shortcuts. Part II: Basic Program Functionalities. Chapter 3: User Interface. Contents: 3.1 Navigation Area; 3.1.1 Data structure; 3.1.2 Create new projects and folders; 3.1.3 Multiselecting elements; 3.1.4 Moving and copying elements; 3.1.5 Change element names; 3.1.6 Delete elements; 3.1.7 Show folder elements in View; 3.1.8 Sequence properties; 3.2 View Area; 3.2.1 Open view; 3.2.2 Close views; 3.2.3 Save changes in a view; 3.2.4 Undo/Redo; 3.2.5 A
131. Figure 15.13: Calculation dialog shown when designing alignment-based PCR primers. 15.9.3 Alignment-based TaqMan probe design. CLC Gene Workbench allows the user to design solutions for TaqMan quantitative PCR which consist of four oligos: a general primer pair which will amplify all sequences in the alignment, a specific TaqMan probe which will match the group of included sequences but not match the excluded sequences, and a specific TaqMan probe which will match the group of excluded sequences but not match the included sequences. As above, the selection boxes are used to indicate the status of a sequence: if the box is checked, the sequence belongs to the included sequences; if not, it belongs to the excluded sequences. We use the terms included and excluded here to be consistent with the section above, although a probe solution is presented for both groups. In TaqMan mode, primers are not allowed to have degeneracy or mismatches to any template sequence in the alignment; variation is only allowed/required in the TaqMan probes. Pushing the Calculate button will cause the dialog shown in figure 15.14 to appear. The top part of this dialog is identical to the Standard PCR dialog for designing primer pairs described above. The central part of the dialog contains parameters to define the specificity of TaqMan probes. Two parameters can be set: • Minimum number of mismatches: the minimum total number of mismatches that must exist betwee
132. and how you can modify it. 17.4.1 Create enzyme list. CLC Gene Workbench 2.0 uses enzymes from the REBASE restriction enzyme database at http://rebase.neb.com. To start creating an enzyme list: right-click in the Navigation Area | New | Enzyme list. This opens the dialog shown in figure 17.17. Step 1 includes two tables. The top table is a list of all the enzymes available in the REBASE database. Different information is available for the enzymes, and the list can be sorted by clicking the column headings. The enzyme list is created by adding enzymes to the bottom table. To create the enzyme list: select enzymes from the top table (hold Ctrl, or ⌘ on Mac, to select several) | click the down arrow. Figure 17.17: Choosing enzymes for the new enzyme list. When the desired enzymes have been chosen, click Next. Choose where to save your enzyme list and name the list. Click Finish to see the enzyme list. In the View preferences it is possible to choose which columns to display. 17.4.2 Modify enzyme list. If you want to make changes to an existing enzyme list: select an enzyme list | Toolbox in the Menu Bar | Cloning and Restriction Sites | Modify Enzyme List. Select the enzyme list and click Next. This opens the dialog shown in figure 17.18.
133. and that the TaqMan Probe region is located upstream of the Reverse primer region. In TaqMan mode, the Inner melting temperature menu in the primer parameters panel is activated, allowing the user to set a separate melting temperature interval for the TaqMan probe. After exploring the available primers (see section 15.3) and setting the desired parameter values in the Primer Parameters preference group, the Calculate button will activate the primer design algorithm. After pressing the Calculate button, a dialog will appear (see figure 15.10) which is identical to the Nested PCR dialog described above (see section 15.6). Figure 15.10: Calculation dialog. (The dialog lists the chosen parameters: maximum and minimum primer length, G/C content and melting temperature, maximum self annealing, self end annealing and secondary structure, and the 3' and 5' end G/C requirements. It also lists the primer combination parameters: maximum percentage point difference in G/C content, maximum difference in melting temperatures within a primer pair, maximum pair annealing score, and minimum and desired difference in melting temperature (Inner/Outer). The mispriming parameters offer the option to use mispriming as exclusion criteria.) In this dialog, the options to set a minimum and a desired melt
134. anel to the right of the view (see figure 15.3). There are two different ways to display the information relating to a single primer: the detailed and the compact view. Both are shown below the primer regions selected on the sequence. 15.3.1 Compact information mode. This mode offers a condensed overview of all the primers that are available in the selected region. When a region is chosen, primer information will appear in lines beneath it (see figure 15.4). Figure 15.4: Compact information mode. The number of information lines reflects the chosen length interval for primers and probes. One line is shown for every possible primer length; if the length interval is widened, more lines will appear. At each potential primer starting position, a circle is shown which indicates whether the primer fulfills the requirements set in the primer parameters preference group. A green circle indicates a primer which fulfills all criteria, and a red circle indicates a primer which fails to meet one or more of the set criteria. For more detailed information, place the mouse cursor over the circle representing the primer of interest. A tool tip will then appear on screen, displaying detailed information about the primer in relation to the set criteria. To locate the
135. aphics view:
• BLAST Layout: You can choose whether to Gather sequences at top, which means that vertical gaps between sequences are eliminated to assist comparison between the query sequence and the hit sequences.
• BLAST info: In this View preference group you can choose whether to color hit sequences, and you can adjust the coloring.
The remaining View preferences for BLAST Graphics are the same as those of alignments (see section 18.2). Some of the information available in the tooltips is:
• Name of sequence: Some additional information about the sequence which was found. This line corresponds to the description line in GenBank (if the search was conducted on the nr database).
• Length of sequence: The entire length of the found sequence.
• Score: The bit score of the local alignment generated through the BLAST search.
• Expect: Also known as the E-value. A low value indicates a homologous sequence. Higher E-values indicate that BLAST found a less homologous sequence.
• Identities: The number of identical residues or nucleotides in the obtained alignment.
• Gaps: This number shows whether the alignment has gaps or not.
• Strand: This is only valid for nucleotide sequences and shows the direction of the aligned strands. Minus indicates a complementary strand.
• Query: The sequence (or part of the sequence) which you have used for the BLAST search.
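The Identities and Gaps values are usually read as a count, an alignment length and a percentage. The short Python sketch below shows how the percentage follows from the two numbers. The helper name is hypothetical and the figures used are illustrative values only; this is not a description of how the Workbench computes its tooltip.

```python
def percent_of_alignment(count, alignment_length):
    """Return count as a whole-number percentage of the alignment length.

    Rounds half up; raises ValueError for a non-positive alignment length
    or a count outside the range 0..alignment_length.
    """
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    if not 0 <= count <= alignment_length:
        raise ValueError("count must lie between 0 and the alignment length")
    return int(100 * count / alignment_length + 0.5)


# Illustrative values: 53 identical positions and 8 gap positions
# in an alignment of 145 columns.
identity_pct = percent_of_alignment(53, 145)
gap_pct = percent_of_alignment(8, 145)
print(f"Identities 53/145 ({identity_pct}%), Gaps 8/145 ({gap_pct}%)")
```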
136. arding this and the minimum difference option mentioned above, please note that, to ensure flexibility, there is no directionality indicated when setting parameters for melting temperature differences between probes and primers, i.e. it is not specified whether the probes should have a lower or higher Tm. Instead, this is determined by the allowed temperature intervals for inner and outer oligos that are set in the primer parameters preference group in the side panel. If a higher Tm of probes is required, choose a Tm interval for probes which has higher values than the interval for outer primers. The output of the design process is a table of solution sets. Each solution set contains the following: a set of primers which are general to all sequences in the alignment, a TaqMan probe which is specific to the set of included sequences (sequences where selection boxes are checked), and a TaqMan probe which is specific to the set of excluded sequences (sequences where selection boxes are not checked). Otherwise, the table is similar to that described above for TaqMan probe prediction on single sequences. 15.10 Analyze primer properties. CLC Gene Workbench 2.0 can calculate and display the properties of predefined primers and probes: select a primer sequence (primers are represented as DNA sequences in the Navigation Area) | Toolbox in the Menu Bar | Primers and Probes | Analyze Primer Properties. If a sequence was selected before choosing the Toolbox action, this sequence is now
137. ation. Figure 1.1: Download dialog. 1.2.2 Installation on Microsoft Windows. Starting the installation process is done in one of the following ways. If you have downloaded an installer: Locate the downloaded installer and double-click the icon. The default location for downloaded files is your desktop. If you are installing from a CD: Insert the CD into your CD-ROM drive. Choose Install CLC Gene Workbench from the menu displayed. If you already have Java installed on your computer, you can choose Install CLC Gene Workbench without Java. Installing the program is done in the following steps (you must be connected to the Internet throughout the installation process):
• On the welcome screen, click Next.
• Read and accept the License agreement and click Next.
• Choose where you would like to install the application and click Next.
• Choose a name for the Start Menu folder used to launch CLC Gene Workbench and click Next.
• Choose where you would like to create shortcuts for launching CLC Gene Workbench and click Next.
• Wait for the installation process to complete, choose whether you would like to launch CLC Gene Workbench right away, and click Finish.
When the installation is complete, the program can be launched from the Start Menu or from one of the shortcuts you chose to create. 1.2.3 Installation on Mac OS X. Starting the installation process is done in one of
138. ation is also displayed in the Sequence info, where it is easier to get an overview (see section 11.2). Figure 11.6: Clicking a sequence map annotation in the sequence information view selects the annotation on the normal sequence view. 11.4 Creating a new sequence. A sequence can either be imported, downloaded from an online database or created in CLC Gene Workbench 2.0. This section explains how to create a new sequence: New in the toolbar. Figure 11.7: Crea
139. be working with DNA sequence AY738615. Double-click the sequence in the Navigation Area to open it. The sequence is displayed with annotations above it. To provide a better view of the sequence, hide the Side Panel by clicking the red X at the top right corner of the Side Panel, in the right side of the View Area (see figure 2.3). By default, CLC Gene Workbench displays a sequence with annotations (colored arrows on the sequence) and zoomed in to show the residues. In this tutorial we want an overview of the whole sequence. Hence: click Zoom Out in the Toolbar | click the sequence until you can see the whole sequence.
140. ble in your system and need to work with very large data objects, you can manually change the maximum amount of memory available to the program. Doing so is a somewhat complicated, unsupported procedure and may cause the program to fail if done incorrectly. Depending on your operating system, you may have to repeat these changes if you update CLC Gene Workbench 2.0 to a newer version. 1.8.1 Microsoft Windows
• Locate the CLC Gene Workbench 2.0 directory inside your Program Files directory and open it.
• Create a new empty text file called clcwb.vmoptions (make sure the filename does not end with .txt).
• Add a single line to the file with a syntax similar to: -Xmx512m
It is very important that the line looks exactly like the one in the example above and that you only change the value of the number (512 in the example). For the best performance, you should not choose a number greater than the amount (in megabytes) of physical memory available on your system. 1.8.2 Mac OS X
• Locate the CLC Free Workbench program file in your Applications folder.
• Right-click (control-click) the file and choose Show Package Contents from the pop-up menu.
• Open the file called Info.plist, located inside the Contents folder, using the Property List Editor application or a text editor like TextEdit.
• Edit the Root/Java/VMOptions property and set the maximum amount of memory to a desired va
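Since the options line must have an exact form, it can help to generate it rather than type it. The Python sketch below builds the line and enforces the advice not to exceed physical memory. It assumes the line takes the standard Java form `-Xmx<number>m`; the function name is hypothetical, and the amount of physical memory has to be supplied by the reader.

```python
def vmoptions_line(max_heap_mb, physical_mb):
    """Return a Java maximum-heap option such as '-Xmx512m'.

    max_heap_mb: desired maximum memory for the program, in megabytes.
    physical_mb: physical memory installed on the machine, in megabytes
                 (supplied by the caller; not detected automatically).
    """
    if max_heap_mb <= 0:
        raise ValueError("maximum heap must be a positive number of megabytes")
    if max_heap_mb > physical_mb:
        raise ValueError("maximum heap should not exceed physical memory")
    return f"-Xmx{int(max_heap_mb)}m"


# Example: allow 512 MB on a machine with 1024 MB of physical memory.
print(vmoptions_line(512, 1024))
```

The resulting single line is what the options file should contain; writing the file itself is left out because its exact name and location depend on the installation.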
141. ble primers (see section 15.3) and setting the desired parameter values in the Primer Parameters preference group, the Calculate button will activate the primer design algorithm. If only a single region is defined, only single primers will be suggested by the algorithm. After pressing the Calculate button, a dialog will appear (see figure 15.7). Figure 15.7: Calculation dialog for PCR primers. (The dialog lists the chosen parameters: maximum and minimum primer length, G/C content and melting temperature, maximum self annealing, self end annealing and secondary structure, and the 3' and 5' end G/C requirements. Below these are the mispriming parameters, with the option to use mispriming as exclusion criteria.) The top part of this dialog shows the parameter settings chosen in the Primer parameters preference group, which will be used by the design algorithm. The lower part contains a menu where the user can choose to include mispriming as a criterion in the design process. If this option is selected, the algorithm will search for competing binding sites of the primer within the sequence. The adjustable parameters for the search are: • Exact match: Choose to consider only exact matches of the primer, i.e. all positions must base pair with the template for mispriming to occur. • Minimum number of base pairs
142. 16.6 View and edit contigs; 17 Cloning and cutting; 17.1 Molecular cloning: an introduction; 17.2 Graphical display of in silico cloning; 17.3 Restriction site analysis; 17.4 Restriction enzyme lists; 17.5 Gel electrophoresis; 18 Sequence alignment; 18.1 Create an alignment; 18.2 View alignments; 18.3 Edit alignments; 18.4 Join alignments; 18.5 Bioinformatics explained: Multiple alignments; 19 Phylogenetic trees; 19.1 Inferring phylogenetic trees; 19.2 Bioinformatics explained: phylogenetics; IV Appendix; A Comparison of workbenches; B BLAST databases; B.1 Peptide sequence databases; B.2 Nucleotide sequence databases; C Formats for import and export; C.1 List of bioinformatic data formats; C.2 List of graphics data formats
143. by step procedure. First, you select elements for analysis, and then there are a number of steps where you can specify parameters (some of the analyses have no parameters, e.g. when translating DNA to RNA). The final step concerns the handling of the results of the analysis. It is almost identical for all the analyses, so we explain it in general in this section. In this step (shown in figure 8.1) you have two options:
• Open: This will open the result of the analysis in a view. This is the default setting.
• Save: This means that the result will not be opened but saved to a folder in the Navigation Area. If you select this option, click Next and you will see one more step where you can specify where to save the results (see figure 8.2). In this step you have to select a folder. You also have the option of creating a new folder in this step.
8.1.1 When the analysis does not create new elements. When an analysis does not create new elements (e.g. Find Open Reading Frames, which adds annotations to the sequences), the options for saving are different (see figure 8.3). Figure 8.1: The last step of the analyses, exemplified by Translate DNA to RNA.
144. ce 123 parts of a sequence 123 workspace 73 Selection mode in the toolbar 71 Selection location on sequence 71 Self annealing 188 Self end annealing 188 Separate sequences on gel 236 using restriction enzymes 237 Sequence alignment 240 analysis 136 INDEX 282 display different information 61 extract from sequence list 132 information 127 information tutorial 37 join 156 layout 118 lists 130 logo 265 new 129 region types 126 search 122 select 123 shuffle 146 statistics 149 view 117 view as text 128 view circular 132 view format 61 web info 104 Sequence logo 246 247 Sequencing data 265 Sequencing primers 265 Shortcuts 74 Show hide Toolbox 72 Shuffle sequence 146 265 Side Panel location of 77 Signal peptide 265 SNP 167 annotation 167 265 Sort sequences 132 sequences alphabetically 250 sequences by similarity 250 Source element 96 Species display sequence species 61 Staden file format 29 86 270 Standard layout trees 259 Standard Settings CLC 79 Start Codon 168 Start up problems 19 Statistics about sequence 265 protein 152 sequence 149 Status Bar 72 73 illustration 58 str file format 86 Style sheet preferences 78 Support mail 11 svg format export 92 Swiss Prot file format 29 86 270 Swiss Prot TrEMBL 265 swp file format 86 System requirements 14 Tabs use of 64 TaqMan primers 265 tBLASTn 108 tBLASTx 108 Ter
145. ce group in the Side Panel to the right of the view (see figure 15.3). Figure 15.3: The two groups of primer parameters in the program; the Primer information group is listed below the other group. (In the figure, the Primer parameters group shows Length max 22 and min 18, GC content max 60 and min 40, and Melt. temp. max 58 and min 48, followed by Inner Melt. temp., Self annealing, Self end annealing, Secondary structure, 3' and 5' end G/C restrictions, the Mode selection (Standard PCR, TaqMan, Nested PCR, Sequencing) and the Calculate button. The Primer information group offers the Detailed and Compact display modes.) 15.2.1 Primer Parameters. In this preference group, a number of criteria can be set which the selected primers must meet. All the criteria concern single primers, as primer pairs are not generated until the Calculate button is pressed. Parameters regarding primer and probe sets are described in detail for each reaction mode (see below). • Length: Determines the length interval within which primers can be designed, by setting a maximum and a minimum length. The upper and lower lengths allowed by the program are 50 and 10 nucleotides, respectively. • GC content: Determines the interval of GC content
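The Length and GC content criteria are simple interval checks on a single primer, which a short sketch can make concrete. The Python below uses the interval values visible in figure 15.3 (length 18 to 22, GC content 40 to 60 percent) as defaults. The function names are hypothetical, and the sketch covers only these two criteria; melting temperature and annealing scores are left out because the text does not state how the Workbench calculates them.

```python
def gc_percent(primer):
    """Return the percentage of G and C bases in a non-empty primer."""
    primer = primer.upper()
    if not primer:
        raise ValueError("primer must not be empty")
    gc_count = primer.count("G") + primer.count("C")
    return 100.0 * gc_count / len(primer)


def meets_length_and_gc(primer, min_len=18, max_len=22,
                        min_gc=40.0, max_gc=60.0):
    """Check a primer against a length interval and a GC content interval.

    Defaults mirror the values shown in figure 15.3; both intervals are
    treated as inclusive, which is an assumption.
    """
    if not min_len <= len(primer) <= max_len:
        return False
    return min_gc <= gc_percent(primer) <= max_gc


# A 20-mer with 50% GC passes both checks; an AT-only 20-mer fails on GC.
print(meets_length_and_gc("ACGTACGTACGTACGTACGT"))
print(meets_length_and_gc("ATATATATATATATATATAT"))
```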
146. cids alanine, valine, leucine and isoleucine. An increase in the aliphatic index increases the thermostability of globular proteins. The index is calculated by the following formula:

$\text{Aliphatic index} = X(\text{Ala}) + a \cdot X(\text{Val}) + b \cdot X(\text{Leu}) + b \cdot X(\text{Ile})$ (12.1)

X(Ala), X(Val), X(Ile) and X(Leu) are the amino acid compositional fractions. The constants a and b are the relative volume of valine (a = 2.9) and leucine/isoleucine (b = 3.9) side chains compared to the side chain of alanine (Ikai, 1980).

Estimated half-life. The half-life of a protein is the time it takes for the pool of that particular protein to be reduced by half. The half-life of a protein is highly dependent on its N-terminal amino acid, which thereby affects overall protein stability (Bachmair et al., 1986; Gonda et al., 1989; Tobias et al., 1991). The importance of the N-terminal residue is generally known as the N-end rule. According to the N-end rule, the N-terminal amino acid largely determines the half-life of a protein. The estimated half-lives of proteins have been investigated in mammals, yeast and E. coli (see Table 12.2). If leucine is found N-terminally in mammalian proteins, the estimated half-life is 5.5 hours.

Extinction coefficient. This measure indicates how much light is absorbed by a protein at a particular wavelength. The extinction coefficient is measured by UV spectrophotometry but can also be calculated. The amino acid composition is important when calc
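Formula 12.1 can be evaluated directly from a protein sequence. The following is a minimal sketch, not the program's implementation: the function name is hypothetical, and X is taken as a plain fraction (0 to 1) as stated above. If mole percent is wanted instead, the result can be multiplied by 100.

```python
from collections import Counter

A_VAL = 2.9      # relative side-chain volume of valine (constant a)
B_LEU_ILE = 3.9  # relative side-chain volume of leucine/isoleucine (constant b)


def aliphatic_index(protein: str) -> float:
    """Aliphatic index from compositional fractions, following formula 12.1."""
    seq = protein.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    # Fraction of each aliphatic residue in the sequence
    x = {aa: counts[aa] / len(seq) for aa in "AVLI"}
    return x["A"] + A_VAL * x["V"] + B_LEU_ILE * (x["L"] + x["I"])
```

For the four-residue sequence `AVLI` each fraction is 0.25, so the index is 0.25 + 2.9 × 0.25 + 3.9 × 0.5 = 2.925.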
147. clcbio com tutorials

2.1 Tutorial: Starting up the program

This brief tutorial will take you through the most basic steps of working with CLC Gene Workbench. The tutorial introduces the user interface, demonstrates how to create a project, and demonstrates how to import your own existing data into the program. When you open CLC Gene Workbench for the first time, the user interface looks like figure 2.1. At this stage, the important elements are the Navigation Area and the View Area. The Navigation Area to the left is where you keep all your data for use in the program. Most analyses of CLC Gene Workbench require that the data is saved in the Navigation Area. There are several ways to get data into the Navigation Area, and this tutorial describes how to import existing data. The View Area is the main area to the right. This is where the data can be viewed. In general, a View is a display of a piece of data, and the View Area can include several Views. The Views are represented by tabs and can be organized, e.g. by using drag and drop.

2.1.1 Creating a project and a folder

When CLC Gene Workbench is started, there is one default project in the Navigation Area. Create an additional project by: File in the Menu Bar | New | Project, or Ctrl + R (⌘ + R on Mac). Name the project Test and press Enter. The data in the project can be further organized into folders. Create a folder in the Test project by CHAPTER 2
148. ct select the project to export | Export | choose where to export to | enter name of project | Save

You can export multiple files of the same type into formats other than CLC (.clc). E.g. two DNA sequences can be exported in GenBank format: select the elements to export by <Ctrl>-click or <Shift>-click | Export | choose where to export to | choose GenBank (.gbk) format | enter name of project | Save

Export of dependent objects. When exporting e.g. an alignment, CLC Gene Workbench 2.0 can export all dependent objects, i.e. the sequences which the alignment is calculated from. This way, when sending your alignment with the dependent objects, your colleagues can reproduce your findings, with adjusted parameters if desired. To export with dependent files: select the element in Navigation Area | File in Menu Bar | Export with dependent objects | enter name of project | choose where to export to | Save

The result is a folder containing the exported file with dependent objects, stored automatically in a folder at the desired location on your disk.

Export history. To export an element's history: select the element in Navigation Area | Export | select History PDF (.pdf) | choose where to export to | Save

The entire history of the element is then exported in PDF format.

The CLC format. CLC Gene Workbench keeps all bioinformatic data in the CLC format. Compared to other formats, the CLC format contains more informat
149. ction enzyme. This can potentially generate a lot of sequence fragments. When a restriction site is double-clicked, the recognition site is marked on the sequence and the cut is marked by arrows.

Figure 17.7: Right-click on a restriction enzyme annotation in the cloning view.

When a sequence region between two restriction sites is double-clicked, the entire region will automatically be selected.

17.2.5 Insert one sequence into another

Sequences can be inserted into each other in several ways as described above. When you choose to insert one sequence into another, you will be presented with a dialog where all sequences in the view are present (see figure 17.8). Select the sequence you want to insert and press OK. If the ends do not fit, a warning will be shown.

Figure 17.8: Select a sequence for insertion.

When the sequence is inserted, it will be marked with a selection.

CHAPTER 17 CLONING AND CUTTING 229

Figure 17.9: One sequence is now inserted in
150. d, e.g. because the elements are used in the same assignment or research project. The word Element is used to refer to sequences, saved searches, lists, folders, etc.; in other words, everything which can be stored in a project in the Navigation Area.

3.1.1 Data structure

Elements or data in CLC Gene Workbench 2.0 are stored in a kind of database. Hence, the data cannot be browsed from e.g. Windows Explorer or similar file systems. However, elements are available from the Navigation Area. To open an element:

CHAPTER 3 USER INTERFACE 59

Figure 3.2: The Navigation Area.

Double-click the element
or Click the element | Show in the Toolbar | Select the desired way to view the element

This will open a View in the View Area, which is described in the next section.

Adding data. Data can be added to a project in a number of ways. Files can be imported from the file system, and elements from the Navigation Area can also be exported to the file system. For more about
151. d under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form, and CLC bio has to be clearly labelled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, or build upon this work. SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more about how you may use the contents.

CHAPTER 18 SEQUENCE ALIGNMENT 255

Chapter 19

Phylogenetic trees

Contents
19.1 Inferring phylogenetic trees
  19.1.1 Phylogenetic tree parameters
  19.1.2 Tree View Preferences
19.2 Bioinformatics explained: phylogenetics
  19.2.1 The phylogenetic tree
  19.2.2 Modern usage of phylogenies
  19.2.3 Reconstructing phylogenies from molecular data
  19.2.4 Interpreting phylogenies

CLC Gene Workbench 2.0 offers different ways of inferring phylogenetic trees. The first part of this chapter will briefly explain the different ways of inferring trees in CLC Gene Workbench 2.0. The second part, Bioinformatics explained, will
152. de all relevant databases at NCBI. The nr database is the most complete, but also the most redundant, database that can be searched. Searches can be limited to less complete databases. As an example, when choosing pdb, only sequences with a known structure are searched. If sequences homologous to the query sequence are found, these can be downloaded and opened with the 3D viewer of CLC Protein Workbench. When choosing BLASTx or tBLASTx to conduct a search, you get the option of selecting a translation table for the genetic code. The standard genetic code is set as default. This option is particularly useful when working with organisms or organelles which have a genetic code that differs from the standard genetic code. In Step 3 you can limit the BLAST search by adjusting the parameters seen in figure 10.3.

Figure 10.3: Examples of different limitations which can be set before submitting a BLAST search. Parameters shown in the dialog: Limit by entrez query (All organisms), Choose filter (Low Complexity, Human Repeats, Mask For Lookup, Mask Lower Case), Expect (10), Word Size (3), Matrix (BLOSUM62), Gap Cost (Existence 11, Extension 1).

The following description of BLAST search parameters is based on information from http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml

• Limit by Ent
153. der can be done in two ways: right-click an element in the Navigation Area | New | New Folder, or File | New | New Folder. If a project or a folder is selected in the Navigation Area when adding a new folder, the new folder is added at the bottom of the project or folder. If an element is selected, the new folder is added right below that element. You can move the folder manually by selecting it and dragging it to the desired location.

3.1.3 Multiselecting elements

Multiselecting elements in the Navigation Area can be done in the following ways:

• Holding down the <Ctrl> key while clicking on multiple elements selects the elements that have been clicked.
• Selecting one element, and selecting another element while holding down the <Shift> key, selects all the elements listed between the two locations (the two end locations included).
• Selecting one element and moving the cursor with the arrow keys while holding down the <Shift> key enables you to increase the number of elements selected.

3.1.4 Moving and copying elements

Elements can be moved and copied in two ways: using the copy, cut and paste functions, or using drag and drop.

Copy, cut and paste elements. Copies of elements, folders and projects can be made with the copy/paste function, which can be applied in a number of ways: select the files to copy | right-click one of the selected files | Copy | right-click the location to insert files into
154. ding Frame dialog.

• Start Codon:
  - AUG. Most commonly used start codon.
  - Any. All start codons in genetic code.
  - Other. Here you can specify a number of start codons separated by commas.
• Both Strands. Finds reading frames on both strands.
• Stop Codon included in Annotation. The ORFs will be shown as annotations, which can include the stop codon if this option is checked.
• Open Ended Sequence. Allows the ORF to start or end outside the sequence. If the sequence studied is a part of a larger sequence, it may be advantageous to allow the ORF to start or end outside the sequence.
• Genetic code translation table. The translation tables are occasionally updated from NCBI. The tables are not available in this printable version of the user manual. Instead, the tables are included in the Help menu in the Menu Bar (under Background Information).
• Minimum Length. Specifies the minimum length for the ORFs to be found.

Using open reading frames for gene finding is a fairly simple approach which is likely to predict genes which are not real. Setting a relatively high minimum length of the ORFs will reduce the number of false positive predictions, but at the same time short genes may be missed (see figure 13.9). Finding open reading frames is often a good first step in annotating sequences such as cloning vectors or bacterial genomes. For eukaryotic genes, ORF determination may not always be very helpful since the intron exon struc
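To make the options above concrete, here is a minimal sketch of an ORF scan. It is not the algorithm used by CLC Gene Workbench: the function and parameter names are hypothetical, the stop codons are those of the standard genetic code, the start codon is written as ATG because the input is DNA, minimum length is counted in nucleotides of the reported ORF, and open-ended ORFs are not handled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_length=30, start_codons=("ATG",),
              both_strands=True, include_stop=True):
    """Return (strand, start, end, orf) tuples.

    Coordinates are 0-based, end-exclusive, and refer to the scanned strand.
    """
    strands = [("+", dna.upper())]
    if both_strands:
        strands.append(("-", reverse_complement(dna)))

    results = []
    for strand, seq in strands:
        for frame in range(3):
            start = None  # position of the first start codon in the open frame
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon in start_codons:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3 if include_stop else i
                    if end - start >= min_length:
                        results.append((strand, start, end, seq[start:end]))
                    start = None
    return results
```

Calling `find_orfs("CCATGAAATTTGGGTAACC", min_length=9)` reports one ORF on the plus strand, and raising `min_length` to 30 removes it, which mirrors the trade-off between false positives and missed short genes described above.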
155. dit enzymes button. This displays a dialog with the enzymes currently in the list shown at the bottom and a list of available enzymes at the top. To add more enzymes, select them in the upper list and press the Add enzymes button. To remove enzymes, select them in the list below and click the Remove enzymes button.

CHAPTER 11 VIEWING AND EDITING SEQUENCES 134

• Load enzymes button. If you have previously created an enzyme list, you can select this list by clicking the Load enzymes button. You can filter the enzymes in the same way as illustrated in figure 17.13.
• Add enzymes cutting the selection to panel. If you make a selection on the sequence and right-click, you find this option for adding enzymes. Based on the entire list of available enzymes, the enzymes cutting in the region you selected will be added to the list in the Side Panel.
• Insert restriction site before/after selection. If you make a selection on the sequence and right-click, you find this option for inserting a restriction site before or after the region you selected. A dialog is shown where you can select an enzyme whose recognition sequence is inserted. If it was not already present in the list in the Side Panel, the enzyme will now be added and selected.

Finally, if you have selected a set of enzymes that you wish to keep for later use, you can click Save enzymes, and the selected enzymes will be saved to an enzyme list. This list can then be used both when finding
156. e A

13 Nucleotide analyses
  13.1 Convert DNA to RNA
  13.2 Convert RNA to DNA
  13.3 Reverse complements of sequences
  13.4 Translation of DNA or RNA to protein
  13.5 Annotate with SNPs
  13.6 Find open reading frames
14 Protein analyses
  14.1 Protein charge
  14.2 Hydrophobicity
  14.3 Reverse translation from protein into DNA
15 Primers
  15.1 Primer design - an introduction
  15.2 Setting parameters for primers and probes
  15.3 Graphical display of primer information
  15.4 Output from primer design
  15.5 Standard PCR
  15.6 Nested PCR
  15.7 TaqMan
  15.8 Sequencing primers
  15.9 Alignment-based primer and probe design
  15.10 Analyze primer properties
  15.11 Match primer with sequence
  15.12 Order primers
16 Assembly
  16.1 Importing and viewing trace data
  16.2 Trim sequences
  16.3 Assemble sequences
  16.4 Assemble to reference sequence
  16.5 Assemble to all existing contigs
157. e Project Tree at a time.

3.1.8 Sequence properties

Sequences downloaded from databases have a number of properties which can be displayed using the Sequence Properties function: Right-click a sequence in the Navigation Area | Properties. This will show a dialog as shown in figure 3.5.

Figure 3.5: Sequence properties for the HUMDINUC sequence. The dialog lists Type, Name, Source, Description, Keywords, Comments and Last modified.

For a more comprehensive view of sequence information, see section 11.2.

3.2 View Area

The View Area is the right-hand part of the workbench interface, displaying your current work. The View Area may consist of one or more Views, represented by tabs at the top of the View Area. This is illustrated in figure 3.6. Notice: the tab concept is central to working with CLC Gene Workbench 2.0, because several operations can be performed by dragging the tab of a view, and extended right-click menus can be activated from the tabs. This chapter deals with the handling of Views
158. e Zoom In mode toolbar item is selected zooms out instead of zooming in.

Figure 3.13: The mode toolbar items.

3.3.2 Zoom Out

It is possible to zoom out step by step on a sequence: Click Zoom Out in the toolbar | click in the view until you reach a satisfying zoom level. When you choose the Zoom Out mode, the mouse pointer changes to a magnifying glass to reflect the mouse mode. If you want to get a quick overview of a sequence or a tree, use the Fit Width function instead of the Zoom Out function. If you press Shift while clicking in a View, the zoom function is reversed. Hence, clicking on a sequence in this way while the Zoom Out mode toolbar item is selected zooms in instead of zooming out.

3.3.3 Fit Width

The Fit Width function adjusts the content of the View so that both ends of the sequence, alignment or tree are visible in the View in question. This function does not change the mode of the mouse pointer.

3.3.4 Zoom to 100%

The Zoom to 100% function zooms the content of the View so that it is displayed with the highest degree of detail. This function does not change the mode of the mouse pointer.

3.3.5 Move

The Move mode allows you to drag the content of a View. E.g. if you are studying a sequence, you can click anywhere in the sequence and hold the mouse button. By moving the mouse you
159. e a matrix of pairwise distances between OTUs from their sequence differences. To correct for multiple substitutions, it is common to use distances corrected by a model of molecular evolution such as the Jukes-Cantor model (Jukes and Cantor, 1969).

UPGMA. A simple but popular clustering algorithm for distance data is Unweighted Pair Group Method using Arithmetic averages (UPGMA) (Michener and Sokal, 1957; Sneath and Sokal, 1973). This method works by initially having all sequences in separate clusters and continuously joining these. The tree is constructed by considering all initial clusters as leaf nodes in the tree, and each time two clusters are joined, a node is added to the tree as the parent of the two chosen nodes. The clusters to be joined are chosen as those with minimal pairwise distance. The branch lengths are set corresponding to the distance between clusters, which is calculated as the average distance between pairs of sequences in each cluster. The algorithm assumes that the distance data has the so-called molecular clock property, i.e. the divergence of sequences occurs at the same constant rate at all parts of the tree. This means that the leaves of UPGMA trees all line up at the extant sequences and that a root is estimated as part of the procedure.

Neighbor Joining. The neighbor joining algorithm (Saitou and Nei, 1987), on the other hand, builds a tree where the evolutionary rates are free to differ in different lineages, i.e. the tr
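The UPGMA procedure described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the program's implementation: the function name and input layout are hypothetical, each new node is placed at half the average distance between the two joined clusters (the usual UPGMA convention), and the naive search over all cluster pairs is only suitable for small inputs.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b.
    Returns (newick_string, root_height).
    """
    # Each cluster is (newick text, member leaves, height of its node)
    clusters = [(name, (name,), 0.0) for name in names]

    def average_distance(c1, c2):
        # Average over all pairs of leaves, one from each cluster
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two clusters with minimal average pairwise distance
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        height = average_distance(c1, c2) / 2.0
        newick = "({}:{:.3f},{}:{:.3f})".format(
            c1[0], height - c1[2], c2[0], height - c2[2])
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append((newick, c1[1] + c2[1], height))

    return clusters[0][0] + ";", clusters[0][2]
```

Because every leaf ends up at the same total distance from the root, the resulting tree reflects the molecular clock assumption mentioned above.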
160. e difference in melting temperature between outer and inner primer pair: the scoring function discounts primer sets which deviate greatly from this value. Regarding this and the minimum difference option mentioned above, please note that, to ensure flexibility, there is no directionality indicated when setting parameters for melting temperature differences between inner and outer primer pair, i.e. it is not specified whether the inner pair should have a lower or higher Tm. Instead, this is determined by the allowed temperature intervals for inner and outer primers that are set in the primer parameters preference group in the side panel. If a higher Tm of inner primers is desired, choose a Tm interval for inner primers which has higher values than the interval for outer primers.

CHAPTER 15 PRIMERS 197

• Two radio buttons allowing the user to choose between a fast and an accurate algorithm for primer prediction.

15.6.1 Nested PCR output table

In nested PCR there are four primers in a solution: forward outer primer (FO), forward inner primer (FI), reverse inner primer (RI) and a reverse outer primer (RO). The output table can show primer pair combination parameters for all four combinations of primers, and single primer parameters for all four primers in a solution (see section on Standard PCR for an explanation of the available primer pair and single primer information). The fragment length in this mode refers to the length of the PCR fragment
161. e layout, Annotation types, Annotation layout, etc. Several of these preference groups are present in more views. E.g. Sequence layout is also present when an alignment is viewed. The content of the different preference groups is described in connection with those chapters where the functionality is explained. E.g. Sequence Layout View preferences are described in chapter 11.1.1, which is about editing options of a sequence view. When you have adjusted a view of e.g. a sequence, your settings can be saved in a so-called style sheet. When you open other sequences which you want to display in a similar way, the saved style sheet can be applied. These options are available in the top of the View preferences (see figure 4.4). To manage style sheets, click the icon seen in figure 4.4. This opens a menu where the following options are available:

• Save Settings
• Delete Settings
• Apply Saved Settings

Style sheets for the View preferences differ between views. Hence, you can have e.g. three style sheets for sequences, two for alignments and four for graphs. To adjust which of the style sheets is default for e.g. an alignment, go to the general Preferences (Ctrl K on Mac). CLC Standard Settings represents the way the program was set up when you first launched the program. The remaining icons of figure 4.4 are used to Expand all preferences, Collapse all preferences and Dock/Undock Preferences. Dock/Undock Preferences is used when making
162. e license online, CLC Gene Workbench also offers you the option of manually activating your license key. Step 3 of the license activation dialog provides a License number and an Activation Key. By clicking Copy this information to the clipboard, you can open an email editor and paste these two numbers into the mail. If you email this content and a short explanation to support@clcbio.com, we will send back a pre-activated license key. Also, in all steps of the license dialog you have an option of resetting the license. This will allow you to start over, importing another license. However, information about which licenses were used on the computer is stored externally to prevent unauthorized use of demo licenses.

1.4.4 Upgrading from a demo license to a commercial license

If you are trying a demo of CLC Gene Workbench and want to upgrade to a license that you have bought, choose Upgrade license in the Help menu. Then follow the description in section 1.4.3.

1.5 About CLC Workbenches

In November 2005 CLC bio released two Workbenches: CLC Free Workbench and CLC Protein Workbench. CLC Protein Workbench is developed from the free version, giving it the well-tested user friendliness and look & feel. However, the CLC Protein Workbench includes a range of more advanced analyses. In March 2006, CLC Gene Workbench and CLC Combined Workbench were added to the product portfolio of CLC bio. Like CLC Protein Workbench, CLC Gene Workbench builds on CLC Free
163. e nr

Appendix C

Formats for import and export

C.1 List of bioinformatic data formats

Below is a list of bioinformatic data formats, i.e. formats for importing and exporting sequences, alignments and trees.

File type             Suffix           File format used for
Phylip Alignment      .phy             alignments
GCG Alignment         .msf             alignments
Clustal Alignment     .aln             alignments
Newick                .nwk             trees
FASTA                 .fsa/.fasta      sequences
GenBank               .gbk/.gb/.gp     sequences
GCG sequence          .gcg             sequences (only import)
PIR (NBRF)            .pir             sequences (only import)
Staden                .sdn             sequences (only import)
VectorNTI                              sequences (only import)
DNAstrider            .str/.strider    sequences
Swiss-Prot            .swp             protein sequences
Lasergene sequence    .pro             protein sequence (only import)
Lasergene sequence    .seq             nucleotide sequence (only import)
Embl                  .embl            nucleotide sequences
Nexus                 .nxs/.nexus      sequences, trees, alignments, and sequence lists
CLC                   .clc             sequences, trees, alignments, reports, etc.
Text                  .txt             all data in a textual format
ABI                                    Trace files (only import)
AB1                                    Trace files (only import)
SCF2                                   Trace files (only import)
SCF3                                   Trace files (only import)
Phred                                  Trace files (only import)
mmCIF                 .cif             structure (only import)
PDB                   .pdb             structure (only import)
Preferences           .cpf             CLC workbench preferences

Notice that CLC Gene Workbench can import external files too. This means that CLC Gene Workbench can import all files and display them in the Navigation Area, while the above
270 APPENDIX C FORMATS
164. e residues using a gradient in the same way as described above.
  - Graph. Displays sequence logo at the bottom of the alignment.
    * Height. Specifies the height of the sequence logo graph.
    * Color. The sequence logo can be displayed in black or Rasmol colors. For protein alignments, a polarity color scheme is also available, where hydrophobic residues are shown in black, hydrophilic residues in green, acidic residues in red and basic residues in blue.
• Conservation. Displays the level of conservation at each position in the alignment.
  - Foreground color. Colors the letters using a gradient, where the right side color is used for highly conserved positions and the left side color is used for positions that are less conserved.
  - Background color. Sets a background color of the residues using a gradient in the same way as described above.
  - Graph. Displays the conservation level as a graph at the bottom of the alignment. The bar (default view) shows the conservation of all sequence positions. The height of the graph reflects how conserved that particular position is in the alignment. If one position is 100% conserved, the graph will be shown in full height.
    * Height. Specifies the height of the graph.
    * Type. The type of the graph.
      · Line plot. Displays the graph as a line plot.
      · Bar plot. Displays the graph as a bar plot.
      · Colors. Displays the graph as a color bar using a gradient like the fo
165. e same time.

• Organism (Text)
• Description (Text)
• Modified Since (Between 30 days and 10 years)
• Gene Location (Genomic DNA/RNA, Mitochondrion, or Chloroplast)
• Molecule (Genomic DNA/RNA, mRNA or rRNA)
• Sequence Length (Number for maximum or minimum length of the sequence)
• Gene Name (Text)

The search parameters are the most recently used. The All fields option allows searches in all parameters in the NCBI database at the same time. All fields also provides an opportunity to restrict a search to parameters which are not listed in the dialog. E.g. writing gene[Feature key] AND mouse in All fields generates hits in the GenBank database which contain one or more genes and where mouse appears somewhere in the GenBank file. NB: the Feature Key option is only available in GenBank when searching for nucleotide sequences. For more information about how to use this syntax, see http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers

CHAPTER 9 DATABASE SEARCH 103

When you are satisfied with the parameters you have entered, you can either Save search parameters or Start search. When applying the Save search parameters option, only the parameters are saved, not the results of the search. The search parameters can also be saved by dragging the tab of the Search view into the Navigation Area. If you don't save the search, the search parameters are saved in the Search NCBI view until the
166. e the results (see section 8.1). If not, click Finish. The result is seen in figure 18.12.

18.4.1 How alignments are joined

Alignments are joined by considering the sequence names in the individual alignments. If two sequences from different alignments have identical names, they are considered to have the same origin and are thus joined. Consider the joining of alignments A and B. If a sequence named "in-A-and-B" is found in both A and B, the spliced alignment will contain a sequence named "in-A-and-B" which represents the characters from A and B joined in direct extension of each other. If a sequence with the name "in-A-not-B" is found in A but not in B, the spliced alignment will contain a sequence named "in-A-not-B". The first part of this sequence will contain the characters from A, but since no sequence information is available from B, a number of gap characters will be added to the end of the sequence corresponding to the number of residues in B. Note that the function does not require that the individual alignments contain an equal number of sequences.

Figure 18.12: The joining of the alignments results in one alignment containing rows of sequences corresponding to the number of uniquely named sequences in the joined alignments.

18.5 Bioinformatics explained: Multiple alignments

Mu
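The joining rule can be sketched as follows. This is a hypothetical illustration, not the program's code: alignments are represented as dictionaries from sequence name to aligned row, and it is assumed that a sequence found only in B is padded with gaps at the front, by symmetry with the case described above for sequences found only in A.

```python
def join_alignments(a, b, gap="-"):
    """Join two alignments by sequence name.

    a, b: dict mapping sequence name -> aligned row (all rows of one
          alignment must have the same length).
    Returns a dict with one row per uniquely named sequence.
    """
    def width(alignment):
        lengths = {len(row) for row in alignment.values()}
        if len(lengths) > 1:
            raise ValueError("rows of an alignment must have equal length")
        return lengths.pop() if lengths else 0

    width_a, width_b = width(a), width(b)
    joined = {}
    # Keep the order of A, then append names that occur only in B
    for name in list(a) + [n for n in b if n not in a]:
        joined[name] = a.get(name, gap * width_a) + b.get(name, gap * width_b)
    return joined
```

With `a = {"s1": "AC-G", "s2": "ACTG"}` and `b = {"s1": "TT", "s3": "TA"}`, the shared sequence `s1` becomes `AC-GTT`, while `s2` and `s3` are padded with gaps where the other alignment has no data.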
167. e two ways to Zoom In. The first way enables you to zoom in step by step on a sequence: Click Zoom In in the toolbar | click the location in the view that you want to zoom in on
or Click Zoom In in the toolbar | click and drag a box around a part of the view | the view now zooms in on the part you selected

Figure 3.12: A maximized View. The function hides the Navigation Area and the Toolbox.

When you choose the Zoom In mode, the mouse pointer changes to a magnifying glass to reflect the mouse mode. If you press the Shift button on your keyboard while clicking in a View, the zoom function is reversed. Hence, clicking on a sequence in this way while th
168. e work. You may not use this work for commercial purposes. You may not alter, transform, or build upon this work. SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more about how you may use the contents.

Figure 12.8: This dot plot shows various frame shifts in the sequence. See text for details.

12.1.4 Bioinformatics explained: Scoring matrices

Biological sequences have evolved throughout time, and evolution has shown that not all changes to a biological sequence are equally likely to happen. Certain amino acid substitutions (change of one amino acid to another) happen often, whereas other substitutions are very rare. For instance, tryptophan (W), which is a relatively rare amino acid, will only on very rare occasions mutate into a leucine (L). Based on evolution of proteins, it became apparent that these changes or substitutions of amino acids can be modeled by a scoring matrix, also referred to as a substitution matrix. See an example of a scoring matrix in table 12.1. This matrix lists the substitution scores of every single amino acid. A score for an aligned amino acid pair is found at the intersection of the corresponding column and row. For example, the substitution score from an arginine (R) to a lysine (K) is 2. The diagonal shows scores for amino acids which have not changed. Most substitutions (changes) have a negative sco
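How a scoring matrix is looked up can be shown with a small sketch. The scores below are a four-residue excerpt of the BLOSUM62 matrix, chosen as an assumption for illustration; it agrees with the R to K score of 2 quoted above, but table 12.1 itself is not reproduced here. The function name is hypothetical.

```python
# Excerpt of BLOSUM62 for residues R, K, L and W (assumed example matrix).
# Keys are unordered residue pairs, so (R, K) and (K, R) share one entry.
SCORES = {
    frozenset("R"): 5, frozenset("K"): 5, frozenset("L"): 4, frozenset("W"): 11,
    frozenset("RK"): 2, frozenset("RL"): -2, frozenset("RW"): -3,
    frozenset("KL"): -2, frozenset("KW"): -3, frozenset("LW"): -2,
}


def ungapped_score(seq1: str, seq2: str) -> int:
    """Sum the substitution scores of two equally long, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(SCORES[frozenset((a, b))] for a, b in zip(seq1.upper(), seq2.upper()))
```

Aligning `RLW` with `KLW` gives 2 + 4 + 11 = 17: the conservative R to K change scores positively, and the unchanged residues take their diagonal scores.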
169. ea | Toolbox in the Menu Bar | Protein Analyses | Create Hydrophobicity Plot

This opens a dialog. The first step allows you to add or remove sequences. Clicking Next takes you through to Step 2, which is displayed in figure 14.3. The Window size is the width of the window where the hydrophobicity is calculated. The wider the window, the less volatile the graph. You can choose from a number of hydrophobicity scales, which are further explained in section 14.2.3. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. The result can be seen in figure 14.4. In CLC Gene Workbench 2.0 it is possible to change the layout of the hydrophobicity plot through the Side Panel. The drop-down menus are opened by clicking the black triangular arrows. There are two kinds of view preferences: the graph preferences, and preferences for the kind of hydrophobicity scale used to calculate the graph, e.g. Kyte-Doolittle. The Graph preferences include:

• Lock axis. This will always show the axis even though the plot is zoomed to a detailed level.

CHAPTER 14 PROTEIN ANALYSES 175

Dialog shown in figure 14.3: Window size 11; hydrophobicity scales Kyte-Doolittle, Eisenberg, Engelman, Hopp-Woods, Janin, Rose, Cornette.

Figure 14.3: St
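The effect of the window size can be illustrated with a minimal sliding-window sketch using the Kyte-Doolittle scale. This is not the program's implementation: the function name is hypothetical, and the plain unweighted window average is an assumption about how the values are combined.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein: str, window: int = 11):
    """Return the mean hydropathy of every full window along the sequence."""
    seq = protein.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(seq) - window + 1)]
```

A larger `window` averages over more residues, so neighbouring points share most of their input and the curve becomes smoother, which is the "less volatile" behaviour described above.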
170. ect repeats: ACDEFGHIACDEFGHIACDEFGHIACDEFGHI. Inverted repeats: ACDEFGHIIHGFEDCAACDEFGHIIHGFEDCA. Figure 12.6: Direct and inverted repeats shown on an amino acid sequence generated for demonstration purposes. Sequence inversions: In dot plots you can see an inversion of sequence as a contrary diagonal to the diagonal showing similarity. In figure 12.9 you can see a dot plot (window length is 3) with an inversion. Low complexity regions: Low complexity regions in sequences can be found as regions around the diagonal all obtaining a high score. Low complexity regions are calculated from the redundancy of amino acids within a limited region (Wootton and Federhen, 1993). These are most often seen as short regions of only a few different amino acids. In the middle of figure 12.10 is a square which shows the low complexity region of this sequence. [Figure 12.7: The dot plot of a sequence showing repeated elements. See also figure 12.6.] Creative Commons License: All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display and use the work for educational purposes, under the following conditions: You must attribute the work in its original form, and CLC bio has to be clearly labelled as author and provider of th
171. [Figure 14.1: Choosing protein sequences to calculate protein charge.] sequence lists from the Project Tree. You can perform the analysis on several protein sequences at a time. This will result in one output graph showing protein charge graphs for the individual proteins. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. 14.1.1 Modifying the layout. Figure 14.2 shows the electrical charges for three proteins. In the Side Panel to the right, you can modify the layout of the graph. [Figure 14.2: View of the protein charge. The graph plots Charge against pH for proteins including CAA24102 and CAA32220.] Graph preferences: The Graph preferences apply to the whole graph. • Lock axis: This will always show the axis even though the plot is zoomed to a detailed level. • Frame: Toggles the frame of the graph. • X-axis at zero: Toggles the x-axis at zero. • Y-axis at zero: Toggles the y-axis at zero. • Tick type: outside, inside. • Tick lines at: Shows a grid behind the graph (no
172. ed easily by activating the calculations from the Side Panel for a sequence: right-click protein sequence in Navigation Area | Show | Sequence | open Hydrophobicity info in Side Panel, or double-click protein sequence in Navigation Area | Show | Sequence | open Hydrophobicity info in Side Panel. These actions result in the view displayed in figure 14.5. [Figure 14.5: The different available scales in Hydrophobicity info in CLC Gene Workbench 2.0: Kyte-Doolittle, Cornette, Engelman, Eisenberg, Rose, Janin, Hopp-Woods.] The level of hydrophobicity is calculated on the basis of the different scales. The different scales assign different values to each type of amino acid. The hydrophobicity score is then calculated as the sum of the values in a window, which is a particular range of the sequence. The window length can be set from 5 to 25 residues. The wider the window, the fewer fluctuations in the hydrophobicity scores. For more about the theory behind hydrophobicity, see section 14.2.3. In the following we will focus on the different ways that CLC Gene Workbench 2.0 offers to display the hydrophobicity scores. We use Kyte-Doolittle to explain the display of the scores, but the different options are the same for all the scales. Initially there are three options for displaying the hydrophobicity scores. You can choose one, two or all three options by selecting the box
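The window calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the Workbench's own implementation: it assumes the published Kyte-Doolittle hydropathy values, sums them over each window as the text describes, and enforces the 5 to 25 residue limit. The function name hydrophobicity_profile and the default window of 11 are choices made for this example.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(sequence, window=11):
    """Return one score per window position: the sum of the scale values
    of the residues inside that window."""
    if not 5 <= window <= 25:
        raise ValueError("window length must be between 5 and 25 residues")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if len(values) < window:
        return []  # sequence too short for a single window
    return [
        round(sum(values[start:start + window]), 2)
        for start in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the call.
    for score in hydrophobicity_profile("RFFDKFGNLSSAQAIMGNPR", window=5):
        print(score)
```

Because each score pools several residues, a wider window averages out single-residue extremes, which is the smoothing effect mentioned in the text.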
173. ed to the Internet to complete this tutorial. Start out by: select protein NP_058652 in the Navigation Area | Toolbox | BLAST Search | BLAST Against NCBI Databases. In Step 1 you can choose which sequence to use as query sequence. Since you have already chosen the sequence, it is displayed in the Selected Elements list. Click Next. In Step 2 (figure 2.18), choose the default BLAST program, BLASTp: Protein sequence against Protein database, and select the Swiss-Prot database in the Database drop-down menu. Click Next. In the Limit by Entrez query in Step 3, choose Homo sapiens[ORGN] from the drop-down menu to arrive at the search configuration seen in figure 2.19. Including this term limits the query to proteins of human origin. Click Finish to accept the default parameter settings and begin the BLAST search. The computer now contacts NCBI and places your query in the BLAST search queue. After a short while the result is received and opened in a new view. [Figure 2.18: Choosing BLAST program and database.] [Dialog residue: BLAST Against NCBI Databases; 1. Select sequences of same type, 2. Set program parameters, 3. Set input parameters]
174. ee does not have a particular root. Some programs always draw trees with roots for practical reasons, but for neighbor joining trees no particular biological hypothesis is postulated by the placement of the root. The method works very much like UPGMA. The main difference is that instead of using the pairwise distance, this method subtracts the distance to all other nodes from the pairwise distance. This is done to take care of situations where the two closest nodes are not neighbors in the real tree. The neighbor joining algorithm is generally considered to be fairly good and is widely used. Algorithms that improve on its cubic time performance exist. The improvement is only significant for quite large datasets. Character based methods: Whereas the distance based methods compress all sequence information into a single number, [figure residue: tree leaf labels for Peromyscus maniculatus (deer mouse), Homo sapiens (human) and Equus caballus (horse)]
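The correction that neighbor joining applies to the pairwise distance can be made concrete with a small sketch. The manual does not give a formula, so the code below assumes the standard neighbor joining criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), and a made-up five-taxon distance matrix. It shows a case where the pair with the smallest raw distance is not the pair that neighbor joining would merge first.

```python
from itertools import combinations

# Hypothetical symmetric distance matrix for five taxa (indices 0..4).
DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]


def q_value(dist, i, j):
    """Pairwise distance corrected by the distances of i and j to all nodes."""
    n = len(dist)
    return (n - 2) * dist[i][j] - sum(dist[i]) - sum(dist[j])


def closest_pair(dist):
    """Pair with the smallest raw distance."""
    pairs = combinations(range(len(dist)), 2)
    return min(pairs, key=lambda p: dist[p[0]][p[1]])


def neighbor_joining_pair(dist):
    """Pair with the smallest corrected value, i.e. the pair joined first."""
    pairs = combinations(range(len(dist)), 2)
    return min(pairs, key=lambda p: q_value(dist, p[0], p[1]))


if __name__ == "__main__":
    print("closest by raw distance:", closest_pair(DIST))
    print("joined first by neighbor joining:", neighbor_joining_pair(DIST))
```

Only the first pair selection is shown; a full implementation would then replace the joined pair by a new node, recompute distances and repeat.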
175. efore/after selection: If you make a selection on the sequence and right-click, you find this option for inserting a restriction site before or after the region you selected. A dialog is shown where you can select an enzyme whose recognition sequence is inserted. If it was not already present in the list in the Side Panel, the enzyme will now be added and selected. Finally, if you have selected a set of enzymes that you wish to keep for later use, you can click Save enzymes and the selected enzymes will be saved to an enzyme list. This list can then be used both when finding restriction sites from the Toolbox and when viewing another sequence. 17.2.3 How to navigate the cloning view. The zoom function in the cloning view works on the individual sequence and not the entire view. In that way you can show a long plasmid and short sequence fragments in the same view. However, Fit Width and Zoom to 100% apply to all the sequences in the view and can thus be used to reset different zoom levels of the individual sequences. 17.2.4 Manipulate sequences. All manipulations of sequences are done manually, giving you full control over how the sequence is constructed. Manipulations are done through right-click menus, which have three different appearances depending on where you click: • Right-click the name of the sequence to the left • Right-click a selection • Right-click a restriction site. The three men
176. elements but add annotations to existing elements. [Figure 8.4: An example of a batch log when finding open reading frames. The log lists Name, Description and Time for each sequence, e.g. HUMDINUC, Found 5 reading frames, Sun Jun 11 13:06:17 CEST 2006.] Part III: Bioinformatics. Chapter 9: Database search. Contents: 9.1 GenBank search; 9.1.1 GenBank search options; 9.1.2 Handling of GenBank search results; 9.2 Sequence web info; 9.2.1 Google sequence; 9.2.2 [title illegible]; 9.2.3 PubMed References; [one further entry illegible]. CLC Gene Workbench 2.0 offers differ
177. ence list. [Figure 2.45: Selecting protein and DNA sequences, but the dialog automatically filters out the protein sequences.] 2.11.13 Drag elements to the Toolbox. If you have selected e.g. some protein sequences in the Navigation Area that you wish to use for creating an alignment 2.11.14 Export elements while preserving history. If you have created e.g. an alignment and wish to export it to a colleague with the detailed history of all the source sequences, you can select the alignment and all the sequences for export. There is, however, a much easier way to do this (see figure 2.46): Select the alignment | File | Export with dependent elements. [Figure 2.46: Export with dependent elements in order to preserve the detailed history of an element.] This will export the alignment including all the source sequences in one .clc file. When your colleague imports the alignment, its detailed history is preserved. 2.11.15 Avoid the mouse trap: use keyboard shortcuts. Many tasks can be performed without using the mouse. When you do the same task again and again, you can save some time by learning its shortcut key. As an example you c
178. ence sequence 213 Assembly 265 tutorial 45 variance table 219 Atomic composition 154 Automatic parsing 87 Back up 90 Base pairs required for a match 206 required for mispriming 193 Basic concepts of use 20 Batch processing 97 265 log of 98 Bibliography 272 Binding site for primer 205 Bioinformatic data export 88 formats 85 270 BLAST 265 276 INDEX 277 against local Database 113 against NCBI 107 create database from file system 114 create database from Navigation Area 114 create local database 114 graphics output 111 list of databases 268 parameters 109 search 107 table output 112 tutorial 38 BLAST DNA sequence BLASTn 108 BLASTx 108 tBLASTx 108 BLAST Protein sequence BLASTp 108 tBLASTn 108 BLOSUM scoring matrices 142 Bootstrap values 262 Bug reporting 19 C G content 122 CDS translate to protein 124 Cheap end gaps 242 cif file format 86 Circular molecules 229 Circular view of sequence 132 265 clc file format 86 89 CLC Standard Settings 78 79 CLC Workbenches 18 CLC file format 29 86 270 Cloning 221 265 circular view 229 insert fragment 228 navigation 224 restriction enzymes 228 view preferences 223 Close View 65 Clustal file format 29 86 270 Coding sequence translate to protein 124 Codon frequency tables reverse translation 181 usage 182 Color residues 247 Compare workbenches 265 Complexity plot 147 Configure network
179. ences as well, they will not be converted. 13.3 Reverse complements of sequences. CLC Gene Workbench 2.0 is able to create the reverse complement of a nucleotide sequence. By doing that, a new sequence is created which also has all the annotations reversed, since they now occupy the opposite strand of their previous location. To quickly obtain the reverse complement of a sequence or part of a sequence, you may select a region on the negative strand and open it in a new view: right-click a selection on the negative strand | Open selection in a new view. By doing that, the sequence will be reversed. This is only possible when the double stranded view option is enabled. It is possible to copy the selection and paste it in a word processing program or an e-mail. To obtain a reverse complement of an entire sequence: select a sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Create Reverse Complement, or right-click a sequence in Navigation Area | Toolbox | Nucleotide Analyses | Create Reverse Complement. This opens the dialog displayed in figure 13.3. [Dialog residue: Create Reverse Complement; 1. Select nucleotide sequences; Projects tree and Selected Elements list]
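For readers who want to check a reverse complement outside the Workbench, the operation itself is small enough to sketch in Python. This is a plain-string illustration under simple assumptions: only A, C, G, T and N are handled, case is preserved, and annotations (which the Workbench also moves to the opposite strand) are not modeled. The function name is chosen for this example.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Complement every base, then read the result in the opposite direction."""
    return sequence.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("GATTACA"))
```

Applying the function twice returns the original sequence, which is a quick sanity check for any reverse complement routine.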
180. ences by checking all selection boxes. Specificity of priming is determined by criteria set by the user in the dialog box which is shown when the Calculate button is pressed (see below). Different options can be chosen concerning the match of the primer to the template sequences in the included group: • Perfect match: Specifies that the designed primers must have a perfect match to all relevant sequences in the alignment. When selected, primers will thus only be located in regions that are completely conserved within the sequences belonging to the included group. • Allow degeneracy: Designs primers that may include ambiguity characters where heterogeneities occur in the included template sequences. The allowed fold of degeneracy is user defined and corresponds to the number of possible primer combinations formed by a degenerate primer. Thus, if a primer covers two 4-fold degenerate sites and one 2-fold degenerate site, the total fold of degeneracy is 4 × 4 × 2 = 32, and the primer will, when supplied from the manufacturer, consist of a mixture of 32 different oligonucleotides. When scoring the available primers, degenerate primers are given a penalty which increases with the fold of degeneracy. • Allow mismatches: Designs primers which are allowed a specified number of mismatches to the included template sequences. The melting temperature algorithm employed includes the latest thermodynamic parameters for calculating T
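The fold-of-degeneracy arithmetic from the Allow degeneracy option can be reproduced with a short sketch. It assumes the standard IUPAC nucleotide ambiguity codes; the primer string and the function names are hypothetical and serve only to mirror the 4 × 4 × 2 = 32 example.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def fold_degeneracy(primer):
    """Multiply the number of possible bases at every primer position."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer):
    """List every concrete oligonucleotide represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    primer = "ACNGNTR"  # two 4-fold sites (N) and one 2-fold site (R)
    print(fold_degeneracy(primer))
    print(len(expand(primer)))
```

The expanded list is what a manufacturer's mixture would contain, so its length equals the fold of degeneracy.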
181. enetic code reverse translation 181 Getting started 20 Google sequence 105 Graphics data formats 271 INDEX 279 export 91 Local BLAST Database 114 Local complexity plot 147 265 Half life 153 Local Database BLAST 113 Handling of results 97 Locale setting 77 Help 20 Location Hide show Toolbox 72 of selection on sequence 71 History 95 Side Panel 77 export 89 Log of batch processing 98 preserve when exporting 96 source elements 96 Hydrophobicity 174 265 Bioinformatics explained 177 Cornette 178 Eisenberg 178 Engelman GES 178 Hopp Woods 178 Janin 178 Kyte Doolittle 177 Rose 178 Import bioinformatic data 86 data from older versions 87 existing data 27 external files 90 FASTA data 27 list of formats 270 preferences 78 Vector NTI data 87 Infer Phylogenetic Tree 256 Insert gaps 249 restriction site 120 133 224 Installation 11 Isoelectric point 153 Join alignments 251 sequences 156 jJpg format export 92 Lasergene sequence protein file format 29 86 270 sequence file format 29 86 270 License 15 Linux installation 13 installation with RPM package 14 List of sequences 130 Load enzymes 120 133 224 Logo sequence 246 265 Mac OS X installation 13 Manipulate sequences 265 Manual format 24 Marker in gel view 238 Max Sequence length for BLAST 107 Maximize size of view 67 Maximum memory adjusting 22 Melting temperature 188 Cat
182. [Figure 19.5: Algorithm choices for phylogenetic inference. The top shows a tree found by the neighbor joining algorithm, while the bottom shows a tree found by the UPGMA algorithm. The latter algorithm assumes that the evolution occurs at a constant rate in different lineages.] the character based methods attempt to infer the phylogeny based on all the individual characters (nucleotides or amino acids). Parsimony: In parsimony based methods, a number of sites are defined which are informative about the topology of the tree. Based on these, the best topology is found by minimizing the number of substitutions needed to explain the informative sites. Parsimony methods are not based on explicit evolutionary models. Maximum Likelihood: Maximum likelihood and Bayesian methods (see below) are probabilistic methods of inference. Both have the pleasing properties of using explicit models of molecular evolution and allowing for rigorous statistical inference. However, both approaches are very computer intensive. A stochastic model of molecular evolution is used to assign a probability (likelihood) to each phylogeny, given
183. ensure that CLC Gene Workbench treats the sequence in the correct way. Click DNA/RNA | OK. The sequence is imported into the project or folder that was selected in the Navigation Area before you clicked Import. Double-click the sequence in the Navigation Area to view it. The final result looks like figure 2.2. [Figure 2.2: The HUMDINUC file is imported and opened.] 2.1.3 Supported data formats. CLC Gene Workbench can import and export the following formats. Table columns: File type | Suffix | File format used for. Phylip Alignment
184. ent ways of searching data on the Internet. You must be online when initiating and performing the following searches. 9.1 GenBank search. This section describes searches in GenBank (the NCBI Entrez database) and the import of search results. The NCBI search view is opened in this way (figure 9.1): Search | Search NCBI Entrez, or Ctrl + B (⌘ + B on Mac). This opens the following view. 9.1.1 GenBank search options. Conducting a search in the NCBI Database from CLC Gene Workbench 2.0 corresponds to conducting the search on NCBI's website. When conducting the search from CLC Gene Workbench 2.0, the results are available and ready to work with straight away. You can choose whether you want to search for nucleotide sequences or protein sequences. As default, CLC Gene Workbench offers one text field where the search parameters can be entered. Click Add search parameters to add more parameters to your search. [Screenshot residue: NCBI search view with database choice (Nucleotide or Protein), search fields set to All Fields (human, hemoglobin, complete), the Add search parameters and Start search buttons, an Append wildcard to search words option, and a result table with Accession, Definition and Modification Date columns.]
185. enzymes button. • Load enzymes button: If you have previously created an enzyme list, you can select this list by clicking the Load enzymes button. You can filter the enzymes in the same way as illustrated in figure 17.13. • Add enzymes cutting the selection to panel: If you make a selection on the sequence and right-click, you find this option for adding enzymes. Based on the entire list of available enzymes, the enzymes cutting in the region you selected will be added to the list in the Side Panel. • Insert restriction site before/after selection: If you make a selection on the sequence and right-click, you find this option for inserting a restriction site before or after the region you selected. A dialog is shown where you can select an enzyme whose recognition sequence is inserted. If it was not already present in the list in the Side Panel, the enzyme will now be added and selected. Finally, if you have selected a set of enzymes that you wish to keep for later use, you can click Save enzymes and the selected enzymes will be saved to an enzyme list. This list can then be used both when finding restriction sites from the Toolbox and when viewing another sequence. Residue coloring: These preferences make it possible to color both the residue letter and set a background color for the residue. • Non-standard residues: For nucleotide sequences this will color the residues that are not C G
186. ep two in the Hydrophobicity Plot allows you to choose hydrophobicity scale and the window size. [Figure 14.4: The result of the hydrophobicity plot calculation and the associated Side Panel. The plot shows hydrophobicity of CAA32220 against Position for the Kyte-Doolittle, Engelman and Eisenberg scales.] • Frame: Toggles the frame of the graph. • X-axis at zero: Toggles the x-axis at zero. • Y-axis at zero: Toggles the y-axis at zero. • Tick type: outside, inside. • Tick lines at: Shows a grid behind the graph (none, major ticks). • Show as histogram: For some data series it is possible to see them as a histogram rather than a line plot. The preferences for the different scales are identical and include the following: • Dot type: Lets you choose the marking of dots in the graph. • Dot color: Lets you choose the color of the dots. • Line width: Applies to the line connecting the dots. • Line type: Applies to the line connecting the dots. • Line color: Applies to the line connecting the dots. 14.2.2 Hydrophobicity graphs along sequence. Hydrophobicity graphs along sequence can be display
187. equence; 2.3 Tutorial: GenBank search and download; 2.3.1 Saving the search; 2.3.2 Searching for matching objects; 2.3.3 Saving the sequence; 2.4 Tutorial: Align protein sequences; 2.4.1 Alignment dialog; 2.5 Tutorial: Create and modify a phylogenetic tree; 2.5.1 [title illegible]; 2.6 Tutorial: Detect restriction sites; 2.6.1 View restriction site; 2.7 Tutorial: Sequence information; 2.8 Tutorial: BLAST Search; 2.9 Tutorial: Primer design; 2.9.1 Finding the region to amplify; 2.9.2 Specifying a region for the forward primer; 2.9.3 Examining the primer suggestions; 2.9.4 Calculating a primer pair; 2.10 Tutorial: Assembly; 2.10.1 Getting an overview of the contig; 2.10.2 Finding and editing inconsistencies; 2.10.3 Documenting your changes; 2.11 Tips and tricks for the experienced user
188. er sequences at top [Side Panel residue: Show sequence ends, Sequence layout, Annotation layout, Annotation types, Residue coloring, Alignment info, Nucleotide info, Translation] Figure 2.33: Using the Find Inconsistency button highlights inconsistencies. 2.11 Tips and tricks for the experienced user. In this tutorial you will get to know a number of ways to cut corners when using CLC Gene Workbench. The following sections will show you how to get your tasks done quickly and easily. When you are using the program, it is hard to discover these shortcuts yourself, which is the reason why this tutorial was written. Figure 2.32: An overview of the contig with the coverage graph. Figure 2.34: Just press the key to replace the residue. [History view residue: Contig 1; three Replaced a symbol entries at positions 631, 628 and 550, dated Fri Jun 30 2006.] Figure 2.35: The history of the contig showing that a T has been substituted for a t at position 550. The tutorial
189. er to use this application you will need a valid license key file. If you already have a key file containing a valid license, you can import it by clicking the import button below. If you do not have a license, you can request an evaluation license online by clicking the request button below while being connected to the internet, or by sending an email to license@clcbio.com. If you experience any problems, please contact support@clcbio.com. [Buttons: Request evaluation license; Import a license key file.] Figure 1.3: Selecting Request evaluation license. Now our server will issue an evaluation license. This process might take a while depending on your internet connection. When the license key is received, you will be asked to accept the License agreement shown in figure 1.4. [Dialog steps: Get license, Accept agreement, Activate license.] END USER LICENSE AGREEMENT FOR CLC BIO SOFTWARE: CLC Gene Workbench 2.0.3. 1 Recitals. 1.1 This End User License Agreement (EULA) is a legal agreement between you (either an individual person or a single legal entity, who will be referred to in this EULA as You) and CLC bio A/S, CVR no. 28 30 50 87, for the software products that accompany this EULA, including any associated media, printed materials and electronic documentation (the Software Product). 1.2 The Software Product also includes any software updates, add-on components, web services and/or supplements
190. erences can be saved in a style sheet (see section 4.5). The sequences can be sorted by clicking the column headings. You can further refine the sorting by pressing Ctrl while clicking the heading of another column. 11.5.3 Extract sequences. It is possible to extract individual sequences from a sequence list in two ways. If the sequence list is opened in the tabular view, it is possible to drag (with the mouse) one or more sequences into the Navigation Area. This allows you to extract specific sequences from the entire list. Another option is to extract all sequences found in the list to a preferred location in the Navigation Area: right-click a sequence list in the Navigation Area | Extract Sequences. Select a location for the sequences and click OK. Copies of all the sequences in the list are now placed in the location you selected. 11.6 Circular DNA. A sequence can be shown as a circular molecule: select a sequence in the Navigation Area | Show in the Toolbar | Circular. This will open a view of the molecule similar to the one in figure 11.10. This view of the sequence shares some of the properties of the linear view of sequences as described in section 11.1, but there are some differences. The similarities and differences are listed below. • Similarities: The editing options. Options for adding, editing and removing annotations. Annotation Layout, Annotation Types and Text Format preferences groups. • Differences:
191. erent codons. Thus, the program offers a number of choices for determining which codons should be used. These choices are explained in this section. In order to make a reverse translation: Select a protein sequence | Toolbox in the Menu Bar | Protein Analyses | Reverse Translate, or right-click a protein sequence | Toolbox | Protein Analyses | Reverse Translate. This opens the dialog displayed in figure 14.8. If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. You can translate several protein sequences at a time. Click Next to adjust the parameters for the translation. 14.3.1 Reverse translation parameters. Figure 14.9 shows the choices for making the translation. • Most frequently used codon: On the basis of the selected translation table, this parameter/option will assign the codon that occurs most often. When choosing this option, the results of performing several reverse translations will always be the same, contrary to the [Dialog residue: Reverse Translate; 1. Select protein sequences; Projects tree and Selected Elements list with protein sequences such as CAA24102, CAA32220 and NP_058652]
192. es. The CLC Workbenches are developed for Windows, Mac and Linux platforms. Data can be exported/imported between the different platforms in the same easy way as when exporting/importing between two computers with e.g. Windows. This is illustrated in figure 1.9. [Figure 1.9: An example of how research can be organized and how data can flow between users of different workbenches working on different platforms.] 1.6 When the program is installed: Getting started. CLC Gene Workbench 2.0 includes an extensive Help function, which can be found in the Help menu of the program's Menu bar. The Help function can also be launched by pressing F1. The help topics are sorted in a table of contents and the topics can be searched. 1.6.1 Basic concepts of using CLC Workbenches
193. es (see figure 14.6). Coloring the letters and their background: When choosing coloring of letters or coloring of their background, the color red is used to indicate high scores of hydrophobicity. A color slider allows you to amplify the scores, thereby emphasizing areas with high (or low, blue) levels of hydrophobicity. The color settings mentioned are default settings. By clicking the color bar just below the color slider, you get the option of changing color settings. [Figure 14.6: The different ways of displaying the hydrophobicity scores, using the Kyte-Doolittle scale.] Graphs along sequences: When selecting graphs, you choose to display the hydrophobicity scores underneath the sequence. This can be done either by a line plot or bar plot, or by coloring. The latter option offers you the same possibilities of amplifying the scores as applies for coloring of letters. The different ways to display the scores when choosing graphs are displa
194. es Export | Choose location for the exported file | Enter name of file | Save. Notice! The format of exported preferences is .cpf. This extension must be added to the name of the exported file in order for the exported file to work. Notice! Before exporting, you are asked which of the different settings you want to include in the exported file. "Default View Settings Sheet", which is one of the preferences that can be selected for export, does not include the Style sheets themselves, but only information about which of the Style sheets are the default style sheets. The process of importing preferences is similar to exporting: Press Ctrl + K (or the Mac equivalent) to open Preferences | Import | Browse to and select the .cpf file | Import and apply preferences. 4.5 View preference style sheet. Depending on which view you have opened in the Workbench, you have different options of adjusting the View preferences. Figure 4.2 shows the preference groups which are available for a sequence. By clicking the black triangles, the different preference groups can be opened. An example is shown in figure 4.3. Figure 4.2: View preferences for a view of a sequence include several preference groups. In this case the groups are Sequenc
195. es (Uracil) for T residues (Thymine): select an RNA sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Convert RNA to DNA, or right-click a sequence in Navigation Area | Toolbox | Nucleotide Analyses | Convert RNA to DNA. This opens the dialog displayed in figure 13.2. Figure 13.2: Translating RNA to DNA. If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. This will open a new view in the View Area displaying the new DNA sequence. The new sequence is not saved automatically. To save the DNA sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog. Notice! You can select multiple RNA sequences and sequence lists at a time. If the sequence list contains DNA sequ
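The conversion itself is a simple residue substitution. The following Python sketch shows the idea outside the Workbench; `rna_to_dna` is a hypothetical helper name, and the handling of lower-case input is an assumption made for convenience.

```python
def rna_to_dna(rna):
    """Replace every uracil (U) with thymine (T), preserving letter case."""
    return rna.translate(str.maketrans("Uu", "Tt"))


if __name__ == "__main__":
    print(rna_to_dna("AUGGCUUAA"))
```

All other residues, including ambiguity codes, pass through unchanged.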
196. Figure 12.13: An example of a local complexity plot. 12.3.1 Local complexity view preferences. There are two groups of preferences for the local complexity view: Graph preferences and Local complexity preferences. The Graph preferences apply to the whole graph:
• Lock axis: This will always show the axis, even though the plot is zoomed to a detailed level.
• Frame: Toggles the frame of the graph.
• X-axis at zero: Toggles the x-axis at zero.
• Y-axis at zero: Toggles the y-axis at zero.
• Tick type: outside, inside.
• Tick lines at: Shows a grid behind the graph (none, major ticks).
• Show as histogram: For some data series it is possible to see it as a histogram rather than a line plot.
The Local complexity preferences include:
• Dot type: none, cross, plus, square, diamond, circle, triangle, reverse triangle, dot.
• Dot color: Allows you to choose between many different colors.
• Line width: thin, medium, wide.
• Line type: none, line, long dash, short dash.
• Line color: Allows you to c
197. et the requirements and is colored red. If you raise the maximum annealing temperature to 59, the primer will meet the requirements and the dot becomes green. By adjusting the Primer parameters you can define primers which match your specific needs. Since the dots are constantly updated, you can immediately see how a change in the primer parameters affects the number of red and green dots. 2.9.4 Calculating a primer pair. Until now, we have been looking at the forward primer. To mark a region for the reverse primer, make a selection covering approximately 40 residues downstream of the conflict annotations and: Right-click the selection | Reverse primer region here. The two regions should now be located as shown in figure 2.28. Figure 2.28: A forward and a reverse primer region enclosing the conflicts. Now you can let CLC Gene Workbench calculate all the possible primer pairs based on the Primer parameters that you have defined: Click the Calculate button | Modify parameters regarding the combination of the primers (for now, just leave them unchanged) | Calculate. This will open a table showing the possible combinations of primers. To the right, you can specify the information you want to display, e.g. showing secondary structure (see figure 2.29). Clicking a primer pair in the table will make a corresponding selection on the sequence in the view above
198. f data in the View, and they are described in the relevant sections about sequences, alignments, trees etc. The Side Panel is activated in this way: select the View | Ctrl + U (⌘ + U on Mac), or right-click the tab of the View | View | Show/Hide Side Panel. Notice! Changes made to the Side Panel will not be saved when you save the View. See how to save the changes in the Side Panel in chapter 4. The Side Panel consists of a number of groups of preferences (depending on the kind of data being viewed), which can be expanded and collapsed by clicking the header of the group. You can also expand or collapse all the groups by clicking the icons at the top. Figure 3.10: A horizontal split screen. The two Views split the View Area. 3.3 Zoom and selection in View Area. The mode toolbar items in the right side of the Toolbar apply to the function of the mouse pointer. When e.g. Zoom Out is selected, the Zoom Out function is applied each time you click in a View where zooming is relevant (texts, tables and lists cannot be zoomed). The chosen mode is active until another mode toolbar item is selected. (Fit Width and Zoom to 100% do not apply to the mouse pointer.) 3.3.1 Zoom In. There ar
199. f the table to add or edit the comments. The comments in the table are associated with the conflict annotation on the contig. Therefore, the comments you enter in the table will also be attached to the annotation on the contig sequence (the comments can be displayed by placing the mouse cursor on the annotation for one second). The comments are saved when you save the contig. By clicking a row in the table, the corresponding position is highlighted in the graphical view of the contig. Clicking the rows of the table is another way of navigating the contig, apart from using the Find Inconsistencies button or using the Space bar. You can use the up and down arrow keys to navigate the rows of the table. Figure 16.9: The graphical view of a contig is displayed at the top. At the bottom the conflicts are shown in a table. Chapter 17: Cloning and cutting. Contents: 17.1 Molecular cloning: an introduction
200. f the following functions:
• NCBI: Opens the corresponding sequence(s) at GenBank at NCBI, where additional information regarding the selected sequence(s) is stored. The default Internet browser is used for this purpose.
• Open sequence: Opens the selected sequence(s) in one or more sequence views.
• Save sequence: Downloads and saves the sequence without opening it.
• Open structure: If the hit sequence contains structure information, the sequence is opened in a text view or a 3D view (3D view in CLC Protein Workbench and CLC Combined Workbench).
10.2 BLAST Against Local Database. CLC Gene Workbench will let you conduct a BLAST search in a local database (see section 10.3 for more about how to create a database). The advantages of conducting a local BLAST search are the speed and the possibility to BLAST sequences longer than 8900 residues. To conduct a Local BLAST search: right-click the tab of an open sequence | Toolbox | BLAST Search | BLAST Against Local Databases, or click an element in the Navigation Area | Toolbox | BLAST Search | BLAST Against Local Databases. This opens the dialog seen in figure 10.6. Click Next. This opens the dialog seen in figure 10.7. In Step 2 you can choose between different BLAST methods. See section 10.1 for information about these methods. In Step 2 you can also choose which of your local BLAST databases you want to conduct the search in. Clicking Select Database opens the di
201. f the workspace is visible. In the background are three quick start shortcuts, which will help you get started. These can be seen in figure 1.11. Figure 1.11: Three Quick start shortcuts, available in the background of the workspace. The function of the three quick start shortcuts is explained here:
• Import data: Opens the Import dialog, which lets you browse for and import data from your file system.
• New sequence: Opens a dialog which allows you to enter your own sequence.
• Read tutorials: Opens the tutorials menu with a number of tutorials. These are also available from the Help menu in the Menu bar.
It might be easier to understand the logic of the program by trying to do simple operations on existing data. Therefore CLC Gene Workbench 2.0 includes an example data set, which can be found on our web page or downloaded from the program (also found in the Help menu). 1.6.3 Import of example data. When downloading CLC Gene Workbench 2.0 you are asked if you would like to import an example data set. If you accept, the data is downloaded automatically and saved in the program. If you didn't download the data, or for some other reason need to download the data again, you have two options. You can click Install example data in the Help menu of the program. This installs the data automatically. You can also go to our website at http www clcbio com Software CLC F
202. further information about the error or for helping you with the problem. Notice that no personal information is sent via the error report. Only the information which can be seen in the Program Error Submission Dialog is submitted. You can also write an e-mail to support@clcbio.com. Remember to specify how the program error can be reproduced. All errors will be treated seriously and with gratitude. We appreciate your help. Start in safe mode: If the program becomes unstable on start-up, you can start it in Safe mode. This is done by pressing down the Shift button while the program starts. When starting in safe mode, the user settings (e.g. the settings in the Side Panel) are deleted and cannot be restored. Your data stored in the Navigation Area is not deleted. 1.5.3 Free vs. commercial workbenches. The advanced analyses of the commercial workbenches, CLC Protein Workbench and CLC Gene Workbench, are not present in CLC Free Workbench. Likewise, some advanced analyses are available in CLC Gene Workbench but not in CLC Protein Workbench, and vice versa. All types of basic and advanced analyses are available in CLC Combined Workbench. However, the output of the commercial workbenches can be viewed in all other workbenches. This allows you to share the result of your advanced analyses from e.g. CLC Combined Workbench with people working with e.g. CLC Free Workbench. They will be able to view the results of your analyses, but not redo the analys
203. Figure 16.3: Setting parameters for trimming.
• Remove old trimming: If you have previously trimmed the sequences, you can check this to remove existing trimming annotation prior to analysis.
• Trim using quality scores: If the sequence files contain quality scores from a base-caller algorithm, this information can be used for trimming sequence ends. The program uses the modified-Mott trimming algorithm for this purpose (Richard Mott, personal communication).
• Trim using ambiguous nucleotides: This option trims the sequence ends based on the presence of ambiguous nucleotides (typically N). Note that the automated sequencer generating the data must be set to output ambiguous nucleotides in order for this option to apply. The algorithm takes as input the maximal number of ambiguous nucleotides allowed after trimming. If this maximum is set to e.g. 3, the algorithm finds the maximum length region containing 3 or fewer ambiguities and then trims away the ends not i
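The ambiguity-based trimming described above can be sketched as a sliding-window search for the longest region with at most a given number of ambiguous residues. This is only an illustration under stated assumptions: the Workbench's actual implementation is not documented here, the function name `longest_unambiguous_region` is hypothetical, and only N is treated as ambiguous by default.

```python
def longest_unambiguous_region(sequence, max_ambiguous=2, ambiguous=frozenset("N")):
    """Return (start, end) of the longest region with at most
    max_ambiguous ambiguous residues; end is exclusive (Python slicing).

    Two-pointer scan: extend the right edge one residue at a time and
    move the left edge forward whenever the window holds too many
    ambiguous residues. On ties, the leftmost region is kept.
    """
    best_start, best_end = 0, 0
    left = 0
    count = 0
    for right, base in enumerate(sequence):
        if base.upper() in ambiguous:
            count += 1
        while count > max_ambiguous:
            if sequence[left].upper() in ambiguous:
                count -= 1
            left += 1
        if right + 1 - left > best_end - best_start:
            best_start, best_end = left, right + 1
    return best_start, best_end


if __name__ == "__main__":
    read = "NNNACGTNACGTACGNNNN"
    start, end = longest_unambiguous_region(read, max_ambiguous=1)
    # Everything outside read[start:end] would be trimmed away.
    print(start, end, read[start:end])
```

The residues before `start` and from `end` onwards are the ends that would be trimmed.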
204. g a sequence, alignment, search report etc. is saved where it is dropped. If the element already exists, you are asked whether you want to save a copy. You drag from the View Area by dragging the tab of the desired element. Use of drag and drop is supported throughout the program. Further description of the function is found in connection with the relevant functions. 3.1.5 Change element names. This section describes two ways of changing the names of sequences in the Navigation Area. In the first part, the sequences themselves are not changed; it's their representation that changes. The second part describes how to change the name of the element. Change how sequences are displayed: Sequence elements can be displayed in the Navigation Area with different types of information:
• Name (this is the default information to be shown).
• Accession (sequences downloaded from databases like GenBank have an accession number).
• Species.
• Species (accession).
• Common Species.
• Common Species (accession).
Whether sequences can be displayed with this information depends on their origin. Sequences that you have created yourself or imported might not include this information, and you will only be able to see them represented by their name. However, sequences downloaded from databases like GenBank will include this information. To change how sequences are displayed: right-click any element or folder in the
205. ged amino acids. Nuclear proteins often bind to the negatively charged DNA, which may regulate gene expression or help to fold the DNA. Nuclear proteins often have a low percentage of aromatic residues [Andrade et al., 1998]. Amino acid distribution: Amino acids are the basic components of proteins. The amino acid distribution in a protein is simply the percentage of the different amino acids represented in a particular protein of interest. Amino acid composition is generally conserved through family classes in different organisms, which can be useful when studying a particular protein or enzyme across species borders. Another interesting observation is that amino acid composition varies slightly between proteins from different subcellular localizations. This fact has been used in several computational methods for prediction of subcellular localization. Annotation table: This table provides an overview of all the different annotations associated with the sequence and their incidence. Dipeptide distribution: This measure is simply a count, or frequency, of all the observed adjacent pairs of amino acids (dipeptides) found in the protein. It is only possible to report neighboring amino acids. Knowledge of dipeptide composition has previously been used for prediction of subcellular localization. Creative Commons License: All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License
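Both statistics are straightforward counts, as the following Python sketch illustrates. The function names are hypothetical, and the output format (percentages in a dictionary, raw counts for dipeptides) is an assumption rather than the layout of the Workbench report.

```python
from collections import Counter


def amino_acid_distribution(protein):
    """Percentage of each amino acid present in the protein sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {residue: 100.0 * n / total for residue, n in counts.items()}


def dipeptide_counts(protein):
    """Count every pair of adjacent (neighboring) residues."""
    protein = protein.upper()
    return Counter(protein[i:i + 2] for i in range(len(protein) - 1))


if __name__ == "__main__":
    fragment = "IKAHGKKVLTSLGLAVKNMD"  # fragment shown in figure 14.6
    print(amino_acid_distribution(fragment))
    print(dipeptide_counts(fragment).most_common(3))
```

Overlapping pairs are counted separately, so a run such as `AAA` contributes two `AA` dipeptides.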
206. Figure 3.14: Two running and a number of terminated processes in the Toolbox. If you close the program while there are running processes, a dialog will ask if you are sure that you want to close the program. Closing the program will stop the process, and it cannot be restarted when you open the program again. 3.4.2 Toolbox. The content of the Toolbox tab in the Toolbox corresponds to Toolbox in the Menu Bar. The Toolbox can be hidden, so that the Navigation Area is enlarged and thereby displays more elements: View | Show/Hide Toolbox. The tools in the toolbox can be accessed by double-clicking or by dragging elements from the Navigation Area to an item in the Toolbox. 3.4.3 Status Bar. As can be seen from figure 3.1, the Status Bar is located at the bottom of the window. In the left side of the bar is an indication of whether the computer is making calculations or whether it is idle. The right side of the Status Bar indicates the range of the selection of a sequence. (See chapter 3.3.6 for more about the Selection mode button.) 3.5 Workspace. If you are working on a project and have arranged the views for this project, you can save this arrangement using Workspaces. A Workspace remembers the way you have arranged the views, and you can switch between different workspaces
207. gnment algorithms, but using the "cheap end gaps" option, large gaps will generally be tolerated at the sequence ends, improving the overall alignment. This is the default setting of the algorithm. Finally, treating end gaps like any other gaps is the best option when you know that there are no biologically distinct effects at the ends of the sequences. Figures 18.3 and 18.4 illustrate the differences between the different gap scores at the sequence ends. 18.1.2 Fast or accurate alignment algorithm. CLC Gene Workbench has two algorithms for calculating alignments:
• Accurate alignment: This is the recommended choice unless you find the processing time too long.
• Fast alignment: This allows for use of an optimized alignment algorithm which is very fast. The fast option is particularly useful for datasets with very long sequences.
208. gnment is created: select one or more gaps or residues in the alignment | drag the selection to move. This can be done both for single sequences and for multiple sequences, by making a selection covering more than one sequence. When you have made the selection, the mouse pointer turns into a horizontal arrow indicating that the selection can be moved (see figure 18.9). Notice! Residues can only be moved when they are next to a gap. Figure 18.9: Moving a part of an alignment. Notice the change of mouse pointer to a horizontal arrow. 18.3.2 Insert gap columns. The placement of gaps in the alignment can be changed by modifying the parameters when creating the alignment. However, gaps can also be added manually after the alignment is created. To insert extra gap columns (i.e. gaps in all the sequences): select a part of the alignment | right-click the selection | Add gap columns before/after. If you have made a selection covering e.g. five residues, a gap of five will be inserted. In this way you can easily control the number of gaps to insert. 18.3.3 Delete residues and gaps. Residues or gaps can be deleted for individual sequences or for the whole alignment. For ind
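Inserting gap columns means adding the same number of gap characters at the same position in every aligned sequence. A minimal Python sketch of that operation, assuming the alignment is held as a list of equal-length strings and `-` is the gap character (the function name `insert_gap_columns` is hypothetical):

```python
def insert_gap_columns(alignment, position, count, gap="-"):
    """Insert `count` gap columns before the 0-based column `position`.

    `alignment` is a list of equal-length strings, one per sequence.
    A new list is returned; the input is left unchanged.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if not 0 <= position <= width:
        raise ValueError("position is outside the alignment")
    return [row[:position] + gap * count + row[position:] for row in alignment]


if __name__ == "__main__":
    rows = ["AGGGAGTCAT", "AGGGAGCAGT", "ATGGTGCACC"]
    # A selection covering five residues inserts a gap of five columns.
    for row in insert_gap_columns(rows, position=3, count=5):
        print(row)
```

Because every row receives the same insertion, the relative alignment of the sequences is unchanged.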
209. gure 17.2. Figure 17.2: The view preferences for cloning view. Sequence details: When you make a selection on the sequence, you will see details of the residues and restriction sites, as illustrated in figure 17.3. Figure 17.3: Sequence details of a selection. At the top, the sequence is zoomed out and represented as a black line with annotations, and below, the residues are shown double-stranded with detailed visualization of restriction sites. The Sequence details are particularly useful when the sequences have overhangs, as shown at the right side end of the sequence in figure 17.3, which has a CTAG overhang. If you have not made a selection, the details of the ends of the sequences will automatically be shown. Restriction sites: These preferences allow you to display restriction sites on the sequence. There is a list of enzymes which are represented by different colors. By selecting or deselecting the enzymes in the l
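To illustrate what displaying restriction sites involves, here is a small Python sketch that locates recognition sequences in a nucleotide sequence. The function name `find_restriction_sites` is hypothetical, the enzyme table is a small subset using the standard recognition sequences, and the sketch reports where each recognition sequence starts rather than the exact cut position. Since these recognition sequences are palindromic, searching one strand is sufficient.

```python
# Standard recognition sequences for a few common enzymes (subset only).
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "XhoI": "CTCGAG",
}


def find_restriction_sites(sequence, enzymes=RECOGNITION_SITES):
    """Map each enzyme to the 1-based start positions of its sites.

    Enzymes without any site in the sequence are omitted from the result.
    """
    sequence = sequence.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start + 1)  # report 1-based positions
            start = sequence.find(site, start + 1)
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    print(find_restriction_sites("AAGGATCCTTGAATTCGGATCC"))
```

Selecting or deselecting an enzyme in the Side Panel corresponds to adding or removing an entry from the `enzymes` table passed to the function.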
210. h to adjust how to handle the results (see section 8.1). If not, click Finish. This will start the trimming process. Views of each trimmed sequence will be shown, and you can inspect the result by looking at the "Trim" annotations (they are colored red as default). If there are no trim annotations, the sequence has not been trimmed. 16.3 Assemble sequences. This section describes how to assemble a number of sequence reads into a contig without the use of a reference sequence (a known sequence that can be used for comparison with the other sequences, see section 16.4). To perform the assembly: select sequences to assemble | Toolbox in the Menu Bar | Assembly | Assemble Sequences. This opens a dialog where you can alter your choice of sequences which you want to assemble. You can also add sequence lists. When the sequences are selected, click Next. This will show the dialog in figure 16.4. Figure 16.4: Setting assembly parameters. This dialog gives you the following options for assembling: • Trim seque
211. hange the Sequence label in the Sequence Layout view preferences, you will have to ask the program to sort the sequences again. The sequences can also be sorted by similarity, grouping similar sequences together: Right-click the label of a sequence | Sort Sequences by Similarity. 18.3.6 Delete and add sequences. Sequences can be removed from the alignment by right-clicking the label of a sequence: right-click label | Delete Sequence. This can be undone by clicking Undo in the Toolbar. Extra sequences can be added to the alignment by creating a new alignment where you select the current alignment and the extra sequences (see section 18.1). The same procedure can be used for joining two alignments. 18.3.7 Realign selection. If you have created an alignment, it is possible to realign a part of it, leaving the rest of the alignment unchanged: select a part of the alignment to realign | right-click the selection | Realign selection. This will open Step 2 in the "Create alignment" dialog, allowing you to set the parameters for the realignment (see section 18.1). It is possible for an alignment to become shorter or longer as a result of the realignment of a region. This is because gaps may have to be inserted in, or deleted from, the sequences not selected for realignment. This will only occur for entire columns of gaps in these sequences, ensuring that their relative alignment is unchanged. Rea
212. he following, all of these will be referred to as regions. Regions are generally illustrated by markings (often arrows) on the sequences. An arrow pointing to the right indicates that the corresponding region is located on the positive strand of the sequence. Figure 11.3 is an example of three regions with separate colors. Figure 11.4 shows an artificial sequence with all the different kinds of regions. Figure 11.3: Three regions on a human beta globin DNA sequence (HUMHBB). Figure 11.4: Region 1: A single residue. Region 2: A range of residues including both endpoints. Region 3: A range of residues starting somewhere before 30 and continuing up to and including 40. Region 4: A single residue somewhere between 50 and 60 inclusive. Region 5: A range of residues beginning somewhere be
213. he annotations that you want to copy | right-click the annotation | Copy Annotation to other Sequences. Figure 2.41: Copying annotation to other sequences in the alignment. A dialog listing all the sequences in the alignment is shown. The annotation will be copied to the sequences that you select in this dialog. If the sequences are not identical, the annotation will still be copied. 2.11.8 Get overview and detail of a sequence at the same time. If you have a large sequence and you want to be able to get an overview of the whole and still keep the details of the residues, you can use the Split views functionality. In the example below (figure 2.42), the end of the red annotation is examined in detail in the bottom view, and in the upper view you have the overview of the whole alignment. In this example, a selection was made in the upper view, and the bottom view automatically scrolls to display this selection (this behavior can be turned off by unchecking the Follow selection option in the Side Panel).
214. her analyses: right-click the label of contig to the left | Open Sequence in New View | Save the new sequence. This will generate a new nucleotide sequence which can be used for e.g. BLAST analysis or cloning construction. In order to preserve the history of the changes you have made to the contig, the contig itself should be saved from the contig view, using either the save button or by dragging it to the Navigation Area. 16.6.3 Assembly variance table. In addition to the standard graphical display of a contig as described above, you can also see a tabular overview of the conflicts in the contig: right-click the tab of the contig | Show | Table. This will display a new view of the conflicts, as shown in figure 16.9. The table has the following columns:
• Position: The position of the conflict, measured from the starting point of the contig sequence.
• Contig Residue: The contig's residue at this position. The residue can be edited in the graphical view of the contig, as described above.
• Other Residues: Lists the residues of the reads. Inside the brackets, you can see the number of reads having this residue at this position. In the example in figure 16.9, you can see that there is a C in the top read in the graphical view. The other two reads have a T. Therefore, the table displays the following text: C (1), T (2).
• Note: Can be used for your own comments on this conflict. Right-click in this cell o
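The "Other Residues" column is a per-position tally of the residues found in the reads. The Python sketch below shows how such a tally could be built from reads that are already aligned to the same coordinates. Both function names are hypothetical, and the comma-separated output format is an assumption based on the example above, not the Workbench's exact rendering.

```python
from collections import Counter


def format_other_residues(read_residues):
    """Render a tally such as 'C (1), T (2)' for one contig position."""
    counts = Counter(residue.upper() for residue in read_residues)
    return ", ".join(f"{residue} ({n})" for residue, n in sorted(counts.items()))


def conflict_table(reads):
    """List (position, tally) rows for every column where reads disagree.

    `reads` are equal-length strings aligned to the same coordinates;
    positions are 1-based, measured from the start of the contig.
    """
    rows = []
    for position, column in enumerate(zip(*reads), start=1):
        if len({residue.upper() for residue in column}) > 1:
            rows.append((position, format_other_residues(column)))
    return rows


if __name__ == "__main__":
    for row in conflict_table(["ACGC", "ACGT", "ACGT"]):
        print(row)
```

In this toy example the only conflict is at position 4, where the top read has a C and the other two reads have a T.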
215. hoose between many different colors. 12.4 Sequence statistics. CLC Gene Workbench 2.0 can produce an output with many relevant statistics for protein sequences. Some of the statistics are also relevant to produce for DNA sequences. Therefore, this section deals with both types of statistics. The required steps for producing the statistics are the same. To create a statistic for the sequence, do the following: select sequence(s) | Toolbox in the Menu Bar | General Sequence Analyses | Create Sequence Statistics. This opens a dialog where you can alter your choice of sequences which you want to create statistics for. You can also add sequence lists. Notice! You cannot create statistics for DNA and protein sequences at the same time. When the sequences are selected, click Next. This opens the dialog displayed in figure 12.14. Figure 12.14: Setting parameters for the sequence statistics. The dialog offers to adjust the following parameters:
• Individual statistics layout: If more sequences were selected in Step 1, this function generates separate statistics for each sequence.
• Co
216. hylogenetics is central to evolutionary biology as a whole, as it is the condensation of the overall paradigm of how life arose and developed on earth. 19.2.1 The phylogenetic tree. The evolutionary hypothesis of a phylogeny can be graphically represented by a phylogenetic tree. Figure 19.4 shows a proposed phylogeny for the great apes, Hominidae, taken in part from Purvis [Purvis, 1995]. The tree consists of a number of nodes (also termed vertices) and branches (also termed edges). These nodes can represent either an individual, a species, or a higher grouping, and are thus broadly termed taxonomical units. In this case, the terminal nodes (also called leaves or tips of the tree) represent extant species of Hominidae and are the operational taxonomical units (OTUs). The internal nodes, which here represent extinct common ancestors of the great apes, are termed hypothetical taxonomical units, since they are not directly observable. Figure 19.4: A proposed phylogeny of the great apes (Hominidae). Different components of the tree are marked, see text for description. The ordering of the nodes determines the tree topology and describes how lineages have diverged over the cour
217. ich may only be known for a subset of the sequences, can be transferred to aligned positions in other un-annotated sequences.
• Conserved regions in the alignment can be found, which are prime candidates for holding functionally important sites.
• Comparative bioinformatical analysis can be performed to identify functionally important regions.
Figure 18.13: The tabular format of a multiple alignment of 24 Hemoglobin protein sequences. Sequence names appear at the beginning of each row, and the residue position is indicated by the numbers at the top of the alignment columns. The level of sequence conservation is shown on a color scale, with blue residues being the least conserved and red residues being the most conserved. 18.5.2 Constructing multiple alignments. Whereas the optimal solution to the pairwise alignment problem can be found in reasonable time, the problem of constructing a multiple alignment is much harder. The first major challenge in the multiple alignment procedure is how to rank different alignments, i.e. which scoring function to use. Since the sequences have a shared history, they are correlated through their phylogeny, and the scoring
218. idues only visible when you zoom in to see the residues. • Wrap sequences. Shows the sequence on more than one line. No wrap: the sequence is displayed on one line. Auto wrap: wraps the sequence to fit the width of the view, no matter if it is zoomed in or out (displays a minimum of 10 nucleotides on each line). Fixed wrap: makes it possible to specify when the sequence should be wrapped; in the text field below, you can choose the number of residues to display on each line. • Double stranded. Shows both strands of a sequence (only applies to DNA sequences). • Numbers on plus strand. Whether to set the numbers relative to the positive or the negative strand in a nucleotide sequence (only applies to DNA sequences). • Numbers on sequences. Shows residue positions along the sequence. The starting point can be changed by setting the number in the field below. If you set it to e.g. 101, the first residue will have the position of -100. This can also be done by right-clicking an annotation and choosing Set Numbers Relative to This Annotation. • Follow selection. When viewing the same sequence in two separate views, Follow selection will automatically scroll the view in order to follow a selection made in the other view. • Lock numbers. When you scroll vertically, the position numbers remain visible (only possible when the sequence is not wrapped). • Lock labels. When you scro
219. ile format 86 seq file format 86 PDB file format 29 86 270 pdf format export 92 Peptide sequence databases 268 Personal information 19 Pfam domain search 265 Phred file format 29 86 270 phy file format 86 Phylip file format 29 86 270 Phylogenetic tree 256 265 tutorial 34 Phylogenetics Bioinformatics explained 259 pir file format 86 PIR NBRP file format 29 86 270 Plot dot plot 137 local complexity 147 png format export 92 Polarity colors 121 Positively charged residues 155 PostScript export 92 Preferences 76 advanced 78 export 78 General 77 import 78 style sheet 78 toolbar 77 View 77 view 68 Primer 205 analyze 204 based on alignments 200 design 265 display graphically 189 length 187 mode 189 nested PCR 189 order 207 sequencing 189 standard 189 TaqMan 189 tutorial 41 Print 82 dot plots 138 preview 83 visible area 82 whole view 82 pro file format 86 Problems when starting up 19 Processes 72 Project create new 26 Protein charge 171 265 hydrophobicity 177 Isoelectric point 153 report 265 statistics 152 translation 179 INDEX 281 Proteolytic cleavage 265 Proxy server 22 Proxy settings and license activation 15 ps format export 92 PubMed references search 105 PubMed references search 265 Quality of trace 210 Quick start 21 Rasmol colors 121 Reading frame 168 Realign alignment 265 Rebase res
220. ile holding the button release the mouse button Alternatively you can search for a specific interval using the search function described above You can select several parts of sequence by holding down the Ctrl button while making selections Holding down the Shift button lets you extend or reduce an existing selection to the position you clicked If you have made a selection you can expand it by using Shift and Ctrl keys or by using the right click menu right click the selection Expand Selection Select the number of residues to expand the selection to both sides To select the entire sequence right click the sequence label to the left To select a part of a sequence covered by an annotation right click the annotation Select annotation CHAPTER 11 VIEWING AND EDITING SEQUENCES 124 A selection can be opened in a new view and saved as a new sequence right click the selection Open selection in new view This opens the annotated part of the sequence in a new view The new sequence can be saved by dragging the tab of the sequence view into the Navigation Area The process described above is also the way to manually translate coding parts of sequences CDS into protein You simply translate the new sequence into protein This is done by right click the tab of the new sequence Toolbox Nucleotide Analyses A Translate to Protein A A selection can also be copied to the clipboard and pasted into another program
221. ing temperature difference between outer and inner refers to primer pair and probe respectively CHAPTER 15 PRIMERS 199 15 7 1 TaqMan output table In TaqMan mode there are two primers and a probe in a given solution forward primer F reverse primer R and a TaqMan probe TP The output table can show primer probe pair combination parameters for all three combinations of primers and single primer parameters for both primers and the TaqMan probe see section on Standard PCR for an explanation of the available primer pair and single primer information The fragment length in this mode refers to the length of the PCR fragment generated by the primer pair and this is also the PCR fragment which can be exported 15 8 Sequencing primers This mode is used to design primers for DNA sequencing In this mode the user can define a number of Forward primer regions and Reverse primer regions where a sequencing primer can start These are defined using the mouse right click menu If areas are known where primers must not bind e g repeat rich areas one or more No primers here regions can be defined No requirements are instated on the relative position of the regions defined After exploring the available primers See section 15 3 and setting the desired parameter values in the Primer Parameters preference group the calculate button will activate the primer design algorithm After pressing the calculate button a dialogue will appear
222. ing that element with two numerical values between the curly brackets. The first number is a lower limit on the number of repetitions and the second number is an upper limit on the number of repetitions. For example, (ACT){1,3} matches ACT, ACTACT and ACTACTACT. X{n,} represents a repetition of an element at least n times. For example, (AC){2,} matches all strings ACAC, ACACAC, ACACACAC, ... The symbol ^ restricts the search to the beginning of your sequence. For example, if you search through a sequence with the regular expression ^AC, the algorithm will find a match if AC occurs at the beginning of the sequence. The symbol $ restricts the search to the end of your sequence. For example, if you search through a sequence with the regular expression GT$, the algorithm will find a match if GT occurs at the end of the sequence. Examples: The expression [ACG][^AC]G{2} matches all strings of length 4 where the first character is A, C or G, the second is any character except A and C, and the third and fourth characters are G. The expression G.[^A]$ matches all strings of length 3 at the end of your sequence where the first character is G, the second any character, and the third any character except A. For proteins, you can enter different protein patterns from the PROSITE database (protein patterns using regular expressions and describing specific amino acid sequences). The PROSITE database contains a great number of patterns and have
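The quantifiers and anchors described in this excerpt can be tried outside the Workbench. The following is a minimal sketch using Python's standard `re` module, which shares this part of the syntax; the helper name `find_motifs` and the toy sequences are illustrative assumptions, and the Workbench's own matching engine may differ in details such as overlapping matches.

```python
import re


def find_motifs(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, sequence)]


# X{n,m}: between n and m repetitions of the grouped element.
bounded = [s for s in ("ACT", "ACTACT", "ACTACTACT", "ACTACTACTACT")
           if re.fullmatch(r"(ACT){1,3}", s)]

# X{n,}: at least n repetitions.
at_least_two = [s for s in ("AC", "ACAC", "ACACAC")
                if re.fullmatch(r"(AC){2,}", s)]

# ^ anchors the match to the start, $ to the end of the sequence.
at_start = find_motifs(r"^AC", "ACGTAC")
at_end = find_motifs(r"GT$", "GTACGT")

# Character classes combined with a quantifier and an end anchor.
four_mer = find_motifs(r"[ACG][^AC]G{2}", "TTCTGGAA")
tail = find_motifs(r"G.[^A]$", "ACGTT")

print(bounded, at_least_two, at_start, at_end, four_mer, tail)
```

Positions are reported 1-based to match how residue positions are numbered in a sequence view.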
223. inside a View Area Furthermore it deals with rearranging the Views Section 3 3 deals with the zooming and selecting functions 3 2 1 Open View Opening a View can be done in a number of ways double click an element in the Navigation Area or select an element in the Navigation Area File Show Select the desired way to view the element or select an element in the Navigation Area Ctrl O 36 B on Mac Opening a View while another View is already open will show the new View in front of the other View The View that was already open can be brought to front by clicking its tab CHAPTER 3 USER INTERFACE 65 2 AY310318 50 100 a HBB g AY310318 v e AJ871593 a HUMDINUC Figure 3 6 A View Area can enclose several Views each View is indicated with a tab see top left View which shows protein P12675 Furthermore several Views can be shown at the same time in this example three views are displayed Notice If you right click an open tab of any element click Show and then choose a different view of the same element this new view is automatically opened in a split view allowing you to see both views See section 3 1 4 for instructions on how to open a View using drag and drop 3 2 2 Close Views When a View is closed the View Area remains open as long as there is at least one open View A View is closed by right click the tab of the View Close or select the View Ctrl W or hold
224. ion Move Starting Point to Selection Start Notice This can only be done for sequence that have been marked as circular Chapter 12 General sequence analyses Contents TATDOE PlotS oo a a aa ra ee OKOE ee Aia 136 12 4 1 Create dot plots e oce socas mos on d Re a a a OR OE Ca a 137 12 42 View dot Pots o s e ee k e we Boe bee ap ew E E a a 138 12 1 3 Bioinformatics explained Dot plots a a aoao aoao a a a a a a 139 12 1 4 Bioinformatics explained Scoring matrices 142 12 2 Shuffl SEQUENCE i c sa a aa A A A 146 12 3 Local complexity plot a aoao a 147 12 3 1 Local complexity view preferences 148 12 4 Sequence statistics a aoao noonoo 149 12 4 1 Sequence statistics output o o 152 12 4 2 Bioinformatics explained Protein statistics 152 12 5 Join sequences ee 156 12 6 Motif Search oo a 157 12 6 1 Motif search parameter Settings lt s esos rs ds e a 159 A 6 2 MOUPSEERCMWOURBUL uscar A ea ae ee ee 160 12 7 Pattern Discovery 2 2 6 ic we ee 160 12 7 1 Pattern discovery search parameters o 2000 161 12 7 2 Pattern search output a a a a a a a 162 CLC Gene Workbench 2 0 offers different kinds of sequence analyses which apply to both protein and DNA The analyses are described in this chapter 12 1 Dot plots Dot plots provide a powerful visual comparison of two sequences D
225. ion about the object like its history and comments The CLC format is also able to hold several objects of different types e g an alignment a graph and a phylogenetic tree This means that if you are exporting your data to another CLC Workbench you can use the CLC format to export several objects in one file and all the objects information is preserved Notice CLC files can be exported from and imported into all the different CLC Workbenches CHAPTER 6 IMPORT EXPORT OF DATA AND GRAPHICS 90 Back up The CLC format is practical for making manual back up of your files All files are stored in Projects and these can easily be exported out of CLC Gene Workbench select the project to export Export ES choose where to export to enter name of project Save Other than that the files of the Navigation Area are stored in a persistence folder on your computer Hence your regular back up system should be set up to include this folder On Mac the folder can be found Library Application Support CLC bio Workbench lt version number gt persistence On Windows Documents and Settings lt username gt CLC bio Workbench lt version number gt persistence On Linux home lt username gt clcbio workbench lt version number gt persistence 6 2 External files In order to help you organize your projects CLC Gene Workbench 2 0 lets you import all kinds of files E g if you have Word Excel or pdf files related to your project yo
226. ion concentration 188 Inner 188 Primer concentration 188 Memory adjust maximum amount 22 Menu Bar illustration 58 mm CIF file format 29 86 270 Mode toolbar 69 Modify enzyme list 235 Molecular weight 152 Motif search 157 265 Mouse modes 69 Move content of a view 71 elements in Navigation Area 60 sequences in alignment 250 msf file format 86 Multiple alignments 253 265 Multiselecting 60 Navigation Area 58 create local BLAST database 114 illustration 58 NCBI 101 search sequence in 105 search tutorial 31 Negatively charged residues 154 Neighbor Joining algorithm 261 Neighborjoining 265 Nested PCR primers 265 Network configuration 22 New INDEX 280 feature request 19 folder 26 59 project 26 59 sequence 129 Newick file format 29 86 270 nexus file format 86 Nexus file format 29 86 270 Non standard residues 121 nr BLAST databases 109 Nucleotide info 121 sequence databases 268 Numbers on sequence 118 nwk file format 86 nxs file format 86 Old data import 87 Online check of demo license key 15 Open consensus sequence 246 files 20 Open reading frame determination 168 Open ended sequence 168 Order primers 207 265 ORF 168 Origins from 96 Page setup 83 PAM scoring matrices 142 Parameters search 102 Parsing automatic 87 Paste copy 93 Pattern Discovery 160 Pattern discovery 265 Pattern Search 157 PCR primers 265 pdb f
227. ion is on the strand complementary to the presented strand. join(complement(4918..5163),complement(2691..4571)): Complements regions 4918 to 5163 and 2691 to 4571, then joins the complemented segments (the region is on the strand complementary to the presented strand). Click OK to add the annotation. Notice: The annotation will be included if you export the sequence in GenBank, Swiss-Prot or CLC format. To modify an existing annotation: right-click the annotation | Edit Annotation. This will show the same dialog as in figure 11.2, with the exception that some of the fields are filled out, depending on how much information the annotation contains. 11.1.5 Removing annotations. Annotations can be hidden using the Annotation Types preferences in the Side Panel to the right of the view (see section 11.1.1). In order to completely remove the annotation: right-click the annotation | Delete Annotation. If you want to remove all annotations of one type: right-click an annotation of the type you want to remove | Delete Annotations of This Type. If you want to remove all annotations from a sequence: right-click an annotation | Delete All Annotations. The removal of annotations can be undone using Ctrl+Z or Undo in the Toolbar. 11.1.6 Sequence region types. The various annotations on sequences cover parts of the sequence. Some cover an interval, some cover intervals with unknown endpoints, some cover more than one interval, etc. In t
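To make the join/complement region syntax concrete, here is a minimal Python sketch that evaluates such a region on a toy sequence. It assumes the GenBank convention that complementing an interval means taking its reverse complement and that positions are 1-based and inclusive; the function names and the toy sequence are hypothetical and not part of the Workbench.

```python
# Translation table for unambiguous DNA bases (other symbols pass through).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base and reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(seq, start, end, complement=False):
    """Return seq[start..end] (1-based, inclusive), reverse-complemented if asked."""
    segment = seq[start - 1:end]
    return reverse_complement(segment) if complement else segment


def join_regions(seq, regions):
    """Concatenate regions in the order given; each is (start, end, complement)."""
    return "".join(extract_region(seq, s, e, c) for s, e, c in regions)


toy = "AAACCCGGGTTT"
# Same shape as join(complement(7..9),complement(1..3)) on the toy sequence.
result = join_regions(toy, [(7, 9, True), (1, 3, True)])
print(result)
```

The regions are joined in the order listed, which is why the later interval can appear first in the location string.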
228. irst click the node one time and then click the node again and this time hold the mouse button In order to change the representation e Rearrange leaves and branches by Select a leaf or branch Move it up and down Hint The mouse turns into an arrow pointing up and down e Change the length of a branch by Select a leaf or branch Press Ctrl Move left and right Hint The mouse turns into an arrow pointing left and right Alter the preferences in Side Panel for changing the presentation of the tree Notice The preferences will not be saved Viewing a tree in different viewers gives you the opportunity to change into different preferences in all of the viewers For example if you select the Annotation Layout species for a node then you will only see the change in the specified view If you now move leaves the leaves in all views are moved The options of the right click pop up menu are changing the tree and therefore they change all views Notice The Set Root Above and the Set Root Here functions change the tree and therefore you may save it in order to be able to see it in this format later on 19 2 Bioinformatics explained phylogenetics Phylogenetics describes the taxonomical classification of organisms based on their evolutionary history i e their phylogeny Phylogenetics is therefore an integral part of the science of systematics that aims to establish the phylogeny of organisms based on their characteristics Furthermore p
229. ist you can specify which enzymes' restriction sites should be displayed (see figure 17.4). Figure 17.4: Showing restriction sites of two restriction enzymes. The color of the flag of the restriction site can be changed by clicking the colored box next to the enzyme's name. The list of restriction enzymes contains per default ten of the most popular enzymes, but you can easily modify this list and add more enzymes. You have four ways of modifying the list: • Edit enzymes button. This displays a dialog with the enzymes currently in the list shown at the bottom and a list of available enzymes at the top. To add more enzymes, select them in the upper list and press the Add enzymes button. To remove enzymes, select them in the list below and click the Remove enzymes button. • Load enzymes button. If you have previously created an enzyme list, you can select this list by clicking the Load enzymes button. You can filter the enzymes in the same way as illustrated in figure 17.13. • Add enzymes cutting the selection to panel. If you make a selection on the sequence and right-click, you find this option for adding enzymes. Based on the entire list of available enzymes, the enzymes cutting in the region you selected will be added to the list in the Side Panel. • Insert restriction site b
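The idea behind "Add enzymes cutting the selection to panel" can be sketched in a few lines of Python: scan the selected region for each enzyme's recognition sequence and keep the enzymes that occur. This is an illustration under stated assumptions, not the Workbench's algorithm: the enzyme subset and function names are chosen for the example, positions refer to the start of the recognition site rather than the exact cut position, and only palindromic sites are listed, so searching one strand is enough.

```python
# Recognition sequences for a few common enzymes (all palindromic).
ENZYMES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "EcoRV": "GATATC",
    "PstI": "CTGCAG",
    "SalI": "GTCGAC",
    "HindIII": "AAGCTT",
}


def find_sites(seq, site):
    """Return 1-based start positions of every (possibly overlapping) occurrence."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + 1)
        i = seq.find(site, i + 1)
    return positions


def enzymes_cutting_selection(seq, start, end, enzymes=ENZYMES):
    """Map enzyme name -> site positions (in whole-sequence coordinates)
    for sites lying entirely inside the 1-based inclusive selection."""
    selection = seq[start - 1:end].upper()
    hits = {}
    for name, site in enzymes.items():
        positions = [p + start - 1 for p in find_sites(selection, site)]
        if positions:
            hits[name] = positions
    return hits


toy = "AAGAATTCTTCTGCAGAAGTCGACTT"
print(enzymes_cutting_selection(toy, 1, len(toy)))
print(enzymes_cutting_selection(toy, 9, 26))
```

Narrowing the selection drops EcoRI from the result because its site starts before the selected region.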
230. ividual Sequences select the part of the sequence you want to delete right click the selection Edit selection Delete the text in the dialog Replace The selection shown in the dialog will be replaced by the text you enter If you delete the text the selection will be replaced by an empty text i e deleted To delete entire columns select the part of the alignment you want to delete right click the selection Delete columns The selection may cover one or more sequences but the Delete columns function will always apply to the entire alignment 18 3 4 Copy annotations to other sequences Annotations on one sequence can be transferred to other sequences in the alignment right click the annotation Copy Annotation to other Sequences This will display a dialog listing all the sequences in the alignment Next to each sequence is a checkbox which is used for selecting which sequences the annotation should be copied to Click Copy to copy the annotation 18 3 5 Move sequences up and down Sequences can be moved up and down in the alignment drag the label of the sequence up or down When you move the mouse pointer over the label the pointer will turn into a vertical arrow indicating that the sequence can be moved The sequences can also be sorted automatically to let you save time moving the sequences around To sort the sequences alphabetically Right click the label of a sequence Sort Sequences Alphabetically If you c
231. ke sequence circular: This will convert a sequence from a linear to a circular form and has implications for e.g. the action of restriction enzymes. The circular form is represented by << at the beginning of the sequence. Make sequence linear: This will convert a sequence from a circular to a linear form. Delete sequence: This deletes the given sequence from the cloning editor. Select sequence: This will select the entire sequence. Open sequence in new view: This will open the selected sequence in a normal sequence view. Sort sequence list by name: This will sort all the sequences in the cloning editor alphabetically by name. Sort sequences by length: This will sort all the sequences in the cloning editor by length. Manipulate parts of the sequence. Right-clicking a selection reveals several options for manipulating the selection (see figure 17.6). Delete selection: This will delete the selected region of the sequence. Duplicate selection: If a selection on the sequence is duplicated, the selected region will be added as a new sequence to the cloning editor, with a new sequence label representing the length of the fragment. Insert sequence after selection: Insert a sequence after the selected region. The sequence to be inserted can be selected from a list containing all sequences in the cloning editor. Insert sequence before selection: Insert a sequence before the selected region. The sequence to be inserted can be
232. kspace Search Fit Width 100 Pan ESP Zoom In Menu Bar Tevigation Ares gt AY738615 Default project for CLC user Toolbar S L Example data 4s Bh Ad _ B E Nucleotide IHBD HBB EIA N av i g ati on Area fea Sequences Sequence layout s vec NM_000044 lt 81738015 AY738615 CCTTTAGTGATGGCCTGG O 20 HUMDINUC incur 96 PERH2BD i 06 PERH3BC Auto wrap View Area ES Cloning project AY738615 CTCACCTGGACAACCTCA t Primer design Double stranded E V Numbers on sequences tool Sa er KA Nucleotide Analyses AY738615 AGGGCACTTTTTCTCAGC eee 1 leg Protein Analyses Y Numbers on plus strand Su Primers and Probes a V Follow selection f Cloning and Restriction Sites i BLAST Search SS fi Database Search AY738615 TGAGTGAGCTGCACTGTG Y Lock labels Sequence label Processes Toolbox a E Idie Status Bar Figure 3 1 The user interface consists of the Menu Bar Toolbar Status Bar Navigation Area Toolbox and View Area 3 1 Navigation Area The Navigation Area is located in the left side of the workbench under the Toolbar It is used for organizing and navigating data The Navigation Area displays a Project Tree see figure 3 2 which is similar to the way files and folders are usually displayed on your computer The Project Tree contains one or more projects The elements which are available in the Navigation Area remain the same when changing Workspaces see section 3 5 A project can be a collection of elements which are relate
233. l 1 Select sequences Select equence Projects Selected Elements S L Example data XS PERHSBC 5 Nucleotide 2C PERH2BD e Sequences 206 HUMDINUC f sequence list E Assembly w Cloning project Primer design a Restriction analysis E Protein B E Extra i Performed analyses E README E CLC bio Home Figure 17 19 Select one or more sequences to separate on a gel If a sequence was selected before choosing the Toolbox action this sequence is now listed in the Selected Elements window of the dialog Use the arrows to add or remove sequences or sequence lists from the Project Tree Clicking Next generates the dialog shown in figure 17 20 In this dialog you can choose from two different ways of simulating the gel electrophoresis e Run each sequence in a separate lane This will create a new lane for each of the selected sequences As a result there will only be one band on each lane e Run all sequences in same lane This will create only one lane in which each of the selected sequences will be represented by a band The difference between these two options is shown in figure17 21 Click Next if you wish to adjust how to handle the results See section 8 1 If not click Finish CHAPTER 17 CLONING AND CUTTING 237 Y Separate Sequences on Gel 1 Select sequences Set paremeters 2 Set parameters Specify lanes on the gel Run each sequence in a separate lane Run all sequences i
234. l operations you make in the program. If, e.g., you rename a sequence, align sequences, create a phylogenetic tree or translate a sequence, you can always go back and check what you have done. In this way, you are able to document and reproduce previous operations. This can be useful in several situations: It can be used for documentation purposes, where you can specify exactly how your data has been created and modified. It can also be useful if you return to a project after some time and want to refresh your memory on how the data was created. Also, if you have performed an analysis and you want to reproduce the analysis on another element, you can check the history of the analysis, which will give you all the parameters you set. This chapter will describe how to use the History functionality of CLC Gene Workbench 2.0. 7.1 Element history. You can view the history of all elements in the Navigation Area except files that are opened in other programs (e.g. Word and pdf files). The history starts when the element appears for the first time in CLC Gene Workbench 2.0. To view the history of an element: right-click the element in the Navigation Area | Show | History, or select the element in the Navigation Area | Show in the Toolbar | History. This opens a view that looks like the one in figure 7.1. When an element's history is opened, the newest change is listed at the top of the view. The following information is available:
235. lat 27 APR 1993 110 PERH2BB M15290 P maniculat 27 APR 1993 110 PERH3BA M15291 P maniculat 27 APR 1993 110 TE Show column A Figure 4 5 The floating Side Panel can be moved out of the way e g to allow for a wider view of a table Chapter 5 Printing Contents 5 1 Selecting which part of the view to print lt lt lt lt lt 82 5 2 Page setup scc cai eee ee ee we ee ee 83 5 3 Print preview a os sopone a Gee eae eee we ee A ee ee ee 83 CLC Gene Workbench 2 0 offers different choices of printing the result of your work This chapter deals with printing directly from the workbench Another option for using the graphical output of your work is to export graphics see chapter 6 3 in a graphic format and then import it into a document or into a presentation All the kinds of data that you can view in the View Area can be printed For some of the views the layout will be slightly changed in order to be printer friendly It is not possible to print elements directly from the Navigation Area They must first be opened in a view in order to be printed select relevant view Print 4 in the toolbar If you are printing e g alignments Sequences and graphs you will be faced with three different dialogs allowing you to adjust the way your view is printed e A dialog to let you select which part of the view you want to print e A dialog to adjust p
236. latus deer mouse Equus caballus horse 100 Homo sapiens human Homo sapiens human Peromyscus maniculatus deer mouse soo Peromyscus maniculatus deer mouse se Peromyscus maniculatus deer mouse 8 Equus caballus horse Homo sapiens human 1 Peromyscus maniculatus deer mouse Peromyscus maniculatus deer mouse Peromyscus maniculatus deer mouse Homo sapiens human Homo sapiens human Peromyscus maniculatus deer mouse Peromyscus maniculatus deer mouse Peromyscus maniculatus deer mouse so0f Homo sapiens human Homo sapiens human Figure 19 3 Method choices for phylogenetic inference The top shows a tree found by neighbor joining while the bottom shows a tree found by UPGMA The latter method assumes that the evolution occurs at a constant rate in different lineages 19 1 2 Tree View Preferences The Tree View preferences are these e Text format Changes the text format for all of the nodes the tree contains Text size The size of the text representing the nodes can be modified in tiny small medium large or huge Font Sets the font of the text of all nodes Bold Sets the text bold if enabled e Tree Layout Different layouts for the tree Node symbol Changes the symbol of nodes into box dot circle or none if you don t want a node symbol Layout Displays the tree layout as standard or topology Show internal node labels This allows y
237. letting the majority decide the nucleotide in the contig. Unknown nucleotide (N): The contig will be assigned an N character in all positions with conflicts. Ambiguity nucleotides (R, Y, etc.): The contig will display an ambiguity nucleotide reflecting the different nucleotides found in the reads. For an overview of ambiguity codes, see Background information in the Help menu. Note that conflicts will always be highlighted no matter which of the options you choose. Furthermore, each conflict will be marked as an annotation on the contig sequence and will be present if the contig sequence is extracted for further analysis. As a result, the details of any experimental heterogeneity can be maintained and used when the results of single-sequence analyses are interpreted. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. 16.5 Assemble to an existing contig. This section describes how to assemble sequences to an existing contig. When you assemble to an existing contig, the result of the assembly process is not a new contig where all sequencing reads are re-assembled. Instead, the newly introduced sequencing reads are aligned and added to the existing contig. Sequences that do not align properly to the contig are omitted. This feature can be used, for example, to provide a steady work flow when a number of exons from the same gene are sequenced one at a time and asse
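The three ways of resolving a conflict described in this excerpt (majority vote, unknown nucleotide N, ambiguity nucleotide) can be illustrated with a short Python sketch. It assumes the reads are already aligned to equal length, that gap characters are ignored, and that ties in a vote go to the base seen first; none of these details are stated in the manual, and the function and mode names are hypothetical.

```python
from collections import Counter

# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def resolve_column(bases, mode):
    """Pick the contig base for one alignment column."""
    observed = [b for b in bases if b in "ACGT"]  # ignore gaps and other symbols
    if not observed:
        return "N"
    distinct = frozenset(observed)
    if len(distinct) == 1:           # no conflict in this column
        return observed[0]
    if mode == "vote":               # majority decides (ties: first seen wins)
        return Counter(observed).most_common(1)[0][0]
    if mode == "unknown":            # every conflict becomes N
        return "N"
    if mode == "ambiguity":          # IUPAC code for the observed bases
        return IUPAC[distinct]
    raise ValueError(f"unknown mode: {mode}")


def consensus(reads, mode="vote"):
    """Build the contig sequence column by column from aligned reads."""
    return "".join(resolve_column(column, mode) for column in zip(*reads))


reads = ["ACGT", "ACGA", "ATGA"]
for mode in ("vote", "unknown", "ambiguity"):
    print(mode, consensus(reads, mode))
```

Whichever mode is chosen, the columns where `distinct` has more than one member are the positions that would be marked as conflicts.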
238. licated and shown in the cloning editor as the third sequence This is seen in figure 17 11 Now find and select the EcoRV restriction site in position 187 in the donor plasmid pBR322 CHAPTER 17 CLONING AND CUTTING 230 G Cloning example e y 2000 xt v Sequence details 1 pBR322 Y Show Sequence Details LLC TCATG TT cancha gt Sequence layout AABAGTAG AAGTTCTT Annotation layout gt Annotation types BG 24h 8G1 y Restriction sites V Show Done HUMHBB SESE EEE GAGATCACACAT CTCTAGTGTGTA o 5 ye aa ry o ata a _ r 4 44 A AGTTCCACACACTEGE TCAAGGTGTGTGAGCG HBG2 A su HU MH BB _34478 35069 _ _ A Sequence Details ACACTCGC GAGATCAC Sequence Details BamHI GGATCC BglII AGATCT EcoRI GAATTC EcoRW GATATC PstI CTGCAG Sall GTCGAC Smal CCCGGG Xbal TCTAGA v v v v Y HindIII AAGCTT v v v v v xhol CTCGAG Select all Deselect all Edit enzymes Save enzymes Load enzymes Figure 17 11 Three sequences are shown in the cloning view The plasmid a chromosome holding the gene of interest and the duplicated gene Right click with the mouse on the EcoRV site and click insert sequence at this EcoRV site Select the third and duplicated sequence holding the gene of interest and click ok Now the HBG2 gene is inserted in
239. ligning a selection is a very powerful tool for editing alignments in several situations: • Removing changes. If you change the alignment in a specific region by hand, you may end up being unhappy with the result. In this case you may of course undo your edits, but another option is to select the region and realign it. • Adjusting the number of gaps. If you have a region in an alignment which has too many gaps in your opinion, you can select the region and realign it. By choosing a relatively high gap cost, you will be able to reduce the number of gaps. • Combine with fixpoints. If you have an alignment where two residues are not aligned but you know that they should have been, you can set an alignment fixpoint on each of the two residues, select the region and realign it using the fixpoints. Now the two residues are aligned with each other, and everything in the selected region around them is adjusted to accommodate this change. 18.4 Join alignments. CLC Gene Workbench can join several alignments into one. This feature can, for example, be used to construct supergenes for phylogenetic inference by joining alignments of several disjoint genes into one spliced alignment. Note that when alignments are joined, all their annotations are carried over to the new spliced alignment. Alignments can be joined by: select alignments to join | Toolbox in the Menu Bar | Alignments and Trees | Join Alignments or select alignments to join
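Joining alignments into a supergene amounts to concatenating the aligned rows of each gene, taxon by taxon. The sketch below shows this in Python under assumptions the manual does not spell out: rows are matched across alignments by sequence name, and a sequence missing from one alignment is padded with gap characters. The function name and the toy data are hypothetical, and annotations are not modelled.

```python
def join_alignments(alignments):
    """Concatenate alignments given as dicts of name -> aligned sequence.

    Rows are matched by name; a name absent from an alignment is padded
    with '-' so that all joined rows keep the same length.
    """
    names = []
    for alignment in alignments:          # collect names in first-seen order
        for name in alignment:
            if name not in names:
                names.append(name)

    joined = {name: "" for name in names}
    for alignment in alignments:
        lengths = {len(row) for row in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("all rows in one alignment must have equal length")
        width = lengths.pop()
        for name in names:
            joined[name] += alignment.get(name, "-" * width)
    return joined


gene1 = {"human": "ATG-C", "mouse": "ATGGC"}
gene2 = {"human": "TTA", "mouse": "T-A", "horse": "TTG"}
supergene = join_alignments([gene1, gene2])
for name, row in supergene.items():
    print(f"{name:6s} {row}")
```

Padding with gaps keeps the joined alignment rectangular, which is what tree-building methods generally expect as input.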
240. listed in the Selected Elements window of the dialog Use the arrows to add or remove sequences or sequence lists from the Project Tree CHAPTER 15 PRIMERS 205 Y Calculation parameters Chosen parameters Maximum primer length Minimum primer length Maximum G C content Minimum G C content Maximum melting temperature Minimum melting temperature Maximum self annealing Maximum self end annealing Maximum secondary structure 3 end must meet G C requirements 5 end must meet G C requirements Probe parameters Minimum number of mismatches 1 Minimum number of mismatches in central part 18 Primer combination parameters Max percentage point difference in G C content mia Max difference in melting temperatures within a primer pair OD Max pair annealing score Ab Lal 1 1d Minimum difference in melting temperature Inner Outer Desired difference in melting temperature Inner Outer 1 wi Calculate X Cancel Q Help Figure 15 14 Calculation dialog shown when designing alignment based TaqMan probes On 4 Click Next if you wish to adjust how to handle the results see section 8 1 If not click Finish The result is shown in figure 15 15 Standard primers for sequence Primer Number of rows 1 Sequence Self annealing alignment GC content Secondary structure Te C Sel annealing T CCATGGTTTCCTTCCTCT CCATGGTTTCCTTCCTCT Hit 50 TCTCCTTCCTTIGOTACC Self annealing alignment O self e
241. ll. In bioinformatics analysis of proteins, it is sometimes useful to know the ancestral DNA sequence in order to find the genomic localization of the gene. Thus, the translation of proteins back to DNA/RNA is of particular interest and is called reverse translation or back-translation. The Genetic Code. In 1968, the Nobel Prize in Medicine was awarded to Robert W. Holley, Har Gobind Khorana and Marshall W. Nirenberg for their interpretation of the Genetic Code (http://nobelprize.org/medicine/laureates/1968). The Genetic Code represents translations of all 64 different codons into 20 different amino acids (and stop signals). Therefore, it is no problem to translate a DNA/RNA sequence into a specific protein. But due to the degeneracy of the genetic code, several codons may code for only one specific amino acid. This can be seen in figure 14.10. After the discovery of the genetic code, it has been concluded that different organisms and organelles have genetic codes which are different from the standard genetic code. Moreover, the amino acid alphabet is no longer limited to 20 amino acids. The 21st amino acid, selenocysteine, is encoded by a UGA codon, which is normally a stop codon. The discrimination of a selenocysteine over a stop codon is carried out by the translation machinery. Selenocysteines are very rare amino acids. Figures 14.10 and 14.11 represent the Standard Code, which is the default translation table. AAS FFLLSSSSYY CCWWLLLLPPPPHHQQRRRRI IMMTTTTN
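The degeneracy described in this excerpt is easy to see once the Standard Code is written out as a lookup table. The Python sketch below builds the table from the compact 64-letter form used for NCBI translation table 1 (bases in the order T, C, A, G) and shows forward translation next to the one-to-many reverse lookup. The function names are hypothetical, unknown codons are reported as X by assumption, and special cases such as selenocysteine at UGA are not handled.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG ('*' marks a stop).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AAS)
}


def translate(seq):
    """Translate DNA or RNA in reading frame 1; incomplete codons are dropped."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")   # X for codons with unknown bases
        for i in range(0, len(seq) - 2, 3)
    )


def codons_for(amino_acid):
    """All codons encoding one amino acid: the ambiguity faced in back-translation."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(translate("ATGGCCTGA"))
print(codons_for("L"))
print(codons_for("M"))
```

Translation is a simple lookup in one direction, while back-translating a single leucine already leaves six candidate codons.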
242. All Fields in the third drop-down menu, write complete in the adjoining text field. Now you have two choices: either click Start search to commence the search in NCBI, or click Save search parameters to choose where to save the search.

[Figure: NCBI search dialog with the Nucleotide database chosen and the three All Fields search terms human, hemoglobin and complete. The result list shows Accession, Definition and Modification date for each hit, for example Homo sapiens hemoglobin epsilon 1 mRNA and Mus musculus hemoglobin alpha adult chain 1 mRNA.]
243. ll horizontally, the label of the sequence remains visible.
- Sequence label. Defines the label to the left of the sequence.
  - Name (this is the default information to be shown).
  - Accession (sequences downloaded from databases like GenBank have an accession number).
  - Species.
  - Species (accession).
  - Common Species.
  - Common Species (accession).

Annotation Layout

Annotations are data attached to a specific part of a sequence. If the sequence is downloaded from a database, it has annotations attached to it, e.g. the location of genes on a DNA sequence. If you have performed Restriction Site or Proteolytic Cleavage analysis, the cut sites can be displayed as annotations on the sequence. Other analyses also attach annotations to the sequence. See section 11.1.6 for more information about how to interpret the annotations. The annotations are shown as colored boxes along the sequence, and their appearance is determined in the Annotation layout preferences group.
- Show annotations. Determines whether the annotations are shown.
- Position.
  - On sequence. The annotations are placed on the sequence. The residues are visible through the annotations (if you have zoomed in to 100 %).
  - Next to sequence. The annotations are placed above the sequence.
- Offset. If several annotations cover the same part of a sequence, they can be spread out.
  - Piled. The annotations are piled on top of each other. Only the one at front is visible.
  - Li
244. ltiple alignments are at the core of bioinformatical analysis. Often the first step in a chain of bioinformatical analyses is to construct a multiple alignment of a number of homologous DNA or protein sequences. However, despite their frequent use, the development of multiple alignment algorithms remains one of the algorithmically most challenging areas in bioinformatical research. Constructing a multiple alignment corresponds to developing a hypothesis of how a number of sequences have evolved through the processes of character substitution, insertion and deletion. The input to multiple alignment algorithms is a number of homologous sequences, i.e. sequences that share a common ancestor and most often also share molecular function. The generated alignment is a table (see figure 18.13) where each row corresponds to an input sequence and each column corresponds to a position in the alignment. An individual column in this table represents residues that have all diverged from a common ancestral residue. Gaps in the table (commonly represented by a '-') represent positions where residues have been inserted or deleted and thus do not have ancestral counterparts in all sequences.

18.5.1 Use of multiple alignments

Once a multiple alignment is constructed, it can form the basis for a number of analyses:
- The phylogenetic relationship of the sequences can be investigated by tree-building methods based on the alignment.
- Annotation of functional domains wh
245. lue The property has a specific syntax, similar to -Xmx512m. It is very important that you only change the value of the number (512 in the example above) to the amount of megabytes you want. For the best performance you should not choose a number greater than the amount of physical memory available on your system.

1.8.3 Linux

- Locate the directory where you installed CLC Gene Workbench 2.0 and open it.
- Create a new, empty text file called clcwb.vmoptions.
- Add a single line to the file with a syntax similar to -Xmx512m. It is very important that the line looks exactly like the one in the example above and that you only change the value of the number (512 in the example). For the best performance you should not choose a number greater than the amount (in megabytes) of physical memory available on your system.

1.9 The format of the user manual

This user manual offers support to Windows, Mac OS X and Linux users. The software is very similar on these operating systems. In areas where differences exist, these will be described separately. The term right-click is used throughout the manual, but some Mac users may have to use Ctrl + click in order to perform a right-click (if they have a single-button mouse). The most recent version of the user manuals can be downloaded from http://www.clcbio.com/usermanuals. The user manual consists of four parts:
- The first pa
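As a small illustration of the memory setting described above, the following Python sketch builds the single line that goes into the options file. The helper name and the optional physical-memory check are hypothetical; the leading hyphen is an assumption based on the standard Java maximum heap size option (-Xmx).

```python
def vmoptions_line(megabytes, physical_mb=None):
    """Return the single line for the options file, e.g. '-Xmx512m'.

    megabytes   -- desired maximum memory for the workbench, in megabytes
    physical_mb -- optional amount of physical memory, used only to enforce
                   the manual's advice not to exceed it (hypothetical check)
    """
    if megabytes <= 0:
        raise ValueError("memory must be a positive number of megabytes")
    if physical_mb is not None and megabytes > physical_mb:
        raise ValueError("requested memory exceeds physical memory")
    return f"-Xmx{megabytes}m"
```

For example, `vmoptions_line(512)` gives the line used in the manual's example; only the number changes between configurations.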
246. m self annealing value of all primers and probes. This determines the amount of base pairing allowed between two copies of the same molecule. The self annealing score is measured in number of hydrogen bonds between two copies of primer molecules, with A-T base pairs contributing 2 hydrogen bonds and G-C base pairs contributing 3 hydrogen bonds.

Self end annealing: Determines the maximum self end annealing value of all primers and probes. This determines the amount of consecutive base pairs allowed between the ends of two copies of the same molecule. This score is also calculated in units of hydrogen bonds between two copies of identical primer molecules.

Secondary structure: Determines the maximum score of the optimal secondary DNA structure found for a primer or probe. Secondary structures are scored by the number of hydrogen bonds in the structure, and 2 extra hydrogen bonds are added for each stacking base pair in the structure.

3' end G/C restrictions: When this checkbox is selected, it is possible to specify restrictions concerning the number of G and C nucleotides in the 3' end of primers and probes. A low G/C content of the primer/probe 3' end increases the specificity of the reaction. A high G/C content facilitates a tight binding of the oligo to the template, but also increases the possibility of mispriming. Unfolding the preference groups yields the following options:

End length: The number of consecutive terminal nucleotides f
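The hydrogen-bond scoring of self annealing can be sketched in Python as follows. The manual does not give the exact algorithm, so this is only one plausible reading: two copies of the primer are laid antiparallel, slid past each other without gaps, and the offset with the most hydrogen bonds wins. The function name is hypothetical.

```python
# Hydrogen bonds contributed by each Watson-Crick pair, as stated in the text.
BONDS = {("A", "T"): 2, ("T", "A"): 2, ("G", "C"): 3, ("C", "G"): 3}


def self_annealing_score(primer):
    """Maximum number of hydrogen bonds between two antiparallel copies of a primer.

    Assumption: ungapped alignment at every possible offset, counting all
    complementary positions (they need not be consecutive).
    """
    top = primer.upper()
    bottom = top[::-1]  # second copy, written 3'->5' so it pairs position by position
    n = len(top)
    best = 0
    for shift in range(-(n - 1), n):
        score = 0
        for i in range(n):
            j = i - shift
            if 0 <= j < n:
                score += BONDS.get((top[i], bottom[j]), 0)
        best = max(best, score)
    return best
```

A palindromic primer such as GAATTC pairs along its whole length and therefore scores high, while a primer made only of A cannot pair with itself at all.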
247. m when single base mismatches occur. When in Standard PCR mode, pushing the Calculate button will prompt the dialog shown in figure 15.13. The top part of this dialog shows the single-primer parameter settings chosen in the Primer parameters preference group, which will be used by the design algorithm. The central part of the dialog contains parameters pertaining to primer specificity (this is omitted if all sequences belong to the included group). Here, three parameters can be set:
- Minimum number of mismatches: the minimum number of mismatches that a primer must have against all sequences in the excluded group to ensure that it does not prime these.
- Minimum number of mismatches in 3' end: the minimum number of mismatches that a primer must have in its 3' end against all sequences in the excluded group to ensure that it does not prime these.
- Length of 3' end: the number of consecutive nucleotides to consider for mismatches in the 3' end of the primer.

The lower part of the dialog contains parameters pertaining to primer pairs (this is omitted when only designing a single primer). Here, three parameters can be set:
- Maximum percentage point difference in G/C content: if this is set at e.g. 5 points, a pair of primers with 45 % and 49 % G/C nucleotides, respectively, will be allowed, whereas a pair of primers with 45 % and 51 % G/C nucleotides, respectively, will not be included.
- Maximal difference in melting temperature of
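The percentage point rule for G/C content can be expressed in a few lines of Python. This is a minimal sketch with hypothetical function names; treating a difference exactly equal to the limit as allowed is an assumption, since the manual only gives examples on either side of the limit.

```python
def gc_percent(sequence):
    """G/C content of a nucleotide sequence, in percent."""
    sequence = sequence.upper()
    return 100.0 * sum(base in "GC" for base in sequence) / len(sequence)


def gc_pair_allowed(gc_a, gc_b, max_points=5.0):
    """True if two G/C percentages differ by at most max_points percentage points.

    Assumption: the limit itself is inclusive.
    """
    return abs(gc_a - gc_b) <= max_points
```

With the limit set to 5 points, `gc_pair_allowed(45, 49)` accepts the first pair from the text and `gc_pair_allowed(45, 51)` rejects the second.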
248. make a selection | Ctrl + C (⌘ + C on Mac)

Notice! The annotations covering the selection will not be copied. A selection of a sequence can be edited as described in the following section.

11.1.3 Editing the sequence

When you make a selection, it can be edited by:

right-click the selection | Edit selection

A dialog appears displaying the sequence. You can add, remove or change the text and click OK. The original selected part of the sequence is now replaced by the sequence entered in the dialog. This dialog also allows you to paste text into the sequence using Ctrl + V (⌘ + V on Mac). If you delete the text in the dialog and press OK, the selected text on the sequence will also be deleted. Another way to delete a part of the sequence is to:

right-click the selection | Delete selection

Another way to edit the sequence is by inserting a restriction site. See section 17.2.2.

11.1.4 Adding and modifying annotations

Most sequences carry different biological information. When retrieving sequences from various databases, the sequence often contains biological information by way of annotations. You can manually add annotations from a compiled annotation list. This list of annotations covers the most frequently used annotations in UniProt and GenBank. Annotations which have been added to a sequence can be removed at any time (see section 11.1.5). Annotations can be added to a sequence: make a selection covering the part of the sequence you want t
249. man Doolittle Woods GES A Alanine 1 80 0 50 0 20 0 62 0 74 0 30 1 60 C Cysteine 2 50 1 00 4 10 0 29 0 91 0 90 2 00 D Aspartic acid 3 50 3 00 3 10 0 90 0 62 0 60 9 20 E Glutamic acid 3 50 3 00 1 80 0 74 0 62 0 70 8 20 F Phenylalanine 2 80 2 50 4 40 1 19 0 88 0 50 3 70 G Glycine 0 40 0 00 0 00 0 48 0 72 0 30 1 00 H Histidine 3 20 0 50 0 50 0 40 0 78 0 10 3 00 Isoleucine 4 50 1 80 4 80 1 38 0 88 0 70 3 10 K Lysine 3 90 3 00 3 10 1 50 0 52 1 80 8 80 L Leucine 3 80 1 80 5 70 1 06 0 85 0 50 2 80 M Methionine 1 90 1 30 4 20 0 64 0 85 0 40 3 40 N Asparagine 3 50 0 20 0 50 0 78 0 63 0 50 4 80 P Proline 1 60 0 00 2 20 0 12 0 64 0 30 0 20 Q Glutamine 3 50 0 20 2 80 0 85 0 62 0 70 4 10 R Arginine 4 50 3 00 1 40 2 53 0 64 1 40 12 3 S Serine 0 80 0 30 0 50 0 18 0 66 0 10 0 60 T Threonine 0 70 0 40 1 90 0 05 0 70 0 20 1 20 V Valine 4 20 1 50 4 70 1 08 0 86 0 60 2 60 WwW Tryptophan 0 90 3 40 1 00 0 81 0 85 0 30 1 90 Y Tyrosine 1 30 2 30 3 20 0 26 0 76 0 40 0 70 Table 14 1 Hydrophobicity scales This table shows seven different hydrophobicity scales which are generally used for prediction of e g transmembrane regions and antigenicity 14 3 Reverse translation from protein into DNA A protein sequence can be back translated into DNA using CLC Gene Workbench 2 0 Due to degeneracy of the genetic code every amino acid could translate into several different codons only 20 amino acids but 64 diff
250. matter which of the options you choose. Furthermore, each conflict will be marked as an annotation on the contig sequence and will be present if the contig sequence is extracted for further analysis. As a result, the details of any experimental heterogeneity can be maintained and used when the result of single-sequence analyses is interpreted.
- Show view of both contigs and reads. This will display a proper contig data object, where all the aligned reads are displayed below the contig sequence. (You can always extract the contig sequence without the reads later on.)
- Show only contig sequences. This will not display a contig data object, but will only output the assembled contig sequences as single nucleotide sequences. When choosing this option, there is thus no opportunity to validate and edit the assembly process.

If you have chosen to Trim sequences, click Next and you will be able to set trim parameters (see section 16.2.2). Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. When the assembly process has ended, a number of views will be shown, each containing a contig of two or more sequences that have been matched. If the number of contigs seems too high or low, try again with another Alignment stringency setting. Depending on your choices of output options above, the views will include trace files or only contig sequences. However, the calculation of the contig is carried out the same way, no matter h
251. mbled to a reference sequence. To start the assembly:

select one contig and a number of sequences | Toolbox in the Menu Bar | Assembly | Assemble Sequences to Contig

This opens a dialog where you can alter your choice of sequences which you want to assemble. You can also add sequence lists. There has to be one contig among the selected elements. When the elements are selected, click Next, and you will see the dialog shown in figure 16.7.

[Figure 16.7: Setting assembly parameters when assembling to an existing contig. The dialog offers alignment options (Minimum aligned read length: 50; Alignment stringency: Medium) and trimming options (Use existing trim information; generally not necessary since a reference sequence is used).]

The options in this dialog are similar to the options that are available when assembling to a reference sequence (see section 16.4). Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. This will start the assembly process.

16.6 View and edit contigs

The result of the assembly process is one or more contigs where the sequence reads have been aligned (see figure 16.8). You can see that the color of the residues and trace at the end of one of the reads has been faded. This indicates that this region has not c
252. ment in CLC format (.clc) will export the history too. In this way you can share projects and files with others while preserving the history. If an element's history includes source elements (i.e. if there are elements listed in Origins from), they must also be exported in order to see the full history. Otherwise the history will have entries named Element deleted. An easy way to export an element with all its source elements is to use the Export Dependent Objects function described in section 6.1.2. The history view can be printed. To do so, click the Print icon.

Chapter 8 Handling of results

Contents
8.1 How to handle results of analyses
8.1.1 When the analysis does not create new elements

Most of the analyses in the Toolbox are able to perform the same analysis on several elements in one batch. This means that analyzing large amounts of data is very easily accomplished. If you e.g. wish to translate a large number of DNA sequences to protein, you can just select the DNA sequences and set the parameters for the translation once. Each DNA sequence will then be treated individually, as if you performed the translation on each of them. The process will run in the background, and you will be able to work on other projects at the same time.

8.1 How to handle results of analyses

All the analyses in the Toolbox are performed in a step
253. mer to search for. A primer is a normal DNA sequence, but can only have a maximum length of 50 nucleotides. After clicking Finish, the sequences where the primer binds to a subsequence will be annotated with a Primer Binding Site containing information about the primer binding to this subsequence. An example of the result is shown in figure 15.17.

[Figure 15.17: Annotation showing a primer match.]

15.12 Order primers

To facilitate the ordering of primers and probes, CLC Gene Workbench offers an easy way of displaying and saving a textual representation of one or more primers:

select primers in Navigation Area | Toolbox in the Menu Bar | Primers and Probes | Order Primers

This opens a dialog where you can choose additional primers. Clicking OK opens a textual representation of the primers (see figure 15.18). The first line states the number of primers being ordered, and after this follow the names and nucleotide sequences of the primers in 5'-3' orientation. From the editor, the primer information can be copied and pasted to web forms or e-mails. The created object can also be saved to a project and exported as a text file.

Number of primers 4 Name Primer F1 24 44 GTTTCCTTCCTCTAGTTTCT Name Primer R1 123 141 CTCTTGTCAGCACTCCAT Name Primer R1 128 146 CCAAACTCTTGTCAGCAC Name Primer F1 19 37
254. mes which fulfill match number criteria Previous pret Figure 2 14 Selecting enzymes PERHSAC PERH3BC GTGAGTCTGA TGGGTCTGCC CATGGTTTCC TTCCTCTAGT TTCTG 0 Mboll l PERH3BC GGCTTACCTT CCTATCAGAA GGAAATGGGA AGAGATTCTA GGGAG ie Tth ln PERH3BC CAGTTTAGAT GGAAGGTATC TGCTTGTTCC CCCATGGAGT GCTGA Ci 140 PERH3BC CAAGAGTTTG GTTATTTTAC TCTCCACTCA CAATCATCAT GTCCT ES PERHSBC restr Ex Name Pattern Overhang Number of matches Cut position s CjePI ccannnnnnnte 3 1 la 51 184 MbolI gaaga 3 86 Tth11111 caarca 3 1 101 Figure 2 15 The result of the restriction site detection is displayed as text and in this tutorial the View shares the View Area with a View of the PERH3BC sequence displaying the restriction sites split screen view To save the result Right click the tab File Save H 2 7 Tutorial Sequence information This tutorial shows you how to see background information about a sequence including an overview of its annotations Suppose you are working with the HUMHBB sequence from the example data The Example data can be installed in the program by clicking Install Example Data from the Help menu in the Menu Bar The Example data can also be downloaded from http www clcbio com download and you wish to see more background information about this sequence This can be done using the Sequence Info functionality of CLC Gene Workbench CHAPTE
256. move the sequence in the View.

3.3.6 Selection

The Selection mode is used for selecting in a View (selecting a part of a sequence, selecting nodes in a tree etc.). It is also used for moving e.g. branches in a tree or sequences in an alignment. When you make a selection on a sequence or in an alignment, the location is shown in the bottom right corner of your workbench. E.g. 23 24 means that the selection is between two residues, 23 means that the residue at position 23 is selected, and finally 23 25 means that 23, 24 and 25 are selected. By holding Ctrl you can make multiple selections.

3.4 Toolbox and Status Bar

The Toolbox is placed in the left side of the user interface of CLC Gene Workbench 2.0, below the Navigation Area. The Toolbox shows a Processes tab and a Toolbox tab.

3.4.1 Processes

By clicking the Processes tab, the Toolbox displays previous and running processes, e.g. an NCBI search or a calculation of an alignment. The running processes can be stopped, paused and resumed. Active buttons are blue. If a process is terminated, the stop, pause and play buttons of the process in question are made gray. The terminated processes can be removed by:

View | Remove Terminated Processes

Running and paused processes are not deleted.

[Figure: the Processes tab listing processes such as aligning sequences and a download process.]
257. mparative statistics layout If more sequences were selected in Step 1 this function generates statistics with comparisons between the sequences You can also choose to include Background distribution of amino acids If this box is ticked an extra column with amino acid distribution of the chosen species is included in the table output The distributions are calculated from UniProt www uniprot org version 6 0 dated September 13 2005 Click Next if you wish to adjust how to handle the results see section 8 1 If not click Finish An example of protein sequence statistics is shown in figure 12 15 Nucleotide sequence statistics are generated using the same dialog as used for protein sequence statistics However the output of Nucleotide sequence statistics is less extensive than that of the protein sequence statistics Notice The headings of the tables change depending on whether you calculate individual or comparative sequence statistics The output of comparative protein sequence statistics include e Sequence information Sequence type Length Organism Locus Description Modification Date CHAPTER 12 GENERAL SEQUENCE ANALYSES 151 k CAA25204 S Table Of Contents 1 Protein statistics 1 1 Sequence information 1 2 Counts of amino acids 1 3 Frequencies of amino acids 1 Protein statistics 1 1 Sequence information psn pt ano Nes mace a nT AA26204 E dodi catan Date
258. n
- Open selection in new view. This will open the selected region in the normal sequence view.
- Edit selection. This will open a dialog box in which it is possible to edit the selected residues.
- Add annotation. This will open the Add annotation dialog box.
- Trim sequences left. Adds a trim annotation from the beginning of the sequence to the point of selection. Trimmed regions are not included when sequences are assembled into contigs.
- Trim sequences right. Adds a trim annotation from the point of selection to the end of the sequence. Trimmed regions are not included when sequences are assembled into contigs.

Manipulate using restriction sites

Right-clicking a restriction site (see section 17.2.2) gives you the following options (see figure 17.7). Restriction site in the list below indicates the name of the selected restriction site. This could for example be EcoRV.
- Insert sequence at this Restriction enzyme site. This will insert a sequence from a list into this particular site.
- Cut this sequence at this Restriction enzyme site. This will cut the sequence at this particular site, and only this site.
- Cut this sequence at all Restriction enzyme sites. This will cut the sequence at all identical restriction sites, but at no other sites.
- Cut all sequences at all Restriction enzyme sites. This will cut all sequences in the cloning editor view with that particular restri
259. n 2 Y Numbers on plus strand a Primers and Probes a Assembly fag Cloning and Restriction Sites E inne i BLAST Search ce g Database Search AY738615 TGAGTGAGCTGCACTGTG Lock labels Sequence label BG HE Y Follow selection Processes Toolbox ani E Idle Figure 2 3 DNA sequence AY738615 opened in a view The view preferences has been hidden to provide more space for the view In the following we will show how the same sequence can be displayed in two different views double click sequence AY738615 in the Navigation Area This opens an additional tab Drag this tab to the bottom of the view See figure 2 4 AY738615 Figure 2 4 Dragging the tab down to the bottom of the view will display a gray area indicating that the tab can be dropped here and split the view The result is two views of the same sequence in the View Area as can be seen in figure 2 5 If you want to display a part of the sequence it is possible to select it and open it in another view click Selection in Toolbar select a part of the sequence right click the selected part of the sequence in the top view Open Selection in New View This opens a third display of sequence AY738615 However only the part which was selected In order to make room for displaying the selection of the sequence the most recent view drag CHAPTER 2 TUTORIALS 31 gt AY738615 AY738615CGTGGA
260. n Reading Frames

or right-click a nucleotide sequence | Toolbox | Nucleotide Analyses | Find Open Reading Frames

This opens the dialog displayed in figure 13.7.

[Figure 13.7: Create Reading Frame dialog.]

If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. If you want to adjust the parameters for finding open reading frames, click Next.

13.6.1 Open reading frame parameters

This opens the dialog displayed in figure 13.8. The adjustable parameters for the search are:

[Figure 13.8: dialog showing the settings Start Codon (AUG; Any; All start codons in genetic code; Other: AUG, CUG, UUG), Both Strands, Stop codon included in translation, Open Ended Sequence, Genetic code translation table (1 Standard) and Minimum Length (100). Caption: Create Rea
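How the settings in the dialog interact can be illustrated with a simplified open reading frame search in Python. This is a sketch, not the workbench's algorithm: it assumes DNA input, the standard stop codons, a minimum length counted in codons (the unit is not stated in the dialog), and it ignores the Open Ended Sequence option. All names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, start_codons=("ATG",), min_codons=100, both_strands=True):
    """Return ORFs as (strand, start, end) tuples.

    Coordinates are 0-based and end-exclusive, relative to the strand that was
    scanned; the stop codon is included in the reported range. min_codons
    counts codons from the start codon up to, but not including, the stop.
    """
    seq = seq.upper()
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    orfs = []
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in start_codons:
                    start = i  # first start codon opens the frame
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Passing `start_codons=("ATG", "CTG", "TTG")` corresponds to the Other option shown in the dialog, written with T instead of U because the sketch works on DNA.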
261. n a specific TaqMan probe and all sequences which belong to the group not recognized by the probe.
- Minimum number of mismatches in central part: the minimum number of mismatches in the central part of the oligo that must exist between a specific TaqMan probe and all sequences which belong to the group not recognized by the probe.

The lower part of the dialog contains parameters pertaining to primer pairs and the comparison between the outer oligos (primers) and the inner oligos (TaqMan probes). Here, five options can be set:
- Maximum percentage point difference in G/C content (described above under Standard PCR).
- Maximal difference in melting temperature of primers in a pair: the number of degrees Celsius that primers in the primer pair are all allowed to differ.
- Maximum pair annealing score: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in an oligo pair. This criterion is applied to all possible combinations of primers and probes.
- Minimum difference in the melting temperature of primer (outer) and TaqMan probe (inner) oligos: all comparisons between the melting temperature of primers and probes must be at least this different; otherwise the solution set is excluded.
- Desired temperature difference in melting temperature between outer (primers) and inner (TaqMan) oligos: the scoring function discounts solution sets which deviate greatly from this value. Reg
262. n deleted can be restored CHAPTER 3 USER INTERFACE 63 Y Restore Deleted Elements 1 Select Elements to Sc Restore Elements Deleted time HUMDINUC Sun Jul 02 16 36 54 CEST 2006 Figure 3 3 The Restore Deleted Elements dialog Y Restore Deleted Elements 1 Select Elements to Choose Restore Positio Restore Default project for CLC user 2 Choose Restore Position LL Example data ea Nucleotide sequences 20 NM_000044 DOC AY738615 20 PERH2BD 90 PERH3BC sequence list H Assembly a Cloning project 5 Primer design iH Restriction analysis H E Protein fg Extra Performed analyses E README Figure 3 4 The Restore Deleted Elements dialog The deleted elements remain in the Recycle Bin until the Recycle Bin is emptied To empty the bin Edit in the Menu Bar Empty recycle bin 3 1 7 Show folder elements in View A project or a folder might contain large amounts of elements It is possible to view the elements of a folder or project in the View Area select a project Show 4 in the Toolbar Folder Contents 7 When the elements are shown in the View they can be sorted by clicking the heading of each of the columns You can further refine the sorting by pressing Ctrl while clicking the heading of another column Sorting the elements in a View does not affect the ordering of the elements in the Navigation Area Notice The View only displays one layer of th
264. n same lane

[Figure 17.20: Choosing how to display the lanes.]

For more information about the view of the gel, see section 17.5.3.

17.5.2 Separate fragments of sequences using restriction enzymes

This section explains how to simulate a gel electrophoresis of one or more sequences which are digested with restriction enzymes. There are two ways to do this:
- When performing the Restriction Sites analysis from the Toolbox, you can choose to separate the resulting fragments on a gel. This is explained in section 17.3.1.
- From all the graphical views of sequences, you can right-click the label of the sequence and choose Digest Sequence with Selected Enzymes and Run on Gel. The views where this option is available are listed below:
  - Circular view (see section 11.6).
  - Ordinary sequence view (see section 11.1).
  - Graphical view of sequence lists (see section 11.5).
  - Cloning editor (see section 17.2).
  - Primer designer (see section 15.3).

Furthermore, you can also right-click an empty part of the graphical view of sequence lists and the cloning editor and choose Digest All Sequences with Selected Enzymes and Run on Gel. This opens a dialog with functionalities similar to the one in figure 17.15.

Notice! When using the right-click options, the sequence will be digested with the enzymes that are selected in the Side Panel. This is explained in section 11.1.1.
265. nce is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. If you want to adjust the parameters for primer matching, click Next.

15.11.1 Search for primer binding sites parameters

This opens the dialog displayed in figure 15.16.

[Figure 15.16: Search parameters for matching primers to the sequence. The dialog shows the match criteria (Exact match; Minimum number of base pairs required for a match; Number of consecutive base pairs required in 3' end) and the selection of the primer to match against the sequences (primer length at most 50), here Primer F1 19 37.]

The adjustable parameters for the search are:
- Exact match. Choose only to consider exact matches of the primer, i.e. all positions must base pair with the template.
- Minimum number of base pairs required for a match. How many nucleotides of the primer must base pair to the sequence in order to cause priming/mispriming.
- Number of consecutive base pairs required in 3' end. How many consecutive 3' end base pairs in the primer MUST be present for priming/mispriming to occur. This option is included since 3' terminal base pairs are known to be essential for priming to occur.
- Select pri
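The three match criteria above can be combined as in the following Python sketch. It is a simplification under stated assumptions: only the given strand is scanned, the alignment is ungapped, and a primer position counts as base paired when it is identical to the sequence position (that is, it would pair with the complementary strand). The function name is hypothetical.

```python
def find_binding_sites(sequence, primer, min_pairs, min_3prime_pairs, exact=False):
    """Return 0-based start positions where the primer satisfies the match criteria.

    min_pairs        -- minimum number of primer positions that must match
    min_3prime_pairs -- number of consecutive positions at the primer's 3' end
                        (the right-hand end) that must all match
    exact            -- if True, every primer position must match
    """
    sequence, primer = sequence.upper(), primer.upper()
    n = len(primer)
    if exact:
        min_pairs = n
    sites = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        matches = [a == b for a, b in zip(window, primer)]
        if sum(matches) < min_pairs:
            continue
        # Require an unbroken run of matches at the 3' end of the primer.
        if min_3prime_pairs and not all(matches[-min_3prime_pairs:]):
            continue
        sites.append(start)
    return sites
```

A single internal mismatch is tolerated when `min_pairs` is one less than the primer length, whereas a mismatch at the 3' terminal position is rejected as soon as `min_3prime_pairs` is at least 1.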
266. nces. If you have not previously trimmed the sequences, this can be done by checking this box. If selected, the next step in the dialog will allow you to specify settings for trimming.

• Minimum aligned read length. The minimum number of nucleotides in a read which must be successfully aligned to the contig. If a read does not meet this criterion, it is excluded from the assembly.

CHAPTER 16 ASSEMBLY

• Alignment stringency. Specifies the stringency of the scoring function used by the alignment step in the contig assembly algorithm. A higher stringency level will tend to produce contigs with fewer ambiguities, but will also tend to omit more sequencing reads and to generate more and shorter contigs. Three stringency levels can be set: Low, Medium and High.
• Conflicts. If there is a conflict between the reads in a given nucleotide position, the program offers three ways to resolve it:
  - Vote (A, C, G, T). The conflict will be resolved by counting instances of each nucleotide and then letting the majority decide the nucleotide in the contig.
  - Unknown nucleotide (N). The contig will be assigned an N character in all positions with conflicts.
  - Ambiguity nucleotides (R, Y, etc.). The contig will display an ambiguity nucleotide reflecting the different nucleotides found in the reads. For an overview of ambiguity codes, see Background information in the Help menu.

Note that conflicts will always be highlighted no
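The three conflict options listed above can be sketched in a few lines of Python. This is an illustration only, not the assembler's implementation. The function names are hypothetical, the reads are assumed to be already aligned and of equal length, gap characters are ignored, and ties in the vote are broken alphabetically (an assumption, since the text does not say how ties are handled). The ambiguity letters follow the standard IUPAC nucleotide codes.

```python
from collections import Counter

VALID = set("ACGT")

# Standard IUPAC ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def resolve_column(bases, mode="vote"):
    """Return the contig character for one aligned position."""
    observed = [b.upper() for b in bases if b.upper() in VALID]
    if not observed:
        return "N"
    distinct = set(observed)
    if len(distinct) == 1:
        return observed[0]  # no conflict at this position
    if mode == "vote":
        counts = Counter(observed)
        # Majority wins; ties are broken alphabetically (assumption).
        return min(counts, key=lambda b: (-counts[b], b))
    if mode == "unknown":
        return "N"
    if mode == "ambiguity":
        return IUPAC[frozenset(distinct)]
    raise ValueError("unknown mode: %s" % mode)


def consensus(reads, mode="vote"):
    """Build a consensus from equally long, pre-aligned reads."""
    return "".join(resolve_column(column, mode) for column in zip(*reads))


reads = ["ACGT", "ACGA", "ACGT"]
for mode in ("vote", "unknown", "ambiguity"):
    print(mode, consensus(reads, mode))
```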
267. ncluded in this region.

• Trim contamination from vectors. If selected, the program will match the sequence reads against all vectors in the UniVec database and remove sequence ends with significant matches. UniVec is available at ftp://ftp.ncbi.nih.gov/pub/UniVec/.
• Trim contamination from other sequences. This option lets you select a specific vector sequence that you know might be the cause of contamination. Click the Browse and select object icon in order to select a sequence.
• Hit limit. Specifies how strictly vector contamination is trimmed. Since vector contamination usually occurs at the beginning or end of a sequence, different criteria are applied for terminal and internal matches. A match is considered terminal if it is located within the first 25 bases at either sequence end. Three match categories are defined according to the expected frequency of an alignment with the same score occurring between random sequences, as calculated by NCBI VecScreen:
  - Weak. Expect 1 random match in 40 queries of length 350 kb.
    Terminal match with score 16 to 18.
    Internal match with score 23 to 24.
  - Moderate. Expect 1 random match in 1,000 queries of length 350 kb.
    Terminal match with score 19 to 23.
    Internal match with score 25 to 29.
  - Strong. Expect 1 random match in 1,000,000 queries of length 350 kb.
    Terminal match with score ≥ 24.
    Internal match with score ≥ 30.

Click Next if you wis
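The match categories above amount to a small lookup, which the following Python sketch spells out. It is an illustration under stated assumptions, not the program's code. The function names are hypothetical, the strong category is taken to start at scores of 24 (terminal) and 30 (internal), a match counts as terminal when it starts within 25 bases of the sequence start or ends within 25 bases of the sequence end, and the hit limit is read as "trim matches of this category or stronger".

```python
# Lowest score of each category, for terminal and internal matches.
THRESHOLDS = {
    True: {"weak": 16, "moderate": 19, "strong": 24},   # terminal
    False: {"weak": 23, "moderate": 25, "strong": 30},  # internal
}
RANK = {"weak": 1, "moderate": 2, "strong": 3}


def is_terminal(match_start, match_end, seq_len, window=25):
    """True if a match (0-based, end-exclusive) reaches into either end."""
    return match_start < window or match_end > seq_len - window


def classify_match(score, terminal):
    """Return 'weak', 'moderate', 'strong', or None if below all limits."""
    limits = THRESHOLDS[terminal]
    for name in ("strong", "moderate", "weak"):
        if score >= limits[name]:
            return name
    return None


def should_trim(score, terminal, hit_limit="moderate"):
    """Trim when the match is at least as strong as the chosen hit limit."""
    category = classify_match(score, terminal)
    return category is not None and RANK[category] >= RANK[hit_limit]


print(classify_match(24, terminal=True))
print(classify_match(24, terminal=False))
print(should_trim(24, terminal=False, hit_limit="moderate"))
```

The same score of 24 is classified differently depending on where the match lies, which reflects the stricter criteria applied to internal matches.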
268. ncy

2.10.3 Documenting your changes

Whenever you make a change, like replacing a T with a t, it will be noted in the contig's history. To open the history:

Right-click the tab of the contig | Show History

In the history you can see the details of each change (see figure 2.35). When you have finished editing the contig, the consensus can be saved:

Right-click the label "Contig" | Open Sequence in New View | Save

CHAPTER 2 TUTORIALS

Figure 2.31: Specifying how sequences should be trimmed.
269. nd annealing, GC content, Secondary structure score, Secondary structure

Figure 15.15: Properties of a primer from the Example Data.

In the Side Panel you can specify the information to display about the primer. The information parameters of the primer properties table are explained in section 15.5.2. The melting temperature of a primer is not available, since this requires knowledge of the template to which the primer binds.

15.11 Match primer with sequence

In CLC Gene Workbench you can match a known primer against one or more DNA sequences or a list of DNA sequences. This can be used to test whether a primer used in a previous experiment is applicable to amplify e.g. a homologous region in another species, or to test for potential mispriming. When applied, the algorithm will search for competing binding sites of the primer within the sequence. You can choose the minimum number of matching nucleotides and a minimum number of nucleotides that must bind at the 3' end of the primer. These parameters are explained in this section.

CHAPTER 15 PRIMERS

To search for primer binding sites:

select a nucleotide sequence | Toolbox in the Menu Bar | Primers and Probes | Match Primer with Sequence

or right-click a nucleotide sequence | Toolbox | Primers and Probes | Match Primer with Sequence

If a sequence was selected before choosing the Toolbox action, this seque
270. ne major ticks.
• Show as histogram. For some data series it is possible to see them as a histogram rather than a line plot.

Preferences for each protein. Underneath the Graph preferences you will find a set of preferences for each protein in the graph. These preferences only apply to the curve for the specific protein.

• Dot type: none, cross, plus, square, diamond, circle, triangle, reverse triangle, dot.
• Dot color. Allows you to choose between many different colors.
• Line width: thin, medium, wide.
• Line type: none, line, long dash, short dash.
• Line color. Allows you to choose between many different colors.

These settings will apply to both the curve and the legend.

CHAPTER 14 PROTEIN ANALYSES

Modifying labels and legends. Click the title of the graph, the axis titles or the legend to edit the text.

14.2 Hydrophobicity

CLC Gene Workbench can calculate the hydrophobicity of protein sequences in different ways, using different algorithms (see section 14.2.3). Furthermore, hydrophobicity of sequences can be displayed as hydrophobicity plots and as graphs along sequences. In addition, CLC Gene Workbench 2.0 can calculate hydrophobicity for several sequences at the same time, and for alignments.

14.2.1 Hydrophobicity plot

To display the hydrophobicity for a protein sequence in a plot: select a protein sequence in Navigation Ar
271. ned in the sequence file (when you view the sequence as text, it contains a number of PUBMED lines). Not all sequences have these PubMed references, and in that case you will see a dialog and the browser will not open.

CHAPTER 9 DATABASE SEARCH

9.2.4 UniProt

The UniProt search function searches the UniProt database (http://www.ebi.uniprot.org) using the accession number. Furthermore, it checks whether the sequence was indeed downloaded from UniProt.

Chapter 10 BLAST Search

Contents
10.1 BLAST Against NCBI Database
  10.1.1 Output from BLAST Search
  10.1.2 BLAST table
10.2 BLAST Against Local Database
10.3 Create Local BLAST Database

CLC Gene Workbench offers to conduct BLAST searches on protein and DNA sequences. In short, a BLAST search identifies homologous sequences by searching one or more databases hosted by NCBI (http://www.ncbi.nlm.nih.gov) with your query sequence [McGinnis and Madden, 2004]. BLAST (Basic Local Alignment Search Tool) identifies homologous sequences using a heuristic method which finds short matches between two sequences. After the initial match, BLAST attempts to start local alignments from these initial matches. From CLC Gene Workbench 2.0 it is also possible to conduct BLAST searches on a database stored locally
272. nflict between the sequence reads. Residues that are different from the contig are colored by default, providing an overview of the inconsistencies. Since the next inconsistency in the contig is automatically selected, it is easy to make changes. You can also use the Space key to find the next inconsistency.

• Sequence layout. There is one additional parameter regarding the sequence layout:
  Compactness. In the Sequence Layout view preferences, you can control the level of sequence detail to be displayed:
  - Not compact. The normal setting with full detail.
  - Low. Hides the trace data and puts the reads' annotations on the sequence.
  - Medium. The labels of the reads and their annotations are hidden, and the residues of the reads cannot be seen.
  - Compact. Even less space between the reads.
  Furthermore, it is not possible to wrap contigs as you can do with alignments.
• Alignment info. There is one additional parameter:
  Coverage. Shows how many sequence reads are contributing information to a given position in the contig. The level of coverage is relative to the overall number of sequence reads that are included in the contig.
  - Foreground color. Colors the letters using a gradient, where the left side color is used for low coverage and the right side is used for maximum coverage.
  - Background color. Colors the background of the letters using a gradient, where the left side color is used for low c
273. nment.
• Gap extension cost. The price for every extension past the initial gap.

If you expect a lot of small gaps in your alignment, the Gap open cost should equal the Gap extension cost. On the other hand, if you expect few but large gaps, the Gap open cost should be set significantly higher than the Gap extension cost. However, for most alignments it is a good idea to make the Gap open cost quite a bit higher than the Gap extension cost. The default values are 10.0 and 1.0 for the two parameters, respectively.

• End gap cost. The price of gaps at the beginning or the end of the alignment. One of the advantages of the CLC Gene Workbench 2.0 alignment method is that it provides flexibility in the treatment of gaps at the ends of the sequences. There are three possibilities:
  - Free end gaps. Any number of gaps can be inserted in the ends of the sequences without any cost.
  - Cheap end gaps. All end gaps are treated as gap extensions, and any gaps past 10 are free.
  - End gaps as any other. Gaps at the ends of sequences are treated like gaps in any other place in the sequences.

When aligning a long sequence with a short partial sequence, it is ideal to use free end gaps, since this will be the best approximation to the situation. The many gaps inserted at the ends are not due to evolutionary events, but rather to partial data. Many homologous proteins have quite different ends, often with large insertions or deletions. This confuses ali
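The gap cost settings described above can be made concrete with a small worked example in Python. This is a reading of the description, not the program's scoring code. It assumes that a gap of length L costs the open cost once plus the extension cost for each of the remaining L - 1 positions, and that "cheap" end gaps pay the extension cost for at most 10 positions. The function name `gap_cost` and the mode labels are hypothetical.

```python
def gap_cost(length, gap_open=10.0, gap_extend=1.0,
             at_end=False, end_mode="any_other"):
    """Cost of a single gap of the given length.

    end_mode applies only when at_end is True and is one of
    'free', 'cheap' or 'any_other' (labels chosen for this sketch).
    """
    if length <= 0:
        return 0.0
    if at_end and end_mode == "free":
        return 0.0
    if at_end and end_mode == "cheap":
        # Every position is priced as an extension; positions past 10 are free.
        return gap_extend * min(length, 10)
    # Open cost for the first position, extension cost for the rest.
    return gap_open + gap_extend * (length - 1)


# One gap of length 5 versus five separate gaps of length 1 (default costs).
print(gap_cost(5))          # 14.0
print(5 * gap_cost(1))      # 50.0
print(gap_cost(25, at_end=True, end_mode="cheap"))  # 10.0
```

With the default values, one long gap is much cheaper than several short ones. Setting the two costs equal removes that preference, which is why equal costs suit alignments where many small gaps are expected.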
274. no acids. The zoom options described in section 3.3 allow you to e.g. zoom out in order to see more of the sequence in one view. There are a number of options for viewing and editing the sequence, which are all described in this section. All the options described in this section also apply to alignments (further described in section 18.2).

CHAPTER 11 VIEWING AND EDITING SEQUENCES

11.1.1 Sequence Layout in Side Panel

Each view of a sequence has a Side Panel located at the right side of the view. When you make changes in the Side Panel, the view of the sequence is instantly updated. To show or hide the Side Panel:

select the View | Ctrl + U

or click the icon at the top right corner of the Side Panel to hide | click the gray Side Panel button to the right to show

When you open a view, the Side Panel has default settings, which can be changed in the User Preferences (see chapter 4). Below, each group of preferences will be explained. Some of the preferences are not the same for nucleotide and protein sequences, but the differences will be explained for each group of preferences.

Notice! When you make changes to the settings in the Side Panel, they are not automatically saved when you save the sequence. Click Save/restore Settings to save the settings (see section 4.5 for more information).

Sequence Layout

These preferences determine the overall layout of the sequence.

• Space every 10 residues. Inserts a space every 10 res
275. ns to search for can be specified.
• Maximum pattern length. Here the maximum length of patterns to search for can be specified.
• Noise. Specify the noise level of the model. This parameter influences the level of degeneracy of patterns in the sequence(s). The noise parameter can be 1, 2, 5 or 10 percent.
• Number of different kinds of patterns to predict. The number of iterations the algorithm goes through. After the first iteration, the pattern positions predicted in the first run are forced to be members of the background. In that way, the algorithm finds new patterns in the second iteration. Patterns marked Pattern1 have the highest confidence. The maximum number of iterations is 3.
• Show result of patterns discovery in a table. Generates a tabular output which displays the patterns found.
• Include Background Distribution of Amino Acids. For protein sequences it is possible to include information on the background distribution of amino acids from a range of organisms.

Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. This will open a view showing the patterns found as annotations on the original sequence (see figure 12.22). If you have selected several sequences, a corresponding number of views will be opened.

Figure 12.22: Sequence view displaying two discovered patterns.

12
276. Figure 18.5: The top figure shows the original alignment. In the bottom panel, a single sequence with four inserted X's is aligned to the original alignment. This introduces gaps in all sequences of the original alignment. All other positions in the original alignment are fixed.

This feature is useful if you wish to add extra sequences to an existing alignment, in which case you just select the alignment and the extra sequences and choose not to redo the alignment. It is also useful if you have created an alignment where the gaps are not placed correctly. In this case, you can realign the alignment with different gap cost parameters.

18.1.4 Fixpoints

With fixpoints, you can get full control over the alignment algorithm. The fixpoints are points on the sequences that are forced to align to each other. Fixpoints are added to sequences or alignments before clicking "Create alignment". To add a fixpoint, open the sequence or alignment and:

Select the region you want to use as a fixpoint | right-click the selection | Set alignment fixpoint here

This will add an annotation labeled "Fixpoint" to the sequence (see figure 18.6). Use this procedure to add fixpoints to the other sequence(s) that should be forced to align to each other.
277. ntor, C. (1969). Mammalian Protein Metabolism, ed. H. N. Munro, chapter Evolution of protein molecules, pages 21-32. New York: Academic Press.

[Knudsen and Miyamoto, 2001] Knudsen, B. and Miyamoto, M. M. (2001). A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc Natl Acad Sci U S A, 98(25):14512-14517.

[Kyte and Doolittle, 1982] Kyte, J. and Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J Mol Biol, 157(1):105-132.

[Larget and Simon, 1999] Larget, B. and Simon, D. (1999). Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol Biol Evol, 16:750-759.

[Leitner and Albert, 1999] Leitner, T. and Albert, J. (1999). The molecular clock of HIV-1 unveiled through analysis of a known transmission history. Proc Natl Acad Sci U S A, 96(19):10752-10757.

[Maizel and Lenk, 1981] Maizel, J. V. and Lenk, R. P. (1981). Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc Natl Acad Sci U S A, 78(12):7665-7669.

[McGinnis and Madden, 2004] McGinnis, S. and Madden, T. L. (2004). BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res, 32(Web Server issue):W20-W25.

[Michener and Sokal, 1957] Michener, C. and Sokal, R. (1957). A quantitative approach to a problem in classification. Evolution, 11:130-162.

[Purvis, 1995] Purvis, A. (1995). A composite estimate of primate phyloge
278. ny. Philos Trans R Soc Lond B Biol Sci, 348(1326):405-421.

BIBLIOGRAPHY

[Rose et al., 1985] Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H., and Zehfus, M. H. (1985). Hydrophobicity of amino acid residues in globular proteins. Science, 229(4716):834-838.

[Saitou and Nei, 1987] Saitou, N. and Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol, 4(4):406-425.

[SantaLucia, 1998] SantaLucia, J. (1998). A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci U S A, 95(4):1460-1465.

[Schneider and Stephens, 1990] Schneider, T. D. and Stephens, R. M. (1990). Sequence logos: a new way to display consensus sequences. Nucleic Acids Res, 18(20):6097-6100.

[Siepel and Haussler, 2004] Siepel, A. and Haussler, D. (2004). Combining phylogenetic and hidden Markov models in biosequence analysis. J Comput Biol, 11(2-3):413-428.

[Sneath and Sokal, 1973] Sneath, P. and Sokal, R. (1973). Numerical Taxonomy. Freeman, San Francisco.

[Tobias et al., 1991] Tobias, J. W., Shrader, T. E., Rocap, G., and Varshavsky, A. (1991). The N-end rule in bacteria. Science, 254(5036):1374-1377.

[Wootton and Federhen, 1993] Wootton, J. C. and Federhen, S. (1993). Statistics of local complexity in amino acid sequences and sequence databases. Computers in Chemistry, 17:149-163.

[Yang and Rannala, 1997] Yang, Z. and Rannala, B. (1997). Bayesia
279. o identify conserved residues in aligned domains of protein sequences and a wide range of other applications. Each position of the alignment, and consequently the sequence logo, shows the sequence information in a computed score based on Shannon entropy [Schneider and Stephens, 1990]. The height of the individual letters represents the sequence information content in that particular position of the alignment.

A sequence logo is a much better visualization tool than a simple consensus sequence. An example is an alignment where, in one position, a particular residue is found in 70% of the sequences. If a consensus sequence were defined, it would typically display only the single residue with 70% coverage. In figure 18.8, an ungapped alignment of 11 E. coli start codons including flanking regions is shown. In this example, a consensus sequence would only display ATG as the start codon in position 1, but looking at the sequence logo it is seen that GTG is also allowed as a start codon.

CHAPTER 18 SEQUENCE ALIGNMENT

talA  CTTTTCAAGG AGTATTTCCT ATGAACGAGT TAGACGGCAT
evgA  CATTGCAAAG GGAATAATCT ATGAACGCAA TAATTATTGA
ypdl  CATTTTCAGG ATAACTTTCT ATGAAAGTAA ACTTAATACT
niB   GAAAAGAAAT CGAGGCAAAA ATGAGCAAAG TCAGACTCGC
hmpA  TGCAAAAAAA GGAAGACCAT ATGCTTGACG CTCAAACCAT
narQ  TTTTTGTGGA GAAGACGCGT GTGATTGTTA AACGACCCGT
gif   GTTATTAAGG ATATGTTCAT ATGTTTTTCA AAAAGAACCT
intS  TACCCACCGG ATTTTTACCC ATGCTCACCG TTAAGCAGAT
yfdF
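The start codon example can be worked through numerically. The Python sketch below computes the information content of a single alignment column as the maximum possible entropy minus the observed Shannon entropy, in the spirit of Schneider and Stephens (1990). It is a simplified illustration, not the program's calculation: the small-sample correction from the original method is omitted, gaps are not handled, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content (bits) of one ungapped alignment column."""
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def letter_heights(column, alphabet_size=4):
    """Height of each letter: its frequency times the column information."""
    n = len(column)
    info = column_information(column, alphabet_size)
    return {base: (c / n) * info for base, c in Counter(column).items()}


# First base of the start codon in 11 sequences: ten A's and one G.
first_base = "A" * 10 + "G"
print(round(column_information(first_base), 2))
print({b: round(h, 2) for b, h in letter_heights(first_base).items()})
```

A consensus sequence would show only A at this position. The logo keeps a small G stacked with a tall A, so the minority start codon remains visible.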
280. o annotate | right-click the selection | Add Annotation

This will display a dialog like the one in figure 11.2. The left-hand part of the dialog lists a number of Annotation types. When you have selected an annotation type, it appears in Chosen type. You can also select an annotation from the Chosen type list. Choosing an annotation type is mandatory. The right-hand part of the dialog contains the following text fields:

Figure 11.2: The Add Annotation dialog.

• Name. The name of the annotation, which can be shown in the view. Whether the name is shown depends on the Annotation Layout preferences (see section 11.1.1).
• Chosen type. Reflects the left-hand part of the dialog as described above.
• Note. This is a field for entering notes about the annotation. The note will be displayed in a tooltip when you hold the mouse pointer over the sequence.
• Evidence. There are two options for the evidence supporting the annotation: experimental and non-experimental.
• Region. If you have already made a selection, this field will show the positions of the selection. You can modify the region further using the conventions of DDBJ EMB
281. o or more genetic variants, it is possible to detect genetic variants by the presence or absence of fluorescence in the reaction. Notice: in CLC Gene Workbench it is possible to annotate sequences with SNP information from dbSNP and use this information to guide TaqMan allele-specific probe design.

A specific requirement of TaqMan probes is that a G nucleotide cannot be present at the 5' end, since this will quench the fluorescence of the reporter dye. It is recommended that the melting temperature of the TaqMan probe is about 10 degrees Celsius higher than that of the primer pair.

Primer design for TaqMan technology involves designing a primer pair and a TaqMan probe. In TaqMan mode, the user must thus define three regions: a Forward primer region, a Reverse primer region and a TaqMan Probe region. These are defined using the mouse right-click menu.

The TaqMan Probe region is by default oriented forward, i.e. made so that it binds to the complementary strand of the displayed single-stranded template. To obtain a TaqMan Probe region which is oriented in the reverse direction, choose the double-stranded sequence layout and select the complementary strand before defining the TaqMan Probe region. If areas are known where primers or probes must not bind (e.g. repeat-rich areas), one or more No primers here regions can be defined. It is required that the Forward primer region is located upstream of the TaqMan Probe region
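The two probe requirements stated above (no G at the 5' end, and a melting temperature about 10 degrees Celsius above the primers) can be expressed as a simple check. The Python sketch below is illustrative only. The function name is hypothetical, melting temperatures are supplied by the caller rather than calculated, and comparing the probe against the higher of the two primer temperatures is an assumption made here to keep the check conservative.

```python
def check_taqman_probe(probe, probe_tm, primer_tms, min_tm_gap=10.0):
    """Return a list of problems found with a candidate TaqMan probe.

    probe      -- probe sequence written 5' to 3'
    probe_tm   -- melting temperature of the probe (degrees Celsius)
    primer_tms -- melting temperatures of the forward and reverse primers
    """
    problems = []
    # A 5' G would quench the fluorescence of the reporter dye.
    if probe.upper().startswith("G"):
        problems.append("G at the 5' end")
    # Compare against the higher primer Tm (assumption of this sketch).
    if probe_tm - max(primer_tms) < min_tm_gap:
        problems.append("probe Tm less than %.0f degrees above primers"
                        % min_tm_gap)
    return problems


print(check_taqman_probe("ACCTGATCGGA", 70.0, (58.0, 60.0)))  # no problems
print(check_taqman_probe("GCCTGATCGGA", 65.0, (58.0, 60.0)))  # two problems
```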
282. odifying the layout

The background of the lane and the colors of the bands can be changed in the Side Panel. Click the colored box to display a dialog for picking a color. The slider Scale band spread can be used to adjust the effective time of separation on the gel, i.e. how much the bands will be spread over the lane. In a real electrophoresis experiment, this property will be determined by several factors, including time of separation, voltage and gel density.

You can also modify the layout of the view by zooming in or out. Click Zoom in or Zoom out in the Toolbar and click the view.

Finally, you can modify the format of the text heading each lane in the Text format preferences in the Side Panel.

Chapter 18 Sequence alignment

Contents
18.1 Create an alignment
  18.1.1 Gap costs
  18.1.2 Fast or accurate alignment algorithm
  18.1.3 Aligning alignments
  18.1.4 Fixpoints
18.2 View alignments
  18.2.1 Sequence logo
  18.2.2 Conservation
  18.2.3 Gap fraction
18.3 Edit alignments
  18.3.1 Move residues and gaps
  18.3.2 Insert gap columns
  18.3.3 Dele
283. of proteins as described in this chapter.

14.1 Protein charge

In CLC Gene Workbench you can create a graph of the electric charge of a protein as a function of pH. This is particularly useful for finding the net charge of the protein at a given pH. This knowledge can be used e.g. in relation to isoelectric focusing on the first dimension of 2D gel electrophoresis. The isoelectric point (pI) is found where the net charge of the protein is zero. The calculation of the protein charge does not include knowledge about any potential post-translational modifications the protein may have.

In order to calculate the protein charge:

Select a protein sequence | Toolbox in the Menu Bar | Protein Analyses | Create Protein Charge Plot

or right-click a protein sequence | Toolbox | Protein Analyses | Create Protein Charge Plot

This opens the dialog displayed in figure 14.1. If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or
284. og

Figure 11.9: A sequence list of two sequences can be viewed in either a table or in a graphical sequence list.

11.5.1 Graphical view of sequence lists

The graphical view of sequence lists is almost identical to the view of single sequences (see section 11.1). The main difference is that you can now see more than one sequence in the same view. However, you also have a few extra options for sorting, deleting and adding sequences:

• To add extra sequences to the list, right-click an empty (white) space in the view and select Add Sequences.
• To delete a sequence from the list, right-click the sequence's label and select Delete Sequence.
• To sort the sequences in the list, right-click the label of one of the sequences and select Sort Sequence List by Name or Sort Sequence List by Length.
• To rename a sequence, right-click the label of the sequence and select Rename Sequence.

11.5.2 Sequence list table

Each sequence in the table sequence list is displayed with:

• Name
• Accession
• Definition
• Modification date
• Length

In the View preferences for the table view of the sequence list, columns can be excluded, and the view pref
285. omatogram Format (SCF), ABI sequencer data files (ABI and AB1) and PHRED output files (PHD) (see section 6.1.1). After import, the sequence reads and their trace data are saved as DNA sequences. This means that all analyses which apply to DNA sequences can be performed on the sequence reads, including e.g. BLAST and open reading frame prediction.

To view the trace data, open the sequence read in a standard sequence editor. In the Nucleotide info preference group, the display of trace data can be selected and unselected. When selected, the trace data information is shown as a plot beneath the sequence. The appearance of the plot can be adjusted using the following options (see figure 16.1):

• Nucleotide trace. For each of the four nucleotides, the trace data can be selected and unselected.
• Show confidence. If confidence information was provided by the base calling algorithm, this can be displayed as a bar plot behind the trace plots. The confidence data is displayed as the log-transformed value of the probability of a given nucleotide position being correctly assigned.
• Show as probabilities. Displays confidence data as probabilities on a 0-1 scale, i.e. not log-transformed.
• Scale traces. A slider which allows the user to scale the trace plots to the desired level of detail.
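The relation between the log-transformed confidence values and the 0-1 probability scale described above can be illustrated with the widely used Phred convention, where a quality value Q corresponds to an error probability of 10^(-Q/10). Whether the program uses exactly this scale is an assumption; the text above only states that the values are log-transformed. The function names in this Python sketch are hypothetical.

```python
import math


def phred_to_probability(quality):
    """Probability that a base call is correct, given a Phred-style quality."""
    return 1.0 - 10.0 ** (-quality / 10.0)


def probability_to_phred(p_correct):
    """Phred-style quality for a given probability of a correct call."""
    return -10.0 * math.log10(1.0 - p_correct)


for q in (10, 20, 30):
    print(q, round(phred_to_probability(q), 4))
```

On this scale, each step of 10 in the log-transformed value reduces the chance of a wrong base call tenfold. This is easier to see in the bar plot than on the compressed 0-1 scale, where most good calls lie close to 1.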
286. ombined Workbench

Columns: Free | Protein | Gene | Combined

Batch processing
- Processing of multiple analyses in one single work step

Database searches
- GenBank Entrez searches
- UniProt searches (Swiss-Prot/TrEMBL)
- Web-based sequence search using BLAST
- PubMed searches
- Web-based lookup of sequence data

General sequence analyses
- Linear sequence view
- Circular sequence view
- Text-based sequence view
- Editing sequences
- Adding and editing sequence annotations
- Sequence statistics
- Shuffle sequence
- Local complexity region analyses
- Advanced protein statistics
- Comprehensive protein characteristics report

For a more detailed comparison we refer to http://www.clcbio.com.

APPENDIX A COMPARISON OF WORKBENCHES

Nucleotide analyses
- Basic gene finding
- Reverse complement without loss of annotation
- Restriction site analysis
- Advanced interactive restriction site analysis
- Translation of sequences from DNA to proteins
- Interactive translations of sequences and alignments
- G/C content analyses and graphs
- Annotate with known SNPs in dbSNP database

Protein analyses
- 3D molecule view
- Hydrophobicity anal
287. ontributed to the contig. This may be due to trimming before or during the assembly, or due to misalignment to the other reads. Apart from this, the view resembles that of alignments (see section 18.2) but has some extra preferences in the Side Panel:

Figure 16.8: The view of a contig. Notice that you can zoom to a very detailed level in contigs.

• Assembly Layout. A new preference group located at the top of the Side Panel:
  - Gather sequences at top. This option affects the view that is shown when scrolling along a contig. If selected, the sequence reads which did not contribute to the visible part of the contig will be omitted, whereas the contributing sequence reads will automatically be placed right below the contig.
  - Show sequence ends. Regions that have been trimmed are shown with faded traces and residues. This illustrates that these regions have been ignored during the assembly.
  - Find Inconsistency. Clicking this button selects the next position where there is a co
288. or which to consider the C/G content.
  - Max no. of G/C. The maximum number of G and C nucleotides allowed within the specified length interval.
  - Min no. of G/C. The minimum number of G and C nucleotides required within the specified length interval.
• 5' end G/C restrictions. When this checkbox is selected, it is possible to specify restrictions concerning the number of G and C molecules in the 5' end of primers and probes. A high G/C content facilitates a tight binding of the oligo to the template, but also increases the possibility of mispriming. Unfolding the preference groups yields the same options as described above for the 3' end.
• Mode. Specifies the reaction type for which primers are designed:
  - Standard PCR. Used when the objective is to design primers, or primer pairs, for PCR amplification of a single DNA fragment.
  - Nested PCR. Used when the objective is to design two primer pairs for nested PCR amplification of a single DNA fragment.
  - Sequencing. Used when the objective is to design primers for DNA sequencing.
  - TaqMan. Used when the objective is to design a primer pair and a probe for TaqMan quantitative PCR.
  Each mode is described further below.
• Calculate. Pushing this button will activate the algorithm for designing primers.

15.3 Graphical display of primer information

The primer information settings are found in the Primer information preference group in the Side P
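The end G/C restrictions described above come down to counting G and C nucleotides within a fixed number of bases from one end of the oligo. The following Python sketch shows that count and the resulting pass/fail test. It is an illustration with hypothetical function and parameter names; the primer is assumed to be written 5' to 3'.

```python
def gc_count_at_end(primer, length, end="3"):
    """Number of G and C bases within `length` bases of the chosen end."""
    if length <= 0:
        return 0
    primer = primer.upper()
    # The 3' end is the tail of a primer written 5' to 3'.
    segment = primer[-length:] if end == "3" else primer[:length]
    return sum(base in "GC" for base in segment)


def passes_end_gc(primer, length, min_gc, max_gc, end="3"):
    """True if the G/C count at the chosen end lies within [min_gc, max_gc]."""
    return min_gc <= gc_count_at_end(primer, length, end) <= max_gc


primer = "ATGCATGCGC"
print(gc_count_at_end(primer, 5, end="3"))                      # 4
print(passes_end_gc(primer, 5, min_gc=1, max_gc=3, end="3"))    # False
print(passes_end_gc(primer, 5, min_gc=1, max_gc=3, end="5"))    # True
```

Here the 3' end is too G/C rich for the chosen limits, while the 5' end of the same primer passes.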
289. orkbench

Suppose you are working with the NP_058652 protein, which constitutes the beta part of the hemoglobin molecule that is expressed in the adult house mouse, Mus musculus. To obtain more information about this molecule, you wish to query the Swiss-Prot database to find homologous proteins in humans (Homo sapiens), using the Basic Local Alignment Search Tool (BLAST) algorithm.

Figure 2.17: Two views of the HUMHBB sequence. The upper view shows the coding sequences (CDS) and the bottom view shows a selection corresponding to the CDS chosen in the upper view.

Please note that your computer must be connect
290. ot plots can also be used to compare regions of similarity within a sequence. This chapter first describes how to create a dot plot and then how to adjust the view of the plot. [Chapter 12, General Sequence Analyses, p. 137] 12.1.1 Create dot plots. A dot plot is a simple yet intuitive way of comparing two sequences, either DNA or protein, and is probably the oldest way of comparing two sequences (Maizel and Lenk, 1981). A dot plot is a 2-dimensional matrix where each axis of the plot represents one sequence. By sliding a fixed-size window over the sequences and marking each sequence match by a dot in the matrix, a diagonal line will emerge if two identical or very homologous sequences are plotted against each other. Dot plots can also be used to visually inspect sequences for direct or inverted repeats or regions with low sequence complexity. Various smoothing algorithms can be applied to the dot plot calculation to avoid a noisy background in the plot. Moreover, various substitution matrices can be applied in order to take the evolutionary distance of the two sequences into account. To create a dot plot: Toolbox | General Sequence Analyses | Create Dot Plot; or select one or two sequences in the Navigation Area | Toolbox in the Menu Bar | General Sequence Analyses | Create Dot Plot; or select one or two sequences in the Navigation Area | right-click in the Navigation Area | Toolbox | General Sequence Analyses | Create Dot Plot
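The sliding-window idea behind a dot plot can be sketched in a few lines. This is only an illustration using plain identity matching; it does not apply the smoothing algorithms or substitution matrices mentioned above, and the function name and parameters are assumptions.

```python
def dot_plot(seq_x: str, seq_y: str, window: int = 3, min_matches: int = 3):
    """Return the set of (i, j) cells that receive a dot.

    A dot is placed at (i, j) when the windows starting at seq_x[i] and
    seq_y[j] agree in at least `min_matches` positions.
    """
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(a == b for a, b in
                          zip(seq_x[i:i + window], seq_y[j:j + window]))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


seq = "ACGTTGCA"
dots = dot_plot(seq, seq)
# Render the matrix as text: '*' for a dot, '.' otherwise.
size = len(seq) - 3 + 1
for j in range(size):
    print("".join("*" if (i, j) in dots else "." for i in range(size)))
```

Plotting a sequence against itself yields the main diagonal; lowering `min_matches` below `window` tolerates mismatches at the cost of more background noise.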
291. otations on the protein sequence will be mapped to the resulting DNA sequence. In the tooltip on the transferred annotations, there is a note saying that the annotation derives from the original sequence. [Chapter 14, Protein Analyses, p. 181] The Codon Frequency Table is used to determine the frequencies of the codons. Select a frequency table from the list that fits the organism you are working with. A translation table of an organism is created on the basis of counting all the codons in the coding sequences. Every codon in a Codon Frequency Table has its own count, frequency (per thousand), and fraction, which are calculated in accordance with the occurrences of the codon in the organism. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. The newly created nucleotide sequence is shown, and if the analysis was performed on several protein sequences, there will be a corresponding number of views of nucleotide sequences. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to show the save dialog. 14.3.2 Bioinformatics explained: Reverse translation. In all living cells containing hereditary material such as DNA, transcription to mRNA and subsequent translation to proteins occur. This is of course simplified, but it is in general what happens in order to have a steady production of the proteins needed for the survival of the ce
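The three values stored per codon (count, frequency per thousand, and fraction) can be derived from coding sequences as sketched below. This is a hedged illustration: it assumes that "fraction" means the share of a codon among the codons encoding the same amino acid, and the tiny `TOY_CODE` mapping and function name are hypothetical stand-ins for a full genetic code.

```python
from collections import Counter

# Hypothetical, deliberately incomplete codon-to-amino-acid mapping.
TOY_CODE = {"ATG": "M", "GCT": "A", "GCC": "A", "AAA": "K", "AAG": "K"}


def codon_frequency_table(coding_seqs, codon_to_aa=TOY_CODE):
    """Build {codon: {"count", "per_thousand", "fraction"}} from in-frame CDSs."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1

    total = sum(counts.values())
    per_amino_acid = Counter()
    for codon, n in counts.items():
        # Codons missing from the mapping form their own group.
        per_amino_acid[codon_to_aa.get(codon, codon)] += n

    return {
        codon: {
            "count": n,
            "per_thousand": 1000 * n / total,
            "fraction": n / per_amino_acid[codon_to_aa.get(codon, codon)],
        }
        for codon, n in counts.items()
    }


table = codon_frequency_table(["ATGGCTGCCAAA", "ATGGCTAAGAAA"])
for codon, row in sorted(table.items()):
    print(codon, row)
```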
292. otein sequences. Difference between Motif Search and Pattern Discovery: In motif search (see 12.6), the user has some predefined knowledge about the pattern (motif) of interest. This motif is defined by the user, and the algorithm runs through the entire sequence and looks for identical or degenerate patterns. Motif search handles ambiguous characters in such a way that two residues are different if they do not have any residues in common. For example, for nucleotides, N matches any character and R matches A, G. For proteins, X matches any character and Z matches E, Q. Our pattern discovery algorithm (see 12.7) is based on proprietary hidden Markov models (HMM) and scans the entire sequence (one or more) for patterns which may be unknown to the user. Motifs: If you have a known motif represented by a literal string or a sequence pattern of interest, you can search for it using the CLC Gene Workbench. Patterns and motifs can be searched with different levels of degeneracy in both DNA and protein sequences. You can also search for matches with known motifs represented by a regular expression. A regular expression is a string that describes or matches a set of strings according to certain syntax rules. Regular expressions are usually used to give a concise description of a set without having to list all elements. The simplest form of a regular expression is a literal string. You are limited to the following syntax rules (see the Java regular expression syntax): A
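The rule that two residues are different only if they have no residues in common can be expressed as a set intersection. The sketch below uses the standard IUPAC nucleotide ambiguity codes; the function names are hypothetical, and the search is a plain scan rather than the workbench's motif search.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def residues_match(a: str, b: str, codes=IUPAC_DNA) -> bool:
    """Two symbols match when the residue sets they stand for overlap."""
    return bool(set(codes[a.upper()]) & set(codes[b.upper()]))


def find_motif(seq: str, motif: str):
    """Return every start position where the motif matches the sequence."""
    if not motif:
        return []
    hits = []
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        if all(residues_match(s, m) for s, m in zip(window, motif)):
            hits.append(start)
    return hits


print(find_motif("ACGTAGGT", "RGT"))
print(find_motif("ACGTAGGT", "NGT"))
```

Because the comparison is symmetric, ambiguous characters in the searched sequence are handled the same way as ambiguous characters in the motif.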
293. oteins. Rose scale: The hydrophobicity scale by Rose et al. is correlated to the average area of buried amino acids in globular proteins (Rose et al., 1985). This results in a scale which does not show the helices of a protein, but rather the surface accessibility. Janin scale: This scale also provides information about the accessible and buried amino acid residues of globular proteins (Janin, 1979). Many more scales have been published throughout the last three decades. Even though more advanced methods have been developed for prediction of membrane-spanning regions, the simple and very fast calculations are still widely used. Other useful resources: AAindex: Amino acid index database, http://www.genome.ad.jp/dbget/aaindex.html. Creative Commons License: All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes under the following conditions: You must attribute the work in its original form, and "CLC bio" has to be clearly labelled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, or build upon this work. SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more about how you may use the contents. [Chapter 14, Protein Analyses, p. 179] Table columns: aa, aa, Kyte, Hopp, Cornette, Eisenberg, Rose, Janin, Engel
294. ou to see labels for the internal nodes. Initially there are no labels, but right-clicking a node allows you to type a label. Label color: Changes the color of the labels on the tree nodes. Branch label color: Modifies the color of the labels on the branches. Node color: Sets the color of all nodes. Line color: Alters the color of all lines in the tree. • Annotation Layout: Specifies the annotation in the tree. [Chapter 19, Phylogenetic Trees, p. 259] Nodes: Sets the annotation of all nodes either to name or to species. Branches: Changes the annotation of the branches to bootstrap, length, or none if you don't want annotation on branches. Notice! Dragging in a tree will change it. You are therefore asked if you want to save this tree when the Tree Viewer is closed. You may select part of a tree by clicking on the nodes that you want to select. Right-clicking a selected node opens a menu with the following options: • Set root above node: defines the root of the tree to be just above the selected node. • Set root at this node: defines the root of the tree to be at the selected node. • Toggle collapse: collapses or expands the branches below the node. • Change label: allows you to label or to change the existing label of a node. • Change branch label: allows you to change the existing label of a branch. You can also relocate leaves and branches in a tree or change the length. Notice! To drag branches of a tree, you must f
295. overage and the right side is used for maximum coverage. • Graph: The coverage is displayed as a graph beneath the contig. Height: Specifies the height of the graph. Type: The graph can be displayed as a Line plot, a Bar plot, or a Color bar. Color box: For Line and Bar plots, the color of the plot can be set by clicking the color box. If a Color bar is chosen, the color box is replaced by a gradient color box as described under Foreground color. • Residue coloring: There is one additional parameter, Assembly Colors. This option lets you use different colors for the residues of the contig and the forward and reverse reads. It is particularly useful for getting an overview of forward and reverse reads in the contig. Contig color: Colors the residues of the contig sequence with the specified color (can be changed by clicking the colored box). Forward color: Colors the residues of forward reads with the specified color (can be changed by clicking the colored box). Reverse color: Colors the residues of reverse reads with the specified color (can be changed by clicking the colored box). Apart from these preferences, all the functionalities of the alignment view are available. This means that you can, e.g., add annotations (such as SNP annotations) to regions of interest in the contig. However, some of the parameters from alignment views are set at a different default value in the view of contigs. Trace data of the sequencing reads a
296. ow complexity regions, inverted repeats, etc. can be identified visually. Similar sequences: The simplest example of a dot plot is obtained by plotting two homologous sequences of interest. If very similar or identical sequences are plotted against each other, a diagonal line will occur. The dot plot in figure 12.5 shows two related sequences of the Influenza A virus nucleoproteins infecting ducks and chickens. Accession numbers of the two sequences are DQ232610 and DQ023146. Both sequences can be retrieved directly from http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi. Repeated regions: Sequence repeats can also be identified using dot plots. A repeat region will typically show up as lines parallel to the diagonal line. If the dot plot shows more than one diagonal in the same region of a sequence, the corresponding regions of the other sequence are repeated. In figure 12.7 you can see a sequence with repeats. Frame shifts: Frame shifts in a nucleotide sequence can occur due to insertions, deletions, or mutations. Such frame shifts can be visualized in a dot plot as seen in figure 12.8. In this figure, three frame shifts for the sequence on the y-axis are found: 1. Deletion of nucleotides. 2. Insertion of nucleotides. 3. Mutation (out of frame). [Chapter 12, General Sequence Analyses, p. 141] Figure 12.5: Dot plot of DQ232610 vs. DQ023146 (Influenza A virus nucleoproteins) showing an overall similarity. Dir
297. ow the contig is displayed. See section 16.6 on how to use the resulting contigs. 16.4 Assemble to reference sequence. This section describes how to assemble a number of sequence reads into a contig using a reference sequence. A reference sequence can be particularly helpful when the objective is to characterize SNP variation in the data. [Chapter 16, Assembly, p. 214] Note that CLC Gene Workbench allows you to annotate a reference sequence with known SNP information from the dbSNP database (see section 13.5). To start the assembly: select sequences to assemble | Toolbox in the Menu Bar | Assembly | Assemble Sequences to Reference. This opens a dialog where you can alter your choice of sequences which you want to assemble. You can also add sequence lists. When the sequences are selected, click Next, and you will see the dialog shown in figure 16.5. [Figure 16.5: Setting assembly parameters when assembling to a reference sequence. The dialog shows the reference sequence, the alignment options (minimum aligned read length, alignment stringency), and the trimming options (use existing trim information).] This dialog gives you the following options for assembling: • Reference sequence: Click the Browse
298. parative methods of inference, as the phylogeny describes the underlying correlation from shared history that exists between data from different species. [Chapter 19, Phylogenetic Trees, p. 261] In molecular epidemiology of infectious diseases, phylogenetic inference is also an important tool. The very fast substitution rate of microorganisms, especially the RNA viruses, means that these show substantial genetic divergence over the time scale of months and years. Therefore, the phylogenetic relationship between the pathogens from individuals in an epidemic can be resolved and contribute valuable epidemiological information about transmission chains and epidemiologically significant events (Leitner and Albert, 1999; Forsberg et al., 2001). 19.2.3 Reconstructing phylogenies from molecular data. Traditionally, phylogenies have been constructed from morphological data, but following the growth of genetic information it has become common practice to construct phylogenies based on molecular data, known as molecular phylogeny. The data is most commonly represented in the form of DNA or protein sequences, but can also be in the form of, e.g., restriction fragment length polymorphism (RFLP). Methods for constructing molecular phylogenies can be distance based or character based. Distance based methods: Two common algorithms, both based on pairwise distances, are the UPGMA and the Neighbor Joining algorithms. Thus, the first step in these analyses is to comput
299. phy: alignments; GCG Alignment (.msf): alignments; Clustal Alignment (.aln): alignments; Newick (.nwk): trees; FASTA (.fsa, .fasta): sequences; GenBank (.gbk, .gb, .gp): sequences; GCG sequence (.gcg): sequences (only import); PIR (NBRF) (.pir): sequences (only import); Staden (.sdn): sequences (only import); VectorNTI: sequences (only import); DNAstrider (.str, .strider): sequences; Swiss-Prot (.swp): protein sequences; Lasergene sequence (.pro): protein sequence (only import); Lasergene sequence (.seq): nucleotide sequence (only import); Embl (.embl): nucleotide sequences; Nexus (.nxs, .nexus): sequences, trees, alignments, and sequence lists; CLC (.clc): sequences, trees, alignments, reports, etc.; Text (.txt): all data in a textual format; ABI: trace files (only import); AB1: trace files (only import); SCF2: trace files (only import); SCF3: trace files (only import); Phred: trace files (only import); mmCIF (.cif): structure (only import); PDB (.pdb): structure (only import); Preferences (.cpf): CLC workbench preferences. Notice that CLC Gene Workbench can import 'external' files, too. This means that CLC Gene Workbench can import all files and display them in the Navigation Area, while the above-mentioned formats are the types which can be read by CLC Gene Workbench. 2.2 Tutorial: View sequence. This brief tutorial will take you through some different ways to display a sequence in the program. The tutorial introduces zooming on a sequence, dragging tabs, and opening a selection in a new view. We will
300. primers in a pair: the number of degrees Celsius that primers in a pair are allowed to differ. • Maximum pair annealing score: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in a primer pair. [Chapter 15, Primers, p. 203] The output of the design process is a table of single primers or primer pairs as described for primer design based on single sequences. These primers are specific to the included sequences in the alignment according to the criteria defined for specificity. The only novelty in the table is that melting temperatures are displayed with a maximum, a minimum, and an average value to reflect that degenerate primers or primers with mismatches may have heterogeneous behavior on the different templates in the group of included sequences. [Dialog screenshot: Calculation parameters. Chosen parameters: maximum and minimum primer length, maximum and minimum G/C content, maximum and minimum melting temperature, maximum self annealing, maximum self end annealing, maximum secondary structure, 3' end must meet G/C requirements, 5' end must meet G/C requirements. Exclusion parameters: minimum number of mismatches, minimum number of mismatches in 3' end, length of 3' end. Primer combination parameters: max percentage point difference in G/C content, max difference in melting temperatures within a primer pair, max pair annealing score.]
301. [Figure 15.9: Calculation dialog, showing the primer combination parameters and the mispriming parameters.] • Maximum percentage point difference in G/C content (described above under Standard PCR): this criterion is applied to both primer pairs independently. • Maximal difference in melting temperature of primers in a pair: the number of degrees Celsius that primers in a pair are allowed to differ. This criterion is applied to both primer pairs independently. • Maximum pair annealing score: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in a primer pair. This criterion is applied to all possible combinations of primers. • Minimum difference in the melting temperature of primers in the inner and outer primer pair: all comparisons between the melting temperature of primers from the two pairs must be at least this different, otherwise the primer set is excluded. This option is applied to ensure that the inner and outer PCR reactions can be initiated at different annealing temperatures. • Desired temperatur
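The two melting-temperature criteria for nested PCR can be sketched as a simple filter: each pair must stay within a maximum internal difference, and every inner/outer comparison must differ by at least a minimum. This is an illustration only; the `Primer` class, the function names, and the temperatures are hypothetical, and how melting temperatures are computed is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class Primer:
    name: str
    tm: float  # melting temperature in degrees Celsius


def pair_within_limit(pair, max_pair_diff: float) -> bool:
    """Forward and reverse primer of one pair may differ by at most max_pair_diff."""
    fwd, rev = pair
    return abs(fwd.tm - rev.tm) <= max_pair_diff


def nested_set_ok(inner, outer, max_pair_diff: float,
                  min_inner_outer_diff: float) -> bool:
    """Apply the pair criterion to both pairs, then compare all inner/outer primers."""
    if not (pair_within_limit(inner, max_pair_diff)
            and pair_within_limit(outer, max_pair_diff)):
        return False
    return all(abs(i.tm - o.tm) >= min_inner_outer_diff
               for i in inner for o in outer)


inner = (Primer("inner_fwd", 60.0), Primer("inner_rev", 61.0))
outer = (Primer("outer_fwd", 67.0), Primer("outer_rev", 68.0))
print(nested_set_ok(inner, outer, max_pair_diff=2.0, min_inner_outer_diff=5.0))
```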
302. r for alignments can be accessed in two ways: select alignment | Toolbox | Primers and Probes | Design Primers | OK; or right-click alignment | Show | Primer Design. In the alignment primer view (see figure 15.12), the basic options for viewing the template alignment are the same as for the standard view of alignments. See section 18 for an explanation of these options. Notice! This means that features such as, e.g., known SNPs or exons can be displayed on the template sequence to guide the choice of primer regions. Since the definition of groups of sequences is essential to the primer design, the selection boxes of the standard view are shown as default in the alignment primer view. 15.9.1 Specific options for alignment based primer and probe design. Compared to the primer view of a single sequence, the most notable difference is that the alignment primer view has no available graphical information. Furthermore, the selection boxes found to the right of the names in the alignment play an important role in specifying the oligo design process. This is elaborated below. The Primer Parameters preference group has the same options for specifying primer requirements, but differs by the following (see figure 15.12): • In the Mode submenu, which specifies the reaction types, the following options are found: Standard PCR: Used when the objective is to design primers or primer pairs for PCR amplification of a single DNA fragment. TaqMan: U
303. re. Only rounded numbers are found in this matrix. The two most used matrices are the BLOSUM (Henikoff and Henikoff, 1992) and PAM (Dayhoff and Schwartz, 1978). Different scoring matrices. PAM: The first PAM matrix (Point Accepted Mutation) was published in 1978 by Dayhoff et al. The PAM matrix was built through a global alignment of related sequences all having sequence similarity above 85% (Dayhoff and Schwartz, 1978). A PAM matrix shows the probability that any given amino acid will mutate into another in a given time interval. As an example, PAM1 gives that one amino acid out of 100 will mutate in a given time interval. At the other end of the scale, a PAM256 matrix gives the probability of 256 mutations in 100 amino acids (see figure 12.11). There are some limitations to the PAM matrices which make the BLOSUM matrices somewhat more attractive. The dataset on which the initial PAM matrices were built is very old by now, and the PAM matrices assume that all amino acids mutate at the same rate; this is not a correct assumption. BLOSUM: In 1992, 14 years after the PAM matrices were published, the BLOSUM matrices (BLOcks SUbstitution Matrix) were developed and published (Henikoff and Henikoff, 1992). Henikoff et al. wanted to model more divergent proteins, thus they used locally aligned [Chapter 12, General Sequence Analyses, p. 144] Figure 12.9: The dot plot showing an inversion in a sequence. See also figure
304. re are a number of alignment-specific view options in the Alignment info preference group in the Side Panel to the right of the view. These preferences relate to each column in the alignment. Below is more information on these view options. • Consensus: Shows a consensus sequence at the bottom of the alignment. The consensus sequence is based on every single position in the alignment and reflects an artificial sequence which resembles the sequence information of the alignment, but only as one single sequence. If all sequences of the alignment are 100% identical, the consensus sequence will be identical to all sequences found in the alignment. If the sequences of the alignment differ, the consensus sequence will reflect the most common sequences in the alignment. Parameters for adjusting the consensus sequences are described above. The Consensus Sequence can be opened in a new view simply by right-clicking the Consensus Sequence and clicking Open Consensus in New View. Limit: This option determines how conserved the sequences must be in order to agree on a consensus. No gaps: Checking this option will not show gaps in the consensus. Ambiguous symbol: Select how ambiguities should be displayed in the consensus line. • Sequence logo: See section 18.2.1 for more details. Foreground color: Colors the letters using a gradient according to the information content of the alignment column. Background color: Sets a background color of th
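A consensus line with a conservation limit can be sketched column by column. The exact semantics of the Limit, No gaps, and Ambiguous symbol options are not spelled out above, so this sketch assumes one plausible reading: the most common symbol is shown when its share reaches the limit, the ambiguous symbol is shown otherwise, and with `no_gaps` set, gap characters are ignored when counting. All names are hypothetical.

```python
from collections import Counter


def consensus(alignment, limit: float = 0.5, ambiguous: str = "N",
              no_gaps: bool = False, gap: str = "-") -> str:
    """Derive a consensus string from equal-length aligned sequences."""
    result = []
    for column in zip(*alignment):
        counts = Counter(column)
        if no_gaps:
            counts.pop(gap, None)  # do not let gaps win a column
        if not counts:
            continue  # column consisted of gaps only
        symbol, n = counts.most_common(1)[0]
        share = n / sum(counts.values())
        result.append(symbol if share >= limit else ambiguous)
    return "".join(result)


alignment = ["ACGTA", "ACGA-", "ACCA-", "AGTC-"]
print(consensus(alignment, limit=0.5))
print(consensus(alignment, limit=0.6))
print(consensus(alignment, limit=0.5, no_gaps=True))
```

Ties between equally common symbols are resolved by first occurrence in the column here; a real implementation might instead emit an ambiguity code.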
305. re asked if you want to save the file. If you do not want to view the sequence first, the sequence can be saved by dragging it from the list of hits into the Navigation Area. 2.4 Tutorial: Align protein sequences. It is possible to create multiple alignments of nucleotide and protein sequences. CLC Gene Workbench offers several opportunities to view alignments. The alignments can be used for building phylogenetic trees. [Chapter 2, Tutorials, p. 33] The sequences must be saved in the Navigation Area in order to be included in an alignment. To save a sequence which is displayed in the View Area, click the tab of the sequence and press Ctrl + S (or ⌘ + S on Mac). In this tutorial, eight protein sequences from the Example data will be aligned (see figure 2.7). [Figure 2.7: Eight protein sequences in a Protein project in the Navigation Area.] To begin aligning the protein sequences: select the sequences | right-click either of the sequences | Toolbox | Alignments and Trees | Create Alignment. 2.4.1 Alignment dialog. This opens the dialog shown in fig. 2.8. [Dialog screenshot: Create Alignment, step 1, Select sequences or alignments of same type, with the Projects tree on the left and the Selected Elements list on the right.]
306. re shown if present (can be enabled and disabled under the Nucleotide info preference group), and the Color different residues option is also enabled in order to provide a better overview of conflicts (can be changed in the Alignment info preference group). 16.6.1 Editing and zooming the contig. When editing contigs, you are typically interested in confirming or changing single bases, and this can be done simply by selecting the base and typing the right base. Some users prefer to use lower-case letters in order to be able to see which bases were altered when they use the contig later on. In CLC Gene Workbench, all changes to the contig are recorded in its history log (see section 7), allowing the user to quickly reconstruct the actions performed in the editing session. There are three shortcut keys for quick editing: • Space bar: Finds the next inconsistency. • "." (punctuation mark key): Finds the next inconsistency. • "," (comma key): Finds the previous inconsistency. [Chapter 16, Assembly, p. 219] In the contig view, you can use Zoom in to zoom to a greater level of detail than in other views (see figure 16.8). This is useful for discerning the trace curves. If you want to replace a residue with a gap, use the Delete or Backspace key. 16.6.2 Output from the contig. Due to the integrated nature of CLC Gene Workbench, it is easy to use the created contig sequence as input for additional analyses. If you wish to use the contig sequence for ot
307. recognized files. Figure 6.1: If the dragged file is not recognized by CLC Gene Workbench, the dialog allows you to force the import in a certain format. Notice! When browsing for files to import, the dialog only displays files of the format chosen in the File of type drop-down menu at the bottom of the import dialog. If the format .clc is chosen, only .clc files are shown in the Import dialog. Choose All Files to ensure the file you are looking for is displayed. When you import a file containing several sequences, you will be asked whether you want to save the sequences as individual elements or as a sequence list. See section 11.5 for more about sequence lists. Import of data in clc format from older versions: If you want to import data in clc format generated in an older version of either of the workbenches, it has to be converted first. If you try to import it without conversion, you will see a warning dialog. Import of Vector NTI data: CLC Gene Workbench 2.0 can import DNA, RNA, and protein sequences from a Vector NTI Database. The import can be done for Vector NTI Advance 10 for Windows machines and Vector NTI Suite 7.1 for Mac OS X (Panther and earlier versions). A new Project will be placed in the Navigation Area, and you can find all sequences in different folders, ready to work with. In order to import all DNA, RNA, and protein sequences: select File in the Menu Bar | Import VectorNTI Data | select
308. ree Workbench Example data and download the example data from there. If you download the file from the website, you need to import it into the program. See chapter 6.1 for more about importing data. 1.7 Network configuration. If you use a proxy server to access the Internet, you must configure CLC Gene Workbench 2.0 to use this. Otherwise you will not be able to perform any online activities (e.g. searching GenBank). CLC Gene Workbench 2.0 supports the use of an HTTP proxy and an anonymous SOCKS proxy. [Figure 1.12: Adjusting proxy preferences. The dialog offers Use HTTP Proxy Server (HTTP Proxy, HTTP Proxy Requires Login, Account, Password) and Use SOCKS Proxy Server (SOCKS Host, Port), and notes that you may have to restart the application for these changes to take effect.] To configure your proxy settings, open CLC Gene Workbench 2.0, go to the Advanced tab of the Preferences dialog (figure 1.12), and enter the appropriate information. You have the choice between an HTTP proxy and a SOCKS proxy. CLC Gene Workbench 2.0 only supports the use of a SOCKS proxy that does not require authorization. If you have any problems with these settings, you should contact your systems administrator. 1.8 Adjusting the maximum amount of memory. If you have a large amount of memory (RAM) availa
309. reground and background colors. Color box: Specifies the color of the graph for line and bar plots, and specifies a gradient for colors. • Gap fraction: The fraction of the sequences in the alignment that have gaps. Foreground color: Colors the letter using a gradient, where the left side color is used if there are relatively few gaps, and the right side color is used if there are relatively many gaps. Background color: Sets a background color of the residues using a gradient in the same way as described above. Graph: Displays the gap fraction as a graph at the bottom of the alignment. Height: Specifies the height of the graph. Type: The type of the graph. Color box: Specifies the color of the graph for line and bar plots, and specifies a gradient for colors. • Color different residues: Indicates differences in aligned residues. Foreground color: Colors the letter. Background color: Sets a background color of the residues. 18.2.1 Sequence logo. Below the alignment, there is an option of displaying a sequence logo (shown as default). The sequence logo displays the information content of all positions in the alignment as residues or nucleotides stacked on top of each other (see figure 18.8). The sequence logo provides a far more detailed view of the alignment than the conservation view (see section 18.2.2). Sequence logos can help to identify protein binding sites on DNA sequences but can also aid t
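The gap fraction of a column is simply the share of sequences that have a gap at that position. A minimal sketch follows, assuming "-" as the gap character; the function name is hypothetical.

```python
def gap_fractions(alignment, gap: str = "-"):
    """Return, per alignment column, the fraction of sequences with a gap."""
    if not alignment:
        return []
    n_seqs = len(alignment)
    return [column.count(gap) / n_seqs for column in zip(*alignment)]


alignment = ["AC-T", "A--T", "AC-T", "A-GT"]
fractions = gap_fractions(alignment)
print(fractions)
# Columns where gaps are present in the majority of sequences.
print([f > 0.5 for f in fractions])
```

These per-column values are what a gradient or a bar graph below the alignment would be driven by.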
310. rez query: BLAST searches can be limited to the results of an Entrez query against the database chosen. This can be used to limit searches to subsets of the BLAST databases. Any terms can be entered that would normally be allowed in an Entrez search session. Some queries are pre-entered and can be chosen in the drop-down menu. • Choose filter: Low-complexity: Mask off segments of the query sequence that have low compositional complexity. Filtering can eliminate statistically significant but biologically uninteresting reports from the BLAST output (e.g. hits against common acidic-, basic-, or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences. [Chapter 10, BLAST Search, p. 110] Human Repeats: This option masks Human repeats (LINEs and SINEs) and is especially useful for human sequences that may contain these repeats. Filtering for repeats can increase the speed of a search, especially with very long sequences (> 100 kb) and against databases which contain a large number of repeats (htgs). Mask for Lookup: This option masks only for purposes of constructing the lookup table used by BLAST. BLAST searches consist of two phases: finding hits based upon a lookup table, and then extending them. Mask Lower Case: With this option selected, you can cut and paste a FASTA sequence in upper-case characters and denote areas you would like filtered
311. [Figure 2.13: Setting parameters for restriction site detection. The dialog lists the enzymes that comply with the criteria, with the columns Include, Name, Recognition sequence, Overhang, Methylation sensitivity, and Popularity.] Click Next and choose both textual and graphical output (see figure 2.14). Click Finish to start the restriction site analysis. 2.6.1 View restriction site. The restriction sites are shown in two views: one view is in a textual format and the other view displays the sites as annotations on the sequence. To see both views at once: View in the menu bar | Split Horizontally. The result is shown in figure 2.15. Notice! The results are not automatically saved. [Chapter 2, Tutorials, p. 37] [Dialog screenshot: Find Restriction Sites, step 3, Set exclusion criteria and output options. Exclude enzymes based on number of matches: Exclude enzymes with less matches than, Exclude enzymes with more matches than. Output options: Create output as annotations on sequence, Create tabular output, Create new enzyme list from selected enzy
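Excluding enzymes by their number of matches can be illustrated with a small sketch that counts recognition sites and keeps only enzymes inside a chosen range. It is an assumption-laden example rather than the workbench's procedure: it scans one strand only, translates IUPAC codes in the recognition sequence into a regular expression, and uses three well-known enzymes (EcoRI, BamHI, HindIII) with a made-up sequence.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC_RE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]",
    "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def count_sites(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of a recognition site."""
    body = "".join(IUPAC_RE[ch] for ch in site.upper())
    # A lookahead lets overlapping matches be counted as well.
    return len(re.findall("(?=" + body + ")", seq.upper()))


def filter_enzymes(seq, enzymes, min_matches=1, max_matches=None):
    """Keep enzymes whose match count lies within [min_matches, max_matches]."""
    kept = {}
    for name, site in enzymes.items():
        n = count_sites(seq, site)
        if n >= min_matches and (max_matches is None or n <= max_matches):
            kept[name] = n
    return kept


enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
seq = "AAGAATTCGGGAATTCTTGGATCCAA"  # made-up example sequence
print(filter_enzymes(seq, enzymes, min_matches=1, max_matches=1))
```

Setting both limits to 1 is a common way to look for single cutters.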
312. rification. "CLC Gene Workbench is presently not able to contact CLC bio's license server. Check your internet connection and press OK to try again." Figure 1.2: This dialog appears when an online license check is conducted by CLC Gene Workbench and the computer is offline, either at start-up or after 24 hours. You can then connect to the Internet and retry, or you can save your work and close the program. You can run the workbench again later, as long as you are connected to the Internet at start-up. We use the concept of quid pro quo. The last two weeks of free demo time given to you is therefore accompanied by a short-form questionnaire where you have the opportunity to give us feedback about the program. The four weeks demo is offered for each major release of CLC Gene Workbench. You will therefore have the opportunity to try the next version when CLC Gene Workbench 2.0.1 is released. If you purchase CLC Gene Workbench, the first year of updates is included. 1.4.2 Getting and activating the demo license. When you start the program for the first time, you will be presented with the dialog shown in figure 1.3. If you connect to the internet via a proxy server, click the proxy settings button. Otherwise, just click the Request evaluation license button in order to get a license key for a demo of CLC [Chapter 1, Introduction to CLC Gene Workbench, p. 16] [Dialog screenshot: Get license, Accept agreement, Activate license.] A license is required. In ord
313. right-click either selected alignment | Toolbox | Alignments and Trees | Join Alignments. This opens the dialog shown in figure 18.10. [Dialog: Join Alignments, step 1, selecting a number of alignments of the same type from the Project Tree] Figure 18.10: Selecting two alignments to be joined. If you have selected some alignments before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove alignments from the Project Tree. Clicking Next opens the dialog shown in figure 18.11. [Dialog: Join Alignments, step 2, Set parameters: Set order of concatenation (top first)] Figure 18.11: Selecting order of concatenation. To adjust the order of concatenation, click the name of one of the alignments and move it up or down using the arrow buttons. Click Next if you wish to adjust how to handl
314. riginal implementation by Schneider does not handle sequence gaps. We have slightly modified the algorithm so that an estimated logo is presented in areas with sequence gaps. If amino acid residues or nucleotides of one sequence are found in an area containing gaps, we have chosen to show the particular residue as the fraction of the sequences. Example: if one position in the alignment contains 9 gaps and only one alanine (A), the A represented in the logo has a height of 0.1. Other useful resources: the website of Tom Schneider, http://www.lmmb.ncifcrf.gov/~toms/; WebLogo, http://weblogo.berkeley.edu/ [Crooks et al., 2004]. 18.2.2 Conservation. The conservation view is a very simplified view compared to the sequence logo view described above. The bar (default) view shows the conservation of all sequence positions. The height of the bars in the view reflects how conserved that particular position is in the alignment. If one position is 100% conserved, the bar will be shown in full height. 18.2.3 Gap fraction. The gap fraction view shows whether any gaps are present in the alignment. If a gap is present in the majority of sequences, this will be represented in the view. 18.3 Edit alignments. 18.3.1 Move residues and gaps. The placement of gaps in the alignment can be changed by modifying the parameters when creating the alignment (see section 18.1). However, gaps and residues can also be moved after the ali
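The gap handling described in excerpt 314 (one alanine among nine gaps is drawn with a height of 0.1) can be illustrated with a small sketch. This is only an assumption about the arithmetic behind that example, not the workbench's actual logo algorithm: it computes the fraction of sequences showing each residue in one alignment column, and the fraction of gaps. The function names and the "-" gap symbol are hypothetical.

```python
from collections import Counter


def residue_fractions(column, gap="-"):
    """Fraction of sequences showing each residue in one alignment column.

    Gaps count towards the total number of sequences but get no entry of
    their own, so a lone residue among many gaps receives a small fraction.
    """
    total = len(column)
    if total == 0:
        return {}
    counts = Counter(symbol for symbol in column if symbol != gap)
    return {residue: n / total for residue, n in counts.items()}


def gap_fraction(column, gap="-"):
    """Fraction of sequences that have a gap in this column."""
    return column.count(gap) / len(column) if column else 0.0


# One alanine and nine gaps, as in the example from the text.
example_column = "A" + "-" * 9
print(residue_fractions(example_column))
print(gap_fraction(example_column))
```

For the example column, the alanine fraction comes out as 0.1 and the gap fraction as 0.9, which matches the figure quoted in the excerpt.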
315. roject. • est: Database of GenBank, EMBL and DDBJ sequences from EST division. • est_human: Human subset of est. • est_mouse: Mouse subset of est. • est_others: Subset of est other than human or mouse. • gss: Genome Survey Sequence; includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences. • htgs: Unfinished High Throughput Genomic Sequences, phases 0, 1 and 2. Finished, phase 3 HTG sequences are in nr. • pat: Nucleotides from the Patent division of GenBank. • pdb: Sequences derived from the 3-dimensional structure records from Protein Data Bank. They are NOT the coding sequences for the corresponding proteins found in the same PDB record. • month: All new or revised GenBank, EMBL, DDBJ and PDB sequences released in the last 30 days. • alu: Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. See "Alu alert" by Claverie and Makalowski, Nature 371: 752 (1994). • dbsts: Database of Sequence Tag Site entries from the STS division of GenBank, EMBL and DDBJ. • chromosome: Complete genomes and complete chromosomes from the NCBI Reference Sequence project. It overlaps with refseq_genomic. • wgs: Assemblies of Whole Genome Shotgun sequences. • env_nt: Sequences from environmental samples, such as uncultured bacterial samples isolated from soil or marine samples. The largest single source is the Sargasso Sea project. This does overlap with nucleotid
316. rooks et al., 2004] Crooks, G. E., Hon, G., Chandonia, J. M., and Brenner, S. E. (2004). WebLogo: a sequence logo generator. Genome Res, 14(6):1188-1190. [Dayhoff and Schwartz, 1978] Dayhoff, M. O. and Schwartz, R. M. (1978). Atlas of Protein Sequence and Structure, volume 3 of 5 suppl., chapter Atlas of Protein Sequence and Structure, pages 353-358. Nat. Biomed. Res. Found., Washington D.C. [Eddy, 2004] Eddy, S. R. (2004). Where did the BLOSUM62 alignment score matrix come from? Nat Biotechnol, 22(8):1035-1036. [Eisenberg et al., 1984] Eisenberg, D., Schwarz, E., Komaromy, M., and Wall, R. (1984). Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol, 179(1):125-142. [Engelman et al., 1986] Engelman, D. M., Steitz, T. A., and Goldman, A. (1986). Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu Rev Biophys Biophys Chem, 15:321-353. [Felsenstein, 1981] Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol, 17(6):368-376. [Feng and Doolittle, 1987] Feng, D. F. and Doolittle, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol, 25(4):351-360. [Forsberg et al., 2001] Forsberg, R., Oleksiewicz, M. B., Petersen, A. M., Hein, J., Botner, A., and Storgaard, T. (2001). A molecular clock dates the common ancestor of European-type porcine reproductive and respirator
317. ror ... 1.7 Network configuration; 1.8 Adjusting the maximum amount of memory; 1.8.1 Microsoft Windows; 1.8.2 Mac OS X; 1.8.3 Linux; 1.9 The format of the user manual; 1.9.1 Text formats. Welcome to CLC Gene Workbench 2.0, a software package supporting your daily bioinformatics work. We strongly encourage you to read this user manual in order to get the best possible basis for working with the software package. 1.1 Contact information. The CLC Gene Workbench 2.0 is developed by: CLC bio A/S, Science Park Aarhus, Gustav Wieds Vej 10, 8000 Aarhus C, Denmark. http://www.clcbio.com. VAT no.: DK 28 30 50 87. Telephone: +45 70 22 32 44. Fax: +45 86 20 12 22. E-mail: info@clcbio.com. If you have questions or comments regarding the program, you are welcome to contact our support function. E-mail: support@clcbio.com. 1.2 Download and installation. The CLC Gene Workbench is developed for Windows, Mac OS X and Linux. The software for either platform can be downloaded from http://www.clcbio.com/download. Furthermore, the program can be sent on a CD-ROM by regular mail. To receive the program by regular m
318. rrange Views in View Area; Side Panel; 3.3 Zoom and selection in View Area; 3.3.1 Zoom In; 3.3.2 Zoom Out; 3.3.3 Fit Width; 3.3.4 Zoom to 100%; 3.3.5 Move; 3.3.6 Selection; 3.4 Toolbox and Status Bar; 3.4.1 Processes; 3.4.3 Status Bar; 3.5 Workspace; 3.5.1 Create Workspace; 3.5.2 Select Workspace; 3.5.3 Delete Workspace; 3.6 List of shortcuts. This chapter provides an overview of the different areas in the user interface of CLC Gene Workbench 2.0. As can be seen from figure 3.1, this includes a Navigation Area, View Area, Menu Bar, Toolbar, Status Bar and Toolbox. [Screenshot residue: CLC Gene Workbench 2.0 main window; menu bar File, Edit, Search, View, Toolbox, Workspace, Help; toolbar Import, Export, Cut, Copy, Paste, Delete, Wor
319. rt includes the introduction and some tutorials showing how to apply the most significant functionalities of CLC Gene Workbench 2.0. • The second part describes in detail how to operate all the program's basic functionalities. • The third part digs deeper into some of the bioinformatic features of the program. In this part you will also find our "Bioinformatics explained" sections. These sections elaborate on the algorithms and analyses of CLC Gene Workbench 2.0 and provide more general knowledge of bioinformatic concepts. • The fourth part is the Appendix and Index. Each chapter includes a short table of contents. 1.9.1 Text formats. To give this manual a clear layout, different formats are applied: • A feature in the program is in bold, starting with capital letters. Example: Navigation Area. • An explanation of how a particular function is activated is illustrated by "|" and bold. E.g. select the element | Edit | Rename. Icons are included in order to ease the navigation in the Toolbox. • The format of the program name is bold and italic: CLC Gene Workbench 2.0. The captions of displayed screenshots are in italic. Chapter 2 Tutorials. Contents: 2.1 Tutorial: Starting up the program; 2.1.1 Creating a project; 2.1.2 Import data; 2.1.3 Supported data formats; 2.2 Tutorial: View s
320. s described below are calculated in a simple way. Molecular weight: The molecular weight is the mass of a protein or molecule. It is calculated simply as the sum of the atomic masses of all the atoms in the molecule. The weight of a protein is usually given in Daltons (Da). A calculation of the molecular weight of a protein does not usually include additional posttranslational modifications. For native and unknown proteins it tends to be difficult to assess whether posttranslational modifications such as glycosylations are present on the protein, making a calculation based solely on the amino acid sequence inaccurate. The molecular weight can be determined very accurately by mass spectrometry in a laboratory. Isoelectric point: The isoelectric point (pI) of a protein is the pH at which the protein has no net charge. The pI is calculated from the pKa values for 20 different amino acids. At a pH below the pI the protein carries a positive charge, whereas at a pH above the pI the protein carries a negative charge. In other words, pI is high for basic proteins and low for acidic proteins. This information can be used in the laboratory when running electrophoretic gels, where the proteins can be separated based on their isoelectric point. Aliphatic index: The aliphatic index of a protein is a measure of the relative volume occupied by aliphatic side chain of the following amino a
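As a rough illustration of the molecular weight calculation described in excerpt 320, the sketch below sums per-residue masses and adds one water molecule for the free chain ends. The average residue masses are standard textbook values and are an assumption here; the manual does not state which table the program uses, and posttranslational modifications are ignored, as the text notes. The function name `molecular_weight` is hypothetical.

```python
# Average residue masses in Daltons (amino acid minus one water molecule).
# Standard reference values; assumed, not taken from the manual.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # one H2O for the N- and C-terminal ends


def molecular_weight(protein):
    """Estimate the molecular weight (Da) of an unmodified protein sequence."""
    sequence = protein.upper()
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"Unsupported residue symbols: {sorted(unknown)}")
    if not sequence:
        return 0.0
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


# Glycine alone should come out close to its known mass of about 75.07 Da.
print(round(molecular_weight("G"), 2))
```

Summing residue masses is equivalent to summing atomic masses over the whole chain, since each peptide bond releases one water molecule.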
321. [Dialog residue: enzyme table and the "Enzymes currently in list" table] Figure 17.18: Adding and removing enzymes in the existing enzyme list. Select sequences in either top or bottom table (see 17.4.1). Use the arrows to add and remove sequences. Click Finish to see the modified list. 17.5 Gel electrophoresis. CLC Gene Workbench enables the user to simulate the separation of nucleotide sequences on a gel. This feature is useful when, e.g., designing an experiment which will allow the differentiation of a successful and an unsuccessful cloning experiment on the basis of a restriction map. There are two main ways to simulate gel separation of nucleotide sequences: • A number of existing sequences can be separated on a gel. • One or more sequences can be digested with restriction enzymes and the resulting fragments can be separated on a gel. There are several ways to apply these functionalities, as described below. 17.5.1 Separate sequences on gel. This section explains how to simulate a gel electrophoresis of one or more existing sequences without restriction enzyme digestion: select one or more sequences | Toolbox | Cloning and Restriction Sites | Separate Sequences on Gel. This opens the dialog shown in figure 17.19. [Dialog: Separate Sequences on Ge
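The second approach in excerpt 321 (digest a sequence, then separate the fragments) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the workbench's implementation: it handles a linear sequence, looks at the top strand only, and takes the cut position as an offset into the recognition site. The function names `digest` and `gel_order` are hypothetical; the EcoRI site GAATTC, cut after the first G, is used only as a familiar example.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the number of bases of the site that stay on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    seq = sequence.upper()
    site = site.upper()
    cut_positions = []
    start = seq.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = seq.find(site, start + 1)

    fragments = []
    previous = 0
    for cut in cut_positions:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    # Drop empty pieces that arise when a cut falls at the very start or end.
    return [fragment for fragment in fragments if fragment]


def gel_order(fragments):
    """Fragment lengths from slowest (longest) to fastest (shortest) band."""
    return sorted((len(fragment) for fragment in fragments), reverse=True)


fragments = digest("AAGAATTCTTTTGAATTCCC", "GAATTC", cut_offset=1)
print(fragments)
print(gel_order(fragments))
```

Sorting by length stands in for the gel: on a real gel the shorter fragments migrate further, so the longest fragment forms the band nearest the well.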
322. s purposes. This chapter begins with a brief introduction to the general concepts of the primer designing process. Then follow instructions on how to adjust parameters for primers, how to inspect and interpret primer properties graphically, and how to interpret, save and analyze the output of the primer design analysis. After a description of the different reaction types for which primers can be designed, the chapter closes with sections on how to match primers with other sequences and how to create a primer order. 15.1 Primer design: an introduction. Primer design can be accessed in two ways: select sequence | Toolbox in the Menu Bar | Primers and Probes | Design Primers | OK, or right-click sequence | Show | Primer. In the primer view (see figure 15.1), the basic options for viewing the template sequence are the same as for the standard sequence viewer. See section 11.1 for an explanation of these options. Notice: This means that features such as known SNPs or exons can be displayed on the template sequence to guide the choice of primer regions. Also, traces in sequencing reads can be shown along with the structure to guide, e.g., the re-sequencing of poorly resolved regions. [Screenshot residue: primer view of AY738615 with Side Panel primer parameters (length, GC content: Max 60, Min 40; Melt. temp. (C): Max 58, Min 48; Inner Melt. temp. (C))]
323. saved primers, including BLAST. Furthermore, the primers can be edited using the standard sequence viewer to introduce, e.g., mutations and restriction sites. 15.4.2 Saving PCR fragments. The PCR fragment generated from the primer pair in a given table row can also be saved by selecting the row and using the right-click mouse menu. This opens a dialogue that allows the user to save the fragment to the desired location. The fragment is saved as a DNA sequence, and the position of the primers is added as annotation on the sequence. The fragment can then be used for further analysis and included in, e.g., an in silico cloning experiment using the cloning editor. 15.4.3 Adding primer binding annotation. You can add an annotation to the template sequence specifying the binding site of the primer: right-click the primer in the table and select Mark primer annotation on sequence. 15.5 Standard PCR. This mode is used to design primers for a PCR amplification of a single DNA fragment. 15.5.1 User input. In this mode the user must define either a Forward primer region, a Reverse primer region, or both. These are defined using the mouse right-click menu. If areas are known where primers must not bind (e.g. repeat-rich areas), one or more No primers here regions can be defined. If two regions are defined, it is required that the Forward primer region is located upstream of the Reverse primer region. After exploring the availa
324. se of evolution. The branches of the tree represent the amount of evolutionary divergence between two nodes in the tree and can be based on different measurements. A tree is completely specified by its topology and the set of all edge lengths. The phylogenetic tree in figure 19.4 is rooted at the most recent common ancestor of all Hominidae species and therefore represents a hypothesis of the direction of evolution, e.g. that the common ancestor of gorilla, chimpanzee and man existed before the common ancestor of chimpanzee and man. If this information is absent, trees can be drawn as unrooted. 19.2.2 Modern usage of phylogenies. Besides evolutionary biology and systematics, the inference of phylogenies is central to other areas of research. As more and more genetic diversity is being revealed through the completion of multiple genomes, an active area of research within bioinformatics is the development of comparative machine learning algorithms that can simultaneously process data from multiple species [Siepel and Haussler, 2004]. Through the comparative approach, valuable evolutionary information can be obtained about which amino acid substitutions are functionally tolerated by the organism and which are not. This information can be used to identify substitutions that affect protein function and stability, and is of major importance to the study of proteins [Knudsen and Miyamoto, 2001]. Knowledge of the underlying phylogeny is, however, paramount to com
325. sed for identifying both surface-exposed regions and transmembrane regions, depending on the window size used. Short window sizes of 5-7 generally work well for predicting putative surface-exposed regions. Large window sizes of 19-21 are well suited for finding transmembrane domains if the values calculated are above 1.6 [Kyte and Doolittle, 1982]. These values should be used as a rule of thumb, and deviations from the rule may occur. Engelman scale: The Engelman hydrophobicity scale, also known as the GES scale, is another scale which can be used for prediction of protein hydrophobicity [Engelman et al., 1986]. Like the Kyte-Doolittle scale, this scale is useful for predicting transmembrane regions in proteins. Eisenberg scale: The Eisenberg scale is a normalized consensus hydrophobicity scale which shares many features with the other hydrophobicity scales [Eisenberg et al., 1984]. Hopp-Woods scale: Hopp and Woods developed their hydrophobicity scale for identification of potentially antigenic sites in proteins. This scale is basically a hydrophilic index where apolar residues have been assigned negative values. Antigenic sites are likely to be predicted when using a window size of 7 [Hopp and Woods, 1983]. Cornette scale: Cornette et al. computed an optimal hydrophobicity scale based on 28 published scales [Cornette et al., 1987]. This optimized scale is also suitable for prediction of alpha helices in pr
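The window-size rule of thumb in excerpt 325 can be made concrete with a sliding-window average over the Kyte-Doolittle scale. The sketch below is a minimal illustration, assuming a plain unweighted mean per window; the manual does not say how the program weights or centres its windows. The scale values are the published Kyte-Doolittle hydropathy values, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    if window < 1 or window > len(scores):
        return []
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def windows_above(protein, window=19, threshold=1.6):
    """Start indices (0-based) of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [start for start, value in enumerate(profile) if value > threshold]


# A made-up sequence: a leucine-rich stretch followed by acidic residues.
toy_protein = "L" * 19 + "D" * 19
print(windows_above(toy_protein, window=19, threshold=1.6))
```

With a window of 19 and the 1.6 cut-off, only the windows dominated by the leucine stretch are reported; a short window of 5 to 7 would instead be the choice for looking at surface-exposed regions.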
326. sed when the objective is to design a primer pair and a probe set for TaqMan quantitative PCR. • The Primer solution submenu is used to specify requirements for the match of a PCR primer against the template sequences. These options are described further below. It contains the following options: [Screenshot residue: alignment of PERH2BD and PERH3BC with Consensus and Sequence Logo rows, and the Side Panel Primer parameters (Length: Max 22, Min 18; GC content: Max 60, Min 40; Melt. temp. (C): Max 58, Min 48; Inner Melt. temp. (C): Max 62, Min
327. see figure 15.11. [Dialog: Calculation parameters. Chosen parameters: Maximum primer length; Minimum primer length; Maximum G/C content; Minimum G/C content; Maximum melting temperature; Minimum melting temperature; Maximum self annealing; Maximum self end annealing; Maximum secondary structure; 3' end must meet G/C requirements; 5' end must meet G/C requirements. Mispriming parameters: Use mispriming as exclusion criteria] Figure 15.11: Calculation dialog for sequencing primers. Since design of sequencing primers does not require the consideration of interactions between primer pairs, this dialogue is identical to the dialogue shown in Standard PCR mode when only a single primer region is chosen. See section 15.5 for a description. 15.8.1 Sequencing primers output table. In this mode primers are predicted independently for each region, but the optimal solutions are all presented in one table. The solutions are numbered consecutively according to their position on the sequence, such that the forward primer region closest to the 5' end of the molecule is designated F1, the next one F2, etc. For each solution, the single-primer information described under Standard PCR is available in the table. 15.9 Alignment-based primer and probe design. CLC Gene Workbench 2.0 allows the user to design PCR primers and TaqMan probes based on an alignment of multiple sequences. The primer designe
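Two of the parameters listed in the calculation dialog of excerpt 327, primer length and G/C content, are simple enough to sketch directly. The example below is an illustration only: the default limits (18 to 22 bases, 40 to 60 percent G/C) are taken from the screenshot values that appear elsewhere in these excerpts, and the function names are hypothetical. Melting temperature, annealing and secondary-structure checks are deliberately left out because the manual excerpts do not describe how they are computed.

```python
def gc_content(primer):
    """Percentage of G and C bases in a primer sequence."""
    seq = primer.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_basic_filters(primer, min_len=18, max_len=22,
                         min_gc=40.0, max_gc=60.0):
    """Check a candidate primer against length and G/C content limits."""
    return (min_len <= len(primer) <= max_len
            and min_gc <= gc_content(primer) <= max_gc)


def candidate_primers(region, **limits):
    """All substrings of a primer region that pass the basic filters."""
    min_len = limits.get("min_len", 18)
    max_len = limits.get("max_len", 22)
    candidates = []
    for length in range(min_len, max_len + 1):
        for start in range(len(region) - length + 1):
            primer = region[start:start + length]
            if passes_basic_filters(primer, **limits):
                candidates.append((start, primer))
    return candidates


print(gc_content("GGCCAATT"))
print(passes_basic_filters("ACGTACGTACGTACGTACGT"))
```

A real design run would apply the remaining criteria from the dialog to this candidate list before ranking the solutions.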
328. selected from a list containing all sequences in the cloning editor. • Replace selection with sequence: This will replace the selected region with a sequence. The sequence to be inserted can be selected from a list containing all sequences in the cloning editor. • Cut sequence before selection: This will cleave the sequence before the selection and will result in two smaller fragments. [Context menu: Duplicate Selection; Insert Sequence After Selection; Insert Sequence Before Selection; Replace Selection With Sequence; Delete Selection; Cut Sequence Before Selection; Cut Sequence After Selection; Make Positive Strand Single Stranded; Make Negative Strand Single Stranded; Make Double Stranded; Copy Selection; Expand Selection; Open Selection in New View; Edit Selection; Delete Selection; Add Annotation; Add Enzymes Cutting The Selection To Panel; Insert Restriction Site After Selection; Insert Restriction Site Before Selection; Trim Sequence Left; Trim Sequence Right; Set Alignment Fixpoint Here] Figure 17.6: Right-click on a sequence selection in the cloning view. • Cut sequence after selection: This will cleave the sequence after the selection and will result in two smaller fragments. • Copy selection: This will copy the selected region to the clipboard, which will enable it for use in other programs. • Expand selection: This will provide a dialog box in which it is possible to manually expand the selectio
329. selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the Project Tree. You can perform the analysis on several DNA or several protein sequences at a time. If the analysis is performed on several sequences at a time, the method will search for patterns that are common to all the sequences. Annotations will be added to all the sequences, and a view is opened for each sequence. Click Next to adjust parameters (see figure 12.21). [Dialog: Pattern Discovery, step 2, Set parameters. Motif parameters: Minimum pattern length 4; Maximum pattern length 9; Noise 1; Number of patterns to predict 2. Background distribution: Include Background Distribution; Background Distribution from. Output options: Add hits to sequence as annotations; Show results in a table] Figure 12.21: Setting parameters for the pattern discovery. See text for details. 12.7.1 Pattern discovery search parameters. Various parameters can be set prior to the pattern discovery. The parameters are listed below, and a screenshot of the parameter settings can be seen in figure 12.21. • Minimum pattern length: Here the minimum length of patter
330. sembly. In this dialog you can specify how the result of the assembly (the contig) should be displayed. • Make contig(s) with the reference sequence: This will display a contig data object with the reference sequence at the top and the reads aligned below. This option is useful when comparing sequence reads to a closely related reference sequence, e.g. when sequencing for SNP characterization. Only keep part of the reference sequence: If the aligned sequence reads only cover a small part of the reference sequence, it may not be desirable to include the whole reference sequence in the contig data object. When selected, this option lets you specify how many residues from the reference sequence should be kept on each side of the region spanned by sequencing reads, by entering the number in the Extra residues field. • Make new contig(s) based on the reads: This will produce a contig data object without the reference sequence, which resembles that produced when making an ordinary assembly (see section 16.3). In the assembly process, the reference sequence is only used as a scaffold for alignment. This option is useful when performing assembly with a reference sequence that is not closely related to the sequencing reads. If there is a conflict between the reads in a given nucleotide position, the program offers three ways to solve this: Vote (A, C, G, T): The conflict will be solved by counting instances of each nucleotide and then
331. show the sequences together in a normal sequence view. Having sequences in a sequence list can help in organizing sequence data. The sequence list may originate from an NCBI search (chapter 9.1). Moreover, if a multiple-sequence FASTA file is imported, it is possible to store the data in a sequence list. A Sequence List can also be generated using a dialog, which is described here: select two or more sequences | right-click the elements | New | Sequence List. This action opens a Sequence List dialog. The dialog allows you to select more sequences to include in the list, or to remove already chosen sequences from the list. After clicking Next, you can choose where to save the list. Then click Finish. Opening a Sequence List is done by: right-click the sequence list in the Navigation Area | Show | click Graphical sequence list OR click Table. The two different views of the same sequence list are shown in split screen in figure 11.9. [Dialog: Create Sequence List, step 1, Select Sequences of Same Type, with Projects tree and Selected Elements] Figure 11.8: A Sequence List dial
332. sible to choose how to handle the relevant sequence. Copy/paste from GenBank search results: When using copy/paste to bring the search results into the Navigation Area, the actual files are downloaded from GenBank. To copy/paste files into the Navigation Area: select one or more of the search results | Ctrl + C (⌘ + C on Mac) | select project or folder in the Navigation Area | Ctrl + V. Notice: Search results are downloaded before they are saved. Downloading and saving several files may take some time. However, since the process runs in the background (displayed in the Status Bar), it is possible to continue other tasks in the program. Like the search process, the download process can be stopped. This is done in the Toolbox, in the Processes tab. 9.2 Sequence web info. CLC Gene Workbench 2.0 provides direct access to web-based search in various databases and on the Internet using your computer's default browser. You can look up a sequence in the databases of NCBI and UniProt, search for a sequence on the Internet using Google, and search for PubMed references at NCBI. This is useful for quickly obtaining updated and additional information about a sequence. The functionality of these search functions depends on the information that the sequence contains. You can see this information by viewing the sequence as text (see section 11.3). In the following sections we will explain this in further detail. The
333. sight ... tricks for the experienced user. CONTENTS. II Basic Program Functionalities. 3 User Interface: 3.1 Navigation Area; 3.2 View Area; 3.3 Zoom and selection in View Area; 3.4 Toolbox and Status Bar; 3.5 Workspace; 3.6 List of shortcuts. 4 User preferences: 4.1 General preferences; 4.2 Default View preferences; 4.3 Advanced preferences; 4.4 Export/import of preferences; 4.5 View preference style sheet. 5 Printing: 5.1 Selecting which part of the view to print; 5.2 Page setup; 5.3 Print preview. 6 Import/export of data and graphics: 6.1 Bioinformatic data formats; 6.2 External files; 6.3 Export graphics to files; 6.4 Copy/paste view output
334. sign; 15.4.1 Saving primers; 15.4.2 Saving PCR fragments; 15.4.3 Adding primer binding annotation; 15.5 Standard PCR; 15.5.1 User input; 15.5.2 Standard PCR output table; 15.6 Nested PCR; 15.6.1 Nested PCR output table; 15.7 TaqMan; 15.7.1 TaqMan output table; 15.8 Sequencing primers; 15.8.1 Sequencing primers output table; 15.9 Alignment-based primer and probe design; 15.9.1 Specific options for alignment-based primer and probe design; 15.9.2 Alignment-based design of PCR primers; 15.9.3 Alignment-based TaqMan probe design; 15.10 Analyze primer properties; 15.11 Match primer with sequence; 15.11.1 Search for primer binding sites parameters; 15.12 Order primers. CLC Gene Workbench offers graphically and algorithmically advanced design of primers and probes for variou
335. [Dialog residue: Project Tree with Restriction analysis, Protein, Extra, Performed analyses, README] Figure 12.16: Selecting two alignments to be joined. If you have selected some sequences before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences from the Project Tree. Clicking Next opens the dialog shown in figure 12.17. [Dialog: Join Sequences, step 2, Set parameters: Set order of concatenation (top first): PERH3BC, PERH2BD] Figure 12.17: Setting the order in which sequences are joined. In step 2 you can change the order in which the sequences will be joined. Select a sequence and use the arrows to move the selected sequence up or down. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. The result is shown in figure 12.18. Figure 12.18: The result of joining sequences is a new sequence containing all the annotations of the joined sequences. 12.6 Motif Search. CLC Gene Workbench offers advanced and versatile options to search for unknown sequence patterns or known motifs represented either by a literal string or a regular expression. These advanced search capabilities are available for use in both DNA and pr
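The motif search described at the end of excerpt 335, where a known motif is given as a literal string or a regular expression, can be approximated with Python's standard `re` module. This is a sketch under assumptions: the regular expression dialect used by the workbench may differ from Python's (for instance in its support for character-class intersection), and the function name `find_motif` and the 1-based reporting of positions are choices made for this illustration.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, end, matched text) for each non-overlapping match.

    Positions are 1-based and inclusive, as is common in sequence
    annotation; the pattern is an ordinary Python regular expression.
    """
    regex = re.compile(pattern)
    return [
        (match.start() + 1, match.end(), match.group())
        for match in regex.finditer(sequence.upper())
    ]


# A literal motif and a regular expression with a repetition count.
print(find_motif("AAGAATTCGG", "GAATTC"))
print(find_motif("TTACGACGTT", "(ACG){2}"))
```

A literal string is simply a regular expression without special characters, so the same function covers both kinds of motif mentioned in the text.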
336. stance correction measure can be used when calculating the dot plot. These distance correction matrices (substitution matrices) take into account the likeliness of one amino acid changing to another. • Window size: A residue-by-residue comparison (window size = 1) would undoubtedly result in a very noisy background due to a lot of similarities between the two sequences of interest. For DNA sequences the background noise will be even more dominant, as a match between only four nucleotides is very likely to happen. Moreover, a residue-by-residue comparison (window size = 1) can be very time-consuming and computationally demanding. Increasing the window size will make the dot plot smoother. Click Next if you wish to adjust how to handle the results (see section 8.1). If not, click Finish. [Dialog: Create Dot Plot, step 2, Set parameters: Choose distance correction and window size; Score model: BLOSUM62; Window size: 9] Figure 12.2: Setting the dot plot parameters. 12.1.2 View dot plots. A view of a dot plot can be seen in figure 12.3. You can select Zoom in in the Toolbar and click the dot plot to zoom in to see the details of particular areas. The Side Panel to the right lets you specify the dot plot preferences. The gradient color box can be adjusted to get the appropriate result by dragging the small pointers at the top of the box
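The effect of the window size described in excerpt 336 can be seen in a toy dot plot. The sketch below is a simplification and an assumption about the general technique, not the program's method: it scores windows by plain identity rather than with a substitution matrix such as BLOSUM62, and it records a dot only when every position in the window matches unless a lower `min_matches` is given. The function name `dot_plot` is hypothetical.

```python
def dot_plot(seq_a, seq_b, window=1, min_matches=None):
    """Set of (i, j) pairs where the windows starting at i and j match.

    A dot is placed when at least min_matches of the window positions are
    identical; by default the whole window has to match.
    """
    if min_matches is None:
        min_matches = window
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


sequence = "GATTACA"
noisy = dot_plot(sequence, sequence, window=1)
smooth = dot_plot(sequence, sequence, window=3)
print(len(noisy), len(smooth))
```

Comparing the sequence with itself, the window of 1 produces off-diagonal dots wherever the same base recurs, while the window of 3 leaves only the main diagonal. The nested loops also show why the comparison becomes costly for long sequences.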
337. step 18.1, you have to decide how this alignment should be treated. • Redo alignment: The original alignment will be realigned if this checkbox is checked. Otherwise, the original alignment is kept in its original form except for possible extra equally sized gaps in all sequences of the original alignment. This is visualized in figure 18.5. (Chapter 18, Sequence Alignment) Figure 18.5: alignment views of the sequences P68873, Q6WN20, P68231, Q6H1U7 and P68945, shown with Consensus, Sequence Logo and Conservation rows.
338. Figure 16.1: A sequence with trace data. The preferences for viewing the trace are shown in the Side Panel. 16.2 Trim sequences. CLC Gene Workbench offers a number of ways to trim your sequence reads prior to assembly. Trimming can be done either as a separate task before assembling, or it can be performed as an integrated part of the assembly process (see section 16.3). Trimming as a separate task can be done either manually or automatically. In both instances, trimming of a sequence does not cause data to be deleted. Instead, both the manual and automatic trimming will put a Trim annotation on the trimmed parts as an indication to the assembly algorithm that this part of the data is to be ignored (see figure 16.2). This means that the effect of different trimming schemes can easily be explored without the loss of data. To remove existing trimming from a sequence, simply remove its trim annotation (see section 11.1.4). (Chapter 16, Assembly) Figure 16.2: Trimming creates annotations on the regions tha
339. t open the sequence in the Primer Designer: Select the pBR322 sequence in the Cloning Project folder under Nucleotide in the Example data | Show | Primer Designer. 2.9.1 Finding the region to amplify. Next, make sure that the conflict annotations are shown (see figure 2.21): Click Annotation types in the Side Panel | Conflict. Figure 2.21: The annotations of the type conflict are now visible. Find the annotation using the Search function in the Side Panel (see figure 2.22): Search in the Side Panel | Type conflict in the text field | Check Annotation search | Click Search two times in order to go to position 1891 (we are not interested in the first conflict in this tutorial). 2.9.2 Specifying a region for the forward primer. The forward primer should be located in the region upstream from the annotation. Thus, select a region of approximately 40 residues and mark it Forward primer region here (see figure 2.23). (Chapter 2, Tutorials) Figure 2.22: The annotations of the type conflict are now visible. The right-click menu of the selection lists the entries Forward primer region here, Reverse primer region here, No primers here, Copy Selection and Expand Selection.
340. t a file from CLC Gene Workbench 2.0: click a file in the Navigation Area | Export in the toolbar | browse to the desired folder | Save. If the file already exists, you are asked if you want to replace it. (Chapter 6, Import/Export of Data and Graphics) 6.2.3 Technical details. This section explains the more technical aspects of how CLC Gene Workbench 2.0 stores the external files. When you import the file, a copy of the file is created in a database. When you open the file from the Navigation Area, it is checked out to a repository (a folder called CLCWorkbenchRepository located in your operating system's user folder), where it stays until you close the application that has the file open. When you exit CLC Gene Workbench 2.0, it checks all the files in the repository into the database, unless they are still open in another application. If the latter is the case, the file stays in the repository even after the file is closed, and it will not be checked in until the next time CLC Gene Workbench 2.0 is closed. If you have made changes to a file after CLC Gene Workbench 2.0 was closed, a dialog is shown asking which version to use. The date and time of the latest change of the file are displayed in the dialog, helping you to decide which one to keep (see figure 6.3). Figure 6.3 shows the File exists dialog, which states that the file exists in another version and asks whether you want to use the existing file.
341. t while a tree appears in the View Area (figure 2.11). Figure 2.11: After choosing which algorithm should be used, the tree appears in the View Area. The Side Panel in the right side of the view allows you to adjust the way the tree is displayed. 2.5.1 Tree layout. Using the View preferences (in the right side of the interface) of the tree view, you can edit the way the tree is displayed. Click Tree Layout and open the Layout drop-down menu. Here you can choose between standard and topology layout. The topology layout can help to give an overview of the tree if some of the branches are very short. When the sequences include the appropriate annotation, it is possible to choose between the accession number and the species names at the leaves of the tree. Sequences downloaded from GenBank, for example, have this information. The Annotation Layout preferences allow these different node annotations as well as different annotations on the branches. The branch annotation includes the bootstrap value, if this was selected when the tree was calculated. It is also possible
342. t will be ignored in the assembly process. 16.2.1 Manual trimming. Sequence reads can be trimmed manually while inspecting their trace and quality data. Trimming sequences manually corresponds to adding annotation (see also section 11.1.4), but is special in the sense that trimming can only be applied to the ends of a sequence: double-click the sequence to trim in the Navigation Area | enable trace data in the nucleotide info menu group in the preference panel | select the region from where you want trimming to start | right-click the selection | Trim sequence left/right to determine the direction of the trimming. This will add trimming annotation to the end of the sequence in the selected direction. 16.2.2 Automatic trimming. Sequence reads can be trimmed automatically based on a number of different criteria. Automatic trimming is particularly useful in the following situations: • If you have many sequence reads to be trimmed. • If you wish to trim vector contamination from sequence reads. • If you wish to ensure that the trimming is done according to the same criteria for all the sequence reads. To trim sequences automatically: select sequence(s) or sequence lists to trim | Toolbox in the Menu Bar | Assembly | Trim. This opens a dialog where you can alter your choice of sequences. When the sequences are selected, click Next. This opens the dialog displayed in figure 16.3. The following parameters can be adjusted in the dialo
343. Figure 12.20: Sequence view displaying the pattern found (residues shown include QRQKRSINLQ QPRMATERGN). The search string was QRQXRXXXXQQ. 12.6.2 Motif search output. If the analysis is performed on several sequences at a time, the method will search for patterns in the sequences and open a new view for each of the sequences. If wanted, annotations on patterns found can be added to all the sequences. Each pattern found will be represented as an annotation of the type Region. More information on each motif or pattern found is available through the tooltip, including detailed information on the position of the pattern and how similar it was to the search string. It is also possible to get a tabular view of all motifs or patterns found, in either one combined table or in individual tables if multiple sequences were selected. Then each pattern found will be represented with its position in the sequence and the obtained accuracy score. 12.7 Pattern Discovery. With CLC Gene Workbench you can perform pattern discovery on both DNA and protein sequences. Advanced hidden Markov models can help to identify unknown sequence patterns across single or even multiple sequences. In order to search for unknown patterns: Select DNA or protein sequence(s) | Toolbox in the Menu Bar | General Sequence Analyses | Pattern Discovery, or right-click DNA or protein sequence(s) | Toolbox | General Sequence Analyses | Pattern Discovery. If a sequence was
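The search string in figure 12.20 can be mimicked with a short Python sketch. It assumes that X in the search string stands for any single residue, which is inferred from the match shown in the figure rather than stated in the text. The function names are hypothetical, and no accuracy score is computed, since the excerpt does not say how the program derives it.

```python
import re


def motif_to_regex(motif, wildcard="X"):
    """Translate a motif string into a regular expression.
    Assumption: the wildcard character matches exactly one residue."""
    return "".join("." if ch == wildcard else re.escape(ch) for ch in motif)


def find_motif(sequence, motif):
    """Return (start, end, matched text) for each non-overlapping exact match.
    Positions are 1-based and inclusive, as in a sequence view."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start() + 1, m.end(), m.group())
            for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Residues from the figure 12.20 excerpt, with the spacing removed.
    sequence = "QRQKRSINLQQPRMATERGN"
    print(find_motif(sequence, "QRQXRXXXXQQ"))
```

Only exact matches are reported here. A search that tolerates mismatches would need a scoring step on top of this.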
344. te residues and gaps; 18.3.4 Copy annotations to other sequences; 18.3.5 Move sequences up and down; 18.3.6 Delete and add sequences; 18.3.7 Realign selection; 18.4 Join alignments; 18.4.1 How alignments are joined; 18.5 Bioinformatics explained: Multiple alignments; 18.5.1 Use of multiple alignments; 18.5.2 Constructing multiple alignments. CLC Gene Workbench 2.0 can align nucleotides and proteins using a progressive alignment algorithm (see section 18.5 or read the White paper on alignments in the Science section of http://www.clcbio.com). This chapter describes how to use the program to align sequences. The chapter also describes alignment algorithms in more general terms. 18.1 Create an alignment. Alignments can be created from sequences, sequence lists (see section 11.5), existing alignments and from any combination of the three. To create an alignment in CLC Gene Workbench 2.0: select elements to align | Toolbox in the Menu Bar | Alignments and Trees | Create Alignment, or select elements to align | right-click either selected sequence | Toolbox | Alignments
345. the View Area or the Navigation Area from the search results by drag and drop, copy/paste or by using the right-click menu. Finally you can also (Chapter 9, Database Search) Drag and drop from GenBank search results. The sequences from the search results can be opened by dragging them into a position in the View Area. Notice: A sequence is not saved until the View displaying the sequence is closed. When that happens, a dialog opens: Save changes of sequence x? (Yes or No). The sequence can also be saved by dragging it into the Navigation Area. It is possible to select more sequences and drag all of them into the Navigation Area at the same time. Download GenBank search results using right-click menu. You may also select one or more sequences from the list and download using the right-click menu (see figure 9.2). Choosing Save sequence lets you select a folder or project where the sequences are saved when they are downloaded. Choosing Open sequence opens a new view for each of the selected sequences. Figure 9.2: By right-clicking a search result it is pos
346. the View preferences floating. See next section. 4.5.1 Floating Side Panel. The Side Panel of the views can be placed in the right side of a view, or it can be floating (see figure 4.5). (Chapter 4, User Preferences) Figure 4.3: The many preferences for each view are stored in preference groups which can be opened and closed. Figure 4.4: The top of the View preferences contains Expand all preferences, Collapse all preferences, Dock/Undock preferences, Help and Save/Restore preferences. By clicking the Dock icon, the floating Side Panel reappears in the right side of the view. The size of the floating Side Panel can be adjusted by dragging the hatched area in the bottom right
347. the sequence data of the OTUs. Maximum likelihood inference (Felsenstein, 1981) then consists of finding the tree which assigns the highest probability to the data. Bayesian inference: The objective of Bayesian phylogenetic inference is not to infer a single correct phylogeny, but rather to obtain the full posterior probability distribution of all possible phylogenies. This is obtained by combining the likelihood and the prior probability distribution of evolutionary parameters. The vast number of possible trees means that Bayesian phylogenetics must be performed by approximative Monte Carlo based methods (Larget and Simon, 1999; Yang and Rannala, 1997). 19.2.4 Interpreting phylogenies. Bootstrap values: A popular way of evaluating the reliability of an inferred phylogenetic tree is bootstrap analysis. (Chapter 19, Phylogenetic Trees) The first step in a bootstrap analysis is to re-sample the alignment columns with replacement, i.e. in the re-sampled alignment, a given column in the original alignment may occur two or more times, while some columns may not be represented in the new alignment at all. The re-sampled alignment represents an estimate of how a different set of sequences from the same genes and the same species may have evolved on the same tree. If a new tree reconstruction on the re-sampled alignment results in a tree similar to the original one, this increases the confidence in the original tree. If on the other hand the ne
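The re-sampling step of a bootstrap analysis, as described above, can be sketched in a few lines of Python. This covers only the column re-sampling, not the tree reconstruction. The function name and the toy alignment are hypothetical.

```python
import random


def resample_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement.
    Returns the re-sampled alignment and the chosen column indices."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    resampled = ["".join(row[c] for c in columns) for row in alignment]
    return resampled, columns


if __name__ == "__main__":
    toy_alignment = ["MVHLTPEEKS", "MVHLTGEEKA", "MVHLSGDEKN"]
    rng = random.Random(1)  # seeded only to make the run repeatable
    replicate, columns = resample_columns(toy_alignment, rng)
    print(columns)    # some indices repeat, others are missing
    print(replicate)
```

Because whole columns are drawn, the residues of one column stay together across all sequences. A bootstrap value would then come from repeating this many times and rebuilding the tree on each replicate.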
348. the sequences are nucleotide or protein sequences. • FormatDB Option: Enables or disables parsing of Seqid and creation of indices. • Input Sequences: Depending on the choice of Select Input Source above, clicking the button will let you browse the Navigation Area or the external file system for the sequences which you want to include in the database. • Save BLAST database: Lets you browse your external file system for a suitable place to save the database. (Chapter 10, BLAST Search) Figure 10.10: Setting parameters for the local BLAST database. After having adjusted all these settings, click Next, which opens the dialog seen in figure 10.11. Figure 10.11: Choose where the access point to your local BLAST database is saved in the Navigation Area. Click Next to complete the creation of the database. Chapter 11 Viewing and editing sequences. Contents: 11.1 View sequence; 11.1.1 Sequence Layout in Side Panel; 11.1.2 Selecting parts of the
349. ting a sequence. The Create Sequence dialog (figure 11.7) reflects the information needed in the GenBank format, but you are free to enter anything into the fields. The following description is a guideline for entering information about a sequence: • Name: The name of the sequence. This is used for saving the sequence. • Common name: A common name for the species. • Species: The Latin name. • Type: Select between DNA, RNA and protein. • Circular: Specifies whether the sequence is circular. This will open the sequence in a circular view as default (applies only to nucleotide sequences). • Description: A description of the sequence. • Keywords: A set of keywords separated by semicolons. • Comments: Your own comments to the sequence. • Sequence: Depending on the type chosen, this field accepts nucleotides or amino acids. Spaces and numbers can be entered, but they are ignored when the sequence is created. This allows you to paste in a sequence directly from a different source, even if the residue numbers are included. Characters that are not part of the IUPAC codes cannot be entered. At the top right corner of the field, the number of residues is counted. The counter does not count spaces or numbers. Clicking Next will allow you to save the sequence to a project in the Navigation Area. 11.5 Sequence Lists. The Sequence List shows a number of sequences in a tabular format, or it can
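The handling of pasted text in the Sequence field (spaces and numbers ignored, non-IUPAC characters refused) can be approximated by the Python sketch below. The two alphabets are assumptions about which IUPAC letters are accepted, and upper-casing the input is a choice made for the sketch. The dialog itself may behave differently in both respects.

```python
# Assumed alphabets: IUPAC nucleotide codes (including ambiguity codes) and
# the 20 standard amino acids plus the ambiguity codes B, Z and X.
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")
IUPAC_PROTEIN = set("ACDEFGHIKLMNPQRSTVWYBZX")


def clean_sequence(text, alphabet):
    """Drop whitespace and digits, reject any other character that is not
    in the given alphabet, and return the residues in upper case."""
    residues = []
    for position, ch in enumerate(text, start=1):
        if ch.isspace() or ch.isdigit():
            continue  # spacing and residue numbers from the source are ignored
        residue = ch.upper()
        if residue not in alphabet:
            raise ValueError(f"character {ch!r} at input position {position} "
                             "is not in the alphabet")
        residues.append(residue)
    return "".join(residues)


if __name__ == "__main__":
    pasted = "1 acagattccc 11 ccttacacgg"
    sequence = clean_sequence(pasted, IUPAC_NUCLEOTIDE)
    print(sequence, len(sequence))  # the length excludes spaces and numbers
```

Counting the length after cleaning corresponds to the residue counter described above, which ignores spaces and numbers.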
350. ting horizontally may be done this way: right-click a tab of the View | View | Split Horizontally. This action opens the chosen View below the existing View (see figure 3.11). When the split is made vertically, the new View opens to the right of the existing View. Splitting the View Area can be undone by dragging e.g. the tab of the bottom view to the tab of the top view. This is marked by a gray area on the top of the view. Maximize/Restore size of View. The Maximize/Restore View function allows you to see a View in maximized mode, meaning a mode where no other Views nor the Navigation Area is shown. Maximizing a View can be done in the following ways: select View | Ctrl + M, or select View | View | Maximize/restore size of View, or select View | right-click the tab | View | Maximize/restore View, or double-click the tab of View. (Chapter 3, User Interface) Figure 3.9: When dragging a View, a gray area indicates where the View will be shown. The following restores the size of the View: Ctrl + M, or View | Maximize/restore size of View, or click close button in the corner of the View Area, or double-click title of View. 3.2.6 Side Panel. The Side Panel allows you to change the way the contents of a view are displayed. The options in the Side Panel depend on the kind o
351. tion can be done by clicking the colored square next to the relevant annotation type. Many different settings can be set in the three layers: Swatches, HSB and RGB. Apply your settings and click OK. When you click OK, the color settings cannot be reset. The Reset function only works for changes made before pressing OK. Restriction sites. These preferences allow you to display restriction sites on the sequence. There is a list of enzymes which are represented by different colors. By selecting or deselecting the enzymes in the list, you can specify which enzymes' restriction sites should be displayed (see figure 17.4). Figure 11.1: Showing restriction sites of two restriction enzymes. The color of the flag of the restriction site can be changed by clicking the colored box next to the enzyme's name. The list of restriction enzymes contains per default ten of the most popular enzymes, but you can easily modify this list and add more enzymes. You have four ways of modifying the list: • Edit enzymes button: This displays a dialog with the enzymes currently in the list shown at the bottom and a list of available enzymes at the top. To add more enzymes, select them in the upper list and press the Add enzymes button. To remove enzymes, select them in the list below and click the Remove
352. to annotate the branches with their lengths. 2.6 Tutorial: Detect restriction sites. This tutorial will show you how to find restriction sites and annotate them on a sequence. Suppose you are working with sequence PERH3BC from the example data (can be downloaded from http://www.clcbio.com/download) and you wish to know which restriction enzymes will cut this sequence exactly once and create a 3' overhang. Do the following: select the PERH3BC sequence from the Primer design folder | Toolbox in the Menu Bar | Cloning and Restriction Sites | Restriction sites. The dialog shown in fig. 2.12 opens, and you can confirm or change your selection of input sequence. Figure 2.12: Choosing sequence PERH3BC. In the next step, you uncheck Blunt ends and 5' overhang, since we only wish to use enzymes with a 3' overhang. Then click Select all (see figure 2.13).
353. to the cloning vector. The sequence inserted is selected by default. 17.2.6 Show in a circular view. The sequences stored in the cloning view can be saved to a sequence list and later be opened again for further editing. A sequence list is represented by its own icon in the Navigation Area. After finishing the in silico cloning in a linear mode, the newly formed cloning vector or plasmid can easily be visualized in circular mode. Simply verify that the molecule is circular, right-click the sequence name and press open sequence in circular view. Then you have a circular view as displayed in figure 17.10. Figure 17.10: Final circular view of the plasmid. The tetracycline gene is disrupted by an insert. 17.2.7 Real cloning example. This will show you very briefly how to insert a gene into a vector with only a few mouse clicks. We want to insert a gene, Human beta globin 2 (HBG2), into the commonly known pBR322 plasmid. We choose to insert our gene of interest into the tetracycline resistance gene of pBR322, which will enable us to select for tetracycline-sensitive clones. Select the pBR322 and DNA sequence HUMHBB holding the gene of interest in the navigator. Cut out the gene of interest simply by double-clicking the gene HBG2, right-click with the mouse on the selected region on the sequence and click duplicate sequence. Then the selected region is dup
354. to the tetracycline gene. Open the sequence in a circular view and see that the tetracycline gene is disrupted by an insert of the HBG2 gene. This very short walk-through shows some of the powerful cloning capabilities which are included in CLC Gene Workbench. 17.3 Restriction site analysis. This section explains how to adjust the detection parameters and offers basic information with respect to restriction site algorithms. 17.3.1 Restriction site parameters. Given a DNA sequence, CLC Gene Workbench 2.0 detects restriction sites in accordance with detection parameters and shows the detected sites as annotations on the sequence or in textual format in a table. To detect restriction sites: select sequence | Toolbox in the Menu Bar | Cloning and Restriction Sites | Restriction sites, or right-click sequence | Toolbox | Cloning and Restriction Sites | Restriction sites. The result of these steps can be seen in figure 17.12. (Chapter 17, Cloning and Cutting) Figure 17.12: Choosing sequence PERH3BC. If a sequence was selected
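Detecting restriction sites amounts to locating each enzyme's recognition sequence in the DNA. The Python sketch below shows the idea for a linear sequence and three well-known palindromic recognition sequences. The function names, the toy sequence and the enzyme selection are illustrative assumptions. The program's own detection parameters, such as overhang type or minimum recognition length, are not modelled.

```python
# Recognition sequences of three common enzymes. All three are palindromic,
# so searching one strand is sufficient for them.
ENZYMES = {"EcoRI": "GAATTC", "PstI": "CTGCAG", "SalI": "GTCGAC"}


def find_sites(sequence, site):
    """Return the 1-based start positions of every occurrence of a
    recognition sequence, including overlapping ones (linear DNA only)."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(site, start + 1)
    return positions


def single_cutters(sequence, enzymes):
    """Names of the enzymes whose recognition sequence occurs exactly once."""
    return sorted(name for name, site in enzymes.items()
                  if len(find_sites(sequence, site)) == 1)


if __name__ == "__main__":
    dna = "AAGAATTCTTCTGCAGAAGTCGACTTCTGCAGAA"  # toy sequence, not PERH3BC
    for name, site in ENZYMES.items():
        print(name, find_sites(dna, site))
    print("cut exactly once:", single_cutters(dna, ENZYMES))
```

The positions reported are where the recognition sequence starts, not where the enzyme cuts. A table of cut positions would additionally need each enzyme's cut offset.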
355. triction enzyme database 234 Recycle Bin 62 Redo alignment 243 Redo Undo 66 Reference sequence 265 References 272 Region syntax 125 types 126 Remove annotations 126 sequences from alignment 250 terminated processes 72 Rename element 62 Replace file 91 Report program errors 19 Report protein 265 Request new feature 19 Reset license 17 18 Residue coloring 121 Restore deleted elements 62 size of view 68 Restriction enzymes 230 separate on gel 237 Restriction sites 230 265 enzyme database Rebase 234 circular DNA 133 on sequence 120 parameters 230 tutorial 35 Results handling 97 Reverse complement 165 265 Reverse translation 179 265 Bioinformatics explained 181 RNA translation 166 Safe mode 19 Save changes in a view 66 search 32 sequence 32 style sheet 78 view preferences 78 workspace 73 SCF2 file format 29 86 270 SCF3 file format 29 86 270 Score BLAST search 111 Scoring matrices Bioinformatics explained 142 BLOSUM 142 PAM 142 Search BLAST 107 GenBank 101 handle results from GenBank 103 hits number of 77 in a sequence 122 in annotations 122 Local BLAST 113 options GenBank 101 parameters 102 patterns 157 160 PubMed references 105 sequence in UniProt 106 sequence on Google 105 sequence on NCBI 105 sequence on web 104 Secondary structure prediction 265 Secondary structure for primers 188 Select exact positions 122 in sequen
356. trl W a W Close all views Ctrl Shift W d Shift W Copy Ctrl C C Cut Ctrl X a X Delete Delete Delete Exit Alt F4 Q Export Ctrl E E Export graphics Ctrl G G Find Inconsistency Space Space Find Previous Inconsistency y Help F1 F1 Import Ctrl Maximize restore size of View Ctrl M M Move gaps in alignment Navigate sequence views New Folder New Project New Sequence View Paste Print Redo Rename Save Search in an open sequence Search NCBI Search UniProt Select All Selection Mode User Preferences Split Horizontally Split Vertically Show hide Preferences Undo Zoom In Mode Zoom In without clicking Zoom Out Mode Zoom Out without clicking Ctrl arrow keys left right arrow keys Ctrl Shift N Ctrl R Ctrl N Ctrl O Ctrl V Ctrl P Ctrl Y F2 Ctrl S Ctrl F Ctrl B Ctrl Shift U Ctrl A Ctrl 2 Ctrl K Ctrl T Ctrl J Ctrl U Ctrl Z Ctrl plus plus Ctrl minus minus 3 arrow keys left right arrow keys 8 Shift N R N 0 ae V P Y F2 S F B a Shift U a A 2 T J U Z plus plus 8 minus minus Combinations of keys and mouse movements are listed below Action Windows Linux MacOS X Mouse movement Maximize View Restore View Reverse zoom function Shift Select
357. ttle offset: The annotations are piled on top of each other, but they have been offset a little. More offset: Same as above, but with more spreading. Most offset: The annotations are placed above each other with a little space between. This can take up a lot of space on the screen. • Label: Each annotation can be labelled with a name. Additional information about the sequence is shown if you place the mouse cursor on the annotation and keep it still. No labels: No labels are displayed. On annotation: The labels are displayed in the annotation's box. Over annotation: The labels are displayed above the annotations. Before annotation: The labels are placed just to the left of the annotation. Flag: The labels are displayed as flags at the beginning of the annotation. • Show arrows: Toggles the display of arrow heads on the annotations. • Use gradients: Fills the boxes with gradient color. Annotation types. • Annotation types: This group lists all the types of annotations that are attached to the sequence that is viewed. For sequences with many annotations, it can be easier to get an overview if you deselect the annotation types that are not relevant. If you want to remove single annotations while preserving other annotations of the same type, see section 11.1.4. It is possible to color the different annotations for better overview. Color settings for an annota
358. ture is not part of the algorithm. (Chapter 13, Nucleotide Analyses) Figure 13.9: The first 12,000 positions of the E. coli sequence NC_000913 downloaded from GenBank. The blue (dark) annotations are the genes, while the yellow (brighter) annotations are the ORFs with a length of at least 100 amino acids. On the positive strand around position 11,000, a gene starts before the ORF. This is due to the use of the standard genetic code rather than the bacterial code. This particular gene starts with CTG, which is a start codon in bacteria. Two short genes are entirely missing, while a handful of open reading frames do not correspond to any of the annotated genes. Chapter 14 Protein analyses. Contents: 14.1 Protein charge; 14.1.1 Modifying the layout; 14.2 Hydrophobicity; 14.2.1 Hydrophobicity plot; 14.2.2 Hydrophobicity graphs along sequence; 14.2.3 Bioinformatics explained: Protein hydrophobicity; 14.3 Reverse translation from protein into DNA; 14.3.1 Reverse translation parameters; 14.3.2 Bioinformatics explained: Reverse translation. CLC Gene Workbench 2.0 offers analyses
359. tween 70 and 80 (inclusive) and ending at 90 (inclusive). Region 6: A range of residues beginning somewhere between 100 and 110 (inclusive) and ending somewhere between 120 and 130 (inclusive). Region 7: A site between residues 140 and 141. Region 8: A site between two residues somewhere between 150 and 160 (inclusive). Region 9: A region that covers ranges from 170 to 180 (inclusive) and 190 to 200 (inclusive). Region 10: A region on negative strand that covers ranges from 210 to 220 (inclusive). Region 11: A region on negative strand that covers ranges from 230 to 240 (inclusive) and 250 to 260 (inclusive). 11.2 Sequence information. The normal view of a sequence (by double-clicking) shows the annotations as boxes along the sequence, but often there is more information available about sequences. This information is available through the Sequence info function, which also displays a textual overview of the annotations. To view the sequence information: select a sequence in the Navigation Area | Show in the Toolbar | Sequence info. This will display a view similar to fig. 11.5. All the lines in the view are headings, and the corresponding text can be shown by clicking the text. The information available depends on the origin of the sequence. If the sequence is annotated, the annotations can be found under the heading Annotation map.
360. u can import them into a project in CLC Gene Workbench 2.0. Importing an external file creates a copy of the file which is saved in a project in CLC Gene Workbench 2.0. The file can now be opened by double-clicking the file name in the Navigation Area. The file is opened using the default application for this file type (e.g. Microsoft Word for .doc files and Adobe Reader for .pdf). CLC Gene Workbench can also show web links (URLs) in the Navigation Area. This can be done by using the Import function of the program or by dragging the file (e.g. from the desktop) to the Navigation Area. 6.2.1 Import external files. To import an external file: click a project or folder to import into | Import in the toolbar | Choose All files in Files of type | browse to the relevant file | Select, or drag the file from the file system into a project in the Navigation Area (only possible under Windows). Notice: When you import an external file, a copy of the original file is created. This means that you should always make sure that you open the file from within CLC Gene Workbench 2.0. 6.2.2 Export external files. If you export an entire project or folder from CLC Gene Workbench 2.0, the exported CLC file will include all external files stored in it. This means that you can export the project as a CLC file and send it to a colleague who can import it and access all the files in the project. You can also export individual files in their original format. To expor
361. ulating the extinction coefficient. The extinction coefficient is calculated from the absorbance of cysteine, tyrosine and tryptophan using the following equation:

$$\mathrm{Ext}(\mathrm{Protein}) = \mathrm{count}(\mathrm{Cystine}) \times \mathrm{Ext}(\mathrm{Cystine}) + \mathrm{count}(\mathrm{Tyr}) \times \mathrm{Ext}(\mathrm{Tyr}) + \mathrm{count}(\mathrm{Trp}) \times \mathrm{Ext}(\mathrm{Trp}) \qquad (12.2)$$

where Ext is the extinction coefficient of the amino acid in question. At 280 nm the extinction coefficients are: Cys 120, Tyr 1280 and Trp 5690. This equation is only valid under the following conditions:

Amino acid   Mammalian    Yeast        E. coli
Ala (A)      4.4 hours    >20 hours    >10 hours
Cys (C)      1.2 hours    >20 hours    >10 hours
Asp (D)      1.1 hours    3 min        >10 hours
Glu (E)      1 hour       30 min       >10 hours
Phe (F)      1.1 hours    3 min        2 min
Gly (G)      30 hours     >20 hours    >10 hours
His (H)      3.5 hours    10 min       >10 hours
Ile (I)      20 hours     30 min       >10 hours
Lys (K)      1.3 hours    3 min        2 min
Leu (L)      5.5 hours    3 min        2 min
Met (M)      30 hours     >20 hours    >10 hours
Asn (N)      1.4 hours    3 min        >10 hours
Pro (P)      >20 hours    >20 hours    2
Gln (Q)      0.8 hour     10 min       >10 hours
Arg (R)      1 hour       2 min        2 min
Ser (S)      1.9 hours    >20 hours    >10 hours
Thr (T)      7.2 hours    >20 hours    >10 hours
Val (V)      100 hours    >20 hours    >10 hours
Trp (W)      2.8 hours    3 min        2 min
Tyr (Y)      2.8 hours    10 min       2 min

Table 12.2: Estimated half-life. Half-life of proteins where the N-terminal residue is listed in the first column and the half-life in the subsequent columns for mammals
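The extinction coefficient equation quoted in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration, not the workbench's own implementation: the function name and the `cystine_count` parameter are assumptions made here, and the number of cystines is passed in by the caller because the excerpt does not say how it is derived from the sequence.

```python
# Extinction coefficients at 280 nm as quoted in the excerpt above.
EXT_CYSTINE = 120
EXT_TYR = 1280
EXT_TRP = 5690


def extinction_coefficient(sequence, cystine_count=0):
    """Sum count x coefficient for cystine, tyrosine (Y) and tryptophan (W).

    cystine_count is a hypothetical parameter: the caller decides how many
    cystines the protein contains.
    """
    seq = sequence.upper()
    return (cystine_count * EXT_CYSTINE
            + seq.count("Y") * EXT_TYR
            + seq.count("W") * EXT_TRP)


if __name__ == "__main__":
    # One Tyr and two Trp, no cystines: 1280 + 2 * 5690
    print(extinction_coefficient("MYWWC"))  # 12660
```

Passing `cystine_count=1` for the same sequence adds 120 to the result.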
362. ur different ways of doing gel electrophoresis:
- Cut with selected enzymes and run in one lane per sequence. If you have selected more than one sequence, this option will display one lane per sequence in the same way as the first option.
- Cut with selected enzymes and run in one lane per sequence and per enzyme. This will display a number of lanes equalling the number of selected sequences multiplied by the number of selected enzymes, thus combining the functionality of options two and three.
For more information about gel electrophoresis, see section 17.5. In order to complete the analysis, click Finish. The result is shown in figure 17.16. Choosing the textual output option will open a new view containing a table with an overview of restriction sites. Choosing the graphical output option will add restriction site annotations to the selected sequence. If too many restriction sites are found, a dialog will ask if you want to proceed or show the restriction sites only in a table format. Showing too many restriction sites as annotations on the sequence will take up a lot of your computer's processing power. Notice! The text is not automatically saved. To save the result: right-click the tab | File | Save. The textual output mentioned above will list all the cut positions where the sequence is restricted. [CHAPTER 17 CLONING AND CUTTING 234] [Figure: sequence PERH3BC with an MboII restriction site annotation]
363. us are described in the following. Manipulate the whole sequence: Right-clicking the sequence name at the left side of the view reveals several options on sorting, opening and editing the sequences in the view (see figure 17.5). [Figure 17.5: Right click on the sequence in the cloning view. Menu entries: Open Sequence in Circular View; Duplicate Sequence; Insert Another Sequence After This Sequence; Insert Another Sequence Before This Sequence; Reverse Complement Sequence; Digest Sequence with Selected Enzymes and Run on Gel; Rename Sequence; Select Sequence; Delete Sequence; Open Copy of Sequence in New View; Open This Sequence in New View; Make Sequence Linear; Sort Sequence List by Name; Sort Sequence List by Length]
- Insert sequence after this sequence. Insert another sequence after this sequence. The sequence to be inserted can be selected from a list which contains the sequences that are present in the cloning editor.
- Insert sequence before this sequence. Insert another sequence before this sequence. The sequence to be inserted can be selected from a list which contains the sequences that are present in the cloning editor.
- Duplicate sequence. Adds a duplicate of the selected sequence. The new sequence will be added to the list of sequences shown on the screen.
- Reverse complement sequence. Creates the reverse complement of a sequence and replaces the original sequence in the list.
- Ma
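To make the "Reverse complement sequence" operation mentioned in the excerpt above concrete, here is a minimal Python sketch. It only illustrates the operation on a plain DNA string; the function name is an assumption and it handles only the four unambiguous bases.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Complement every base, then reverse the string."""
    return sequence.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("GTGAGTCTGA"))  # TCAGACTCAC
```

Applying the function twice returns the original sequence, which is a quick sanity check.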
364. use right-click on the primers information point. This does not allow for any design information to enter concerning the properties of primer/probe pairs or sets, e.g. primer pair annealing and Tm difference between primers. If the latter is desired, the user can use the Calculate button at the bottom of the Primer parameter preference group. This will activate a dialog, the contents of which depend on the chosen mode. Here the user can set primer-pair-specific settings, such as allowed or desired Tm difference, and view the single-primer parameters which were chosen in the Primer parameters preference group. Upon pressing Finish, an algorithm will generate all possible primer sets and rank these based on their characteristics and the chosen parameters. A list will appear displaying the 100 highest-scoring sets and information pertaining to these. [CHAPTER 15 PRIMERS 187] The search result can be saved to the navigator. From the result table, suggested primers or primer/probe sets can be explored, since clicking an entry in the table will highlight the associated primers and probes on the sequence. It is also possible to save individual primers or sets from the table through the mouse right-click menu. For a given primer pair, the amplified PCR fragment can also be opened or saved using the mouse right-click menu. 15.2 Setting parameters for primers and probes. The primer-specific view options and settings are found in the Primer parameters preferen
365. verse primer in a primer pair.
- Pair annealing alignment: a visualization of the optimal alignment of the forward and the reverse primer in a primer pair.
- Pair end annealing: the maximum score of consecutive end base pairings found between the ends of the two primers in the primer pair, in units of hydrogen bonds.
- Fragment length: the length (number of nucleotides) of the PCR fragment generated by the primer pair.
15.6 Nested PCR. Nested PCR is a modification of Standard PCR, aimed at reducing product contamination due to the amplification of unintended primer binding sites (mispriming). If the intended fragment cannot be amplified without interference from competing binding sites, the idea is to seek out a larger outer fragment which can be unambiguously amplified and which contains the smaller intended fragment. Having amplified the outer fragment to large numbers, the PCR amplification of the inner fragment can proceed and will yield amplification of this with minimal contamination. Primer design for nested PCR thus involves designing two primer pairs, one for the outer fragment and one for the inner fragment. In Nested PCR mode the user must thus define four regions: a Forward primer region (the outer forward primer), a Reverse primer region (the outer reverse primer), a Forward inner primer region, and a Reverse inner primer region. These are defined using the mouse right-click menu. If areas are known where primers must not bind, e g
366. w [Figure 2.36: menu with the entries Show/Hide Side Panel (Ctrl+U), Toolbox, Show, Close (Ctrl+W), Find in Project (Ctrl+Shift+F), Close Tab Area, Maximize/Restore View (Ctrl+M), Close All Views (Ctrl+Shift+W), Fit Width, Zoom to 100%] This will select the sequence in the Navigation Area. You can also use the shortcut key Ctrl+Shift+F on Windows or ⌘+Shift+F on Mac. 2.11.3 Find specific annotations on a sequence. If you are looking for a specific annotation on a sequence, you may benefit from viewing the Sequence info while keeping an ordinary view of the sequence on the screen. In the Sequence info you find an Annotation map, which displays all the annotations of the sequence. The annotations serve as links, selecting the annotation in the ordinary view of the sequence (see figure 2.37). [Figure 2.37: Annotation map of HUMHBB with the columns Name and Position; entries include HBB thalassemia, HBG2, HBB, Conflict, Old sequence, Precursor RNA, Repeat region and several Exon annotations]
367. w You can choose a minimum length of the recognition sequence, and you can choose whether to include enzymes with Blunt ends, 3' overhang and/or 5' overhang. Having adjusted the parameters in "Choose from enzyme set" and "Only include enzymes which have", the total list of enzymes is shown in the table. The enzymes can be sorted by clicking the column headings, and you can select which enzymes to include in the search by inserting/removing check marks next to the enzymes. Clicking Next confirms the list of enzymes which will be included in the analysis and takes you to Step 3. In Step 3 you can limit which enzymes' cut sites should be included in the output. See figure 17.14. [Figure 17.14: Exclusion criteria and output options. Dialog "Find Restriction Sites", step 3 "Set exclusion criteria and output options", with the settings "Exclude enzymes with less matches than", "Exclude enzymes with more matches than", "Create output as annotations on sequence" and "Create tabular output"] The default setting, "Exclude enzymes with less than 1 matches", means that enzymes which do not match at all are not included in the output. If, e.g., you only want to see enzymes which match exactly once, you can check the "Exclude enzymes with more than 1
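The exclusion criteria described in the excerpt above (drop enzymes with too few or too many matches) can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the workbench's algorithm: the function names and the `min_matches`/`max_matches` parameters are hypothetical, only one strand is scanned, and the three recognition sequences used (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT) are palindromic, so scanning one strand finds every site.

```python
def count_sites(sequence, site):
    """Count occurrences (overlapping allowed) of a recognition sequence."""
    sequence, site = sequence.upper(), site.upper()
    last_start = len(sequence) - len(site)
    return sum(1 for i in range(last_start + 1) if sequence.startswith(site, i))


def filter_enzymes(sequence, enzymes, min_matches=1, max_matches=None):
    """Keep enzymes whose match count lies within [min_matches, max_matches].

    min_matches=1 mirrors the default described in the text: enzymes that
    do not match at all are left out. max_matches=None means no upper limit.
    """
    kept = {}
    for name, site in enzymes.items():
        matches = count_sites(sequence, site)
        if matches < min_matches:
            continue
        if max_matches is not None and matches > max_matches:
            continue
        kept[name] = matches
    return kept


if __name__ == "__main__":
    enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
    seq = "AAGAATTCTTGGATCCAAGGATCC"
    print(filter_enzymes(seq, enzymes))                 # {'EcoRI': 1, 'BamHI': 2}
    print(filter_enzymes(seq, enzymes, max_matches=1))  # {'EcoRI': 1}
```

Setting both limits to 1 corresponds to the "match exactly once" case mentioned in the text.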
368. w tree looks very different, it means that the inferred tree is unreliable. By re-sampling a number of times it is possible to put reliability weights on each internal branch of the inferred tree. If the data was bootstrapped 100 times, a bootstrap score of 100 means that the corresponding branch occurs in all 100 trees made from re-sampled alignments. Thus, a high bootstrap score is a sign of greater reliability. Other useful resources: The Tree of Life web project, http://tolweb.org. Joseph Felsenstein's list of phylogeny software, http://evolution.genetics.washington.edu/phylip/software.html. Creative Commons License: All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display and use the work for educational purposes under the following conditions: You must attribute the work in its original form, and "CLC bio" has to be clearly labelled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform or build upon this work. SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more about how you may use the contents. Part IV Appendix. Appendix A Comparison of workbenches. Below we list a number of functionalities that differ between CLC Workbenches:
- CLC Free Workbench
- CLC Protein Workbench
- CLC Gene Workbench
- CLC C
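The bootstrap score described earlier in this excerpt (how often a branch of the inferred tree reappears in trees built from re-sampled alignments) can be sketched as follows. This is a simplified illustration: each tree is represented only as a set of clades, each clade being a frozenset of leaf names, and the function name and data layout are assumptions made for the example. Building the replicate trees themselves is outside the sketch.

```python
def bootstrap_scores(inferred_clades, replicate_trees):
    """Return, for each clade of the inferred tree, the percentage of
    replicate trees that contain the same clade."""
    total = len(replicate_trees)
    scores = {}
    for clade in inferred_clades:
        hits = sum(1 for tree in replicate_trees if clade in tree)
        scores[clade] = 100.0 * hits / total
    return scores


if __name__ == "__main__":
    ab = frozenset({"A", "B"})
    abc = frozenset({"A", "B", "C"})
    ac = frozenset({"A", "C"})
    # Four hypothetical replicate trees, each given as its set of clades.
    replicates = [{ab, abc}, {ab}, {ab, abc}, {ac}]
    scores = bootstrap_scores([ab, abc], replicates)
    print(scores[ab], scores[abc])  # 75.0 50.0
```

With 100 replicates, a clade found in every replicate would get a score of 100, matching the interpretation given in the text.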
369. wer [Figure 2.39: Opening the coding region in a new view. Menu entries: Edit Annotation; Remove Annotation; Translate CDS/ORF; Remove Annotations of This Type; Remove All Annotations; Set Numbers Relative to This Annotation] 2.11.6 Translate a coding region. If you have a genomic sequence containing one or more coding regions, you can translate these regions in a quick and easy way. If you want to translate a single coding region (see figure 2.40): right-click the coding region's annotation | Translate CDS/ORF. This will open a new view with the translated sequence. In order to translate all the coding regions of a sequence: Toolbox | Nucleotide Analyses | Translate to protein | Translate CDS and ORF in Step 2. [CHAPTER 2 TUTORIALS 51] This will extract all the coding regions of the sequence, and for each region it will open a new view with the translation. [Figure 2.40: Opening a new view with the translation of the coding region. Menu entries: Select Annotation; Open Annotation in New Viewer; Edit Annotation; Remove Annotation; Translate CDS/ORF; Remove Annotations of This Type; Remove All Annotations; Set Numbers Relative to This Annotation] 2.11.7 Copy annotations from one sequence to another. If you have a collection of similar sequences and you have annotated one of the sequences, you can copy these annotations to the rest of the sequences. First, create an alignment of the sequences. Next, find the annotated sequence and for each of t
370. with lower case. This allows you to customize what is filtered from the sequence during the comparison to the BLAST databases.
- Expect. The statistical significance threshold for reporting matches against database sequences; the default value is 10, meaning that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Increasing the threshold shows less stringent matches. Fractional values are acceptable.
- Word Size. BLAST is a heuristic that works by finding word matches between the query and database sequences. You may think of this process as finding "hot spots" that BLAST can then use to initiate extensions that might lead to full-blown alignments. For nucleotide-nucleotide searches (i.e. BLASTn), an exact match of the entire word is required before an extension is initiated, so that you normally regulate the sensitivity and speed of the search by increasing or decreasing the word size. For other BLAST searches, non-exact word matches are taken into account based upon the similarity between words. The amount of similarity can be varied, so that you normally use just the word sizes 2 and 3 for these searches.
- Matrix. A key element in evaluating the quality of a pairwise sequence
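The role of the word size in nucleotide searches, as described in the excerpt above, can be illustrated with a small Python sketch that finds exact word matches between a query and a subject sequence. This is only a toy model of the seeding step under simplifying assumptions (one strand, exact matches, no extension or scoring); it is not BLAST's actual implementation, and the function name is made up for the example.

```python
from collections import defaultdict


def word_hits(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word of length
    word_size occurs identically in both sequences."""
    # Index every word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    hits = []
    for j in range(len(subject) - word_size + 1):
        word = subject[j:j + word_size]
        for i in index.get(word, ()):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    query, subject = "ACGTAC", "TTACGTT"
    print(word_hits(query, subject, 4))  # [(0, 2)]
    print(word_hits(query, subject, 3))  # [(3, 1), (0, 2), (1, 3)]
```

In this toy example the shorter word yields more seed hits than the longer one, which is the sensitivity versus speed trade-off the text refers to.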
371. wo A4 pages. This is illustrated in the Page Layout field. Click the Header/Footer tab to edit the header and footer text. By clicking in the text field for either Custom header text or Custom footer text, you can access the auto formats for header/footer text in Insert a caret position. Click either Date, View name, or User name to include the auto format in the header/footer text. Click OK to see the print preview with the settings you have made. 5.3 Print preview. The preview is shown in figure 5.3. The Print preview window lets you see the layout of the pages that are printed. Use the arrows in the toolbar to navigate between the pages. Click Print to show the print dialog, which lets you choose e.g. which pages to print. Notice that if you wish to change e.g. the colors of the residues in the alignment, this must be changed in the View preferences of the specific dot plot. [CHAPTER 5 PRINTING 84] [Figure 5.3: Print preview] Chapter 6 Import/export of data and graphics. Contents: 6.1 Bioinformatic data formats; 6.1.1 Import of bioinformatic data; 6.1.2 Export of bioinformatic data; 6.2 External files; 6.2.1 Import extern
372. xcel. There is a huge number of programs in which the copy/paste can be applied. For simplicity, we include one example of the copy/paste function from a Folder Content view to Microsoft Excel. First step is to select the desired elements in the view: click a line in the Folder Content view | hold Shift button | press arrow down or up. See figure 6.6. [Figure 6.6: Selected elements in a Folder Content view. Columns: Type, Name, Description, Database. Rows include NM_000044 (Homo sapiens androgen receptor), AY738615 (Homo sapiens hemoglobin delta-beta fusion), HUMDINUC (Human dinucleotide repeat polymorphism), PERH2BD (P. maniculatus (deer mouse) beta-2 globin), PERH3BC (P. maniculatus (deer mouse) beta-3 globin) and a sequence list, all in the Local database] When the elements are selected, do the following to copy the selected elements: right-click one of the selected elements | Edit | Copy. Then: [CHAPTER 6 IMPORT/EXPORT OF DATA AND GRAPHICS 94] right-click in the cell A1 | Paste. The outcome might appear unorganized, but with a few operations the structure of the view in CLC Gene Workbench 2.0 can be produced (except the icons, which are replaced by file references in Excel). Chapter 7 History. Contents: 7.1 Element history; 7.1.1 Sharing data with history. CLC Gene Workbench 2.0 keeps a log of al
373. xity [Figure 12.10: The dot plot showing a low complexity region in the sequence. The sequence is artificial, and low complexity regions do not always show as a square.] want to find distantly related proteins to a sequence of interest using BLAST, you could benefit from using BLOSUM45 or similar matrices. [Figure 12.11: Relationship between scoring matrices. PAM1, PAM120 and PAM250 and BLOSUM80, BLOSUM62 and BLOSUM45 are ordered from less divergent to more divergent.] The BLOSUM62 has become a de facto standard scoring matrix for a wide range of alignment programs. It is the default matrix in BLAST. Other useful resources: Calculate your own PAM matrix, http://www.bioinformatics.nl/tools/pam.html. BLOCKS database, http://blocks.fhcrc.org. NCBI help site, http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Scoring2.html. [Table: amino acid substitution matrix with rows and columns for the residues A R N D C Q E G H I L K M F P S T W Y V; the numeric entries are garbled in this excerpt]
374. xpoints to be aligned to each other, and their second fixpoints will also be aligned to each other. Advanced use of fixpoints: Fixpoints with the same names will be aligned to each other, which gives the opportunity for great control over the alignment process. It is only necessary to change any fixpoint names in very special cases. One example would be three sequences A, B and C, where sequences A and B have one copy of a domain while sequence C has two copies of the domain. You can now force sequence A to align to the first copy and sequence B to align to the second copy of the domains in sequence C. This is done by inserting fixpoints in sequence C for each domain, and naming them 'fp1' and 'fp2' (for example). Now, you can insert a fixpoint in each of sequences A and B, naming them 'fp1' and 'fp2', respectively. Now, when aligning the three sequences using fixpoints, sequence A will align to the first copy of the domain in sequence C, while sequence B will align to the second copy of the domain in sequence C. You can name fixpoints by: right-click the Fixpoint annotation | Edit Annotation | type the name in the 'Name' field. [CHAPTER 18 SEQUENCE ALIGNMENT 246] 18.2 View alignments. Since an alignment is a display of several sequences arranged in rows, the basic options for viewing alignments are the same as for viewing sequences. Therefore we refer to section 11.1 for an explanation of these basic options. However the
375. y syndrome virus at more than 10 years before the emergence of disease. Virology, 289(2):174-179. [BIBLIOGRAPHY 273]
[Gill and von Hippel, 1989] Gill, S. C. and von Hippel, P. H. (1989). Calculation of protein extinction coefficients from amino acid sequence data. Anal Biochem, 182(2):319-326.
[Gonda et al., 1989] Gonda, D. K., Bachmair, A., Wünning, I., Tobias, J. W., Lane, W. S., and Varshavsky, A. (1989). Universality and structure of the N-end rule. J Biol Chem, 264(28):16700-16712.
[Hein, 2001] Hein, J. (2001). An algorithm for statistical alignment of sequences related by a binary tree. Pacific Symposium on Biocomputing, page 179.
[Hein et al., 2000] Hein, J., Wiuf, C., Knudsen, B., Møller, M. B., and Wibling, G. (2000). Statistical alignment: computational properties, homology testing and goodness-of-fit. J Mol Biol, 302(1):265-279.
[Henikoff and Henikoff, 1992] Henikoff, S. and Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A, 89(22):10915-10919.
[Hopp and Woods, 1983] Hopp, T. P. and Woods, K. R. (1983). A computer program for predicting protein antigenic determinants. Mol Immunol, 20(4):483-489.
[Ikai, 1980] Ikai, A. (1980). Thermostability and aliphatic index of globular proteins. J Biochem (Tokyo), 88(6):1895-1898.
[Janin, 1979] Janin, J. (1979). Surface and inside volumes in globular proteins. Nature, 277(5696):491-492.
[Jukes and Cantor, 1969] Jukes, T. and Ca
376. yed in figure 14.6. Notice that you can choose the height of the graphs underneath the sequence. 14.2.3 Bioinformatics explained: Protein hydrophobicity. Calculation of hydrophobicity is important to the identification of various protein features. This can be membrane-spanning regions, antigenic sites, exposed loops or buried residues. Usually, these calculations are shown as a plot along the protein sequence, making it easy to identify the location of potential protein features. [Figure 14.7: Plot of hydrophobicity along the amino acid sequence. Hydrophobic regions on the sequence have higher numbers according to the graph below the sequence; furthermore, hydrophobic regions are colored on the sequence. Red indicates regions with high hydrophobicity and blue indicates regions with low hydrophobicity.] The hydrophobicity is calculated by sliding a fixed-size window (of an odd number) over the protein sequence. At the central position of the window, the average hydrophobicity of the entire window is plotted (see figure 14.7). Hydrophobicity scales: Several hydrophobicity scales have been published for various uses. Many of the commonly used hydrophobicity scales are described below. Kyte-Doolittle scale: The Kyte-Doolittle scale is widely used for detecting hydrophobic regions in proteins. Regions with a positive value are hydrophobic. This scale can be u
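The sliding-window calculation described in the excerpt above can be sketched in Python as follows. The Kyte-Doolittle values are the published scale; the function name, the default window of 9 residues and the choice to skip positions where the window does not fit are assumptions made for this illustration, not details taken from the manual.

```python
# Kyte-Doolittle hydropathy values per residue (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(sequence, window=9):
    """Return (center_index, mean_hydrophobicity) for every position where
    an odd-sized window fits completely inside the sequence."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    seq = sequence.upper()
    half = window // 2
    profile = []
    for center in range(half, len(seq) - half):
        segment = seq[center - half:center + half + 1]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((center, mean))
    return profile


if __name__ == "__main__":
    for center, value in hydrophobicity_profile("MVHLTPEEKSAVTALWGKV", window=5):
        print(center, round(value, 2))
```

Positions with a positive mean would be read as hydrophobic on this scale, in line with the text; a wider window smooths the curve at the cost of resolution.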
377. yses: Antigenicity analysis; Protein charge analysis; Reverse translation from protein to DNA; Proteolytic cleavage detection; Prediction of signal peptides (SignalP); Transmembrane helix prediction (TMHMM); Secondary protein structure prediction; PFAM domain search.
Sequence alignment (Free, Protein, Gene, Combined): Multiple sequence alignments (Two algorithms); Advanced re-alignment and fix-point alignment options; Advanced alignment editing options; Consensus sequence determination and management; Conservation score along sequences; Sequence logo graphs along alignments; Gap fraction graphs.
Dot plots (Free, Protein, Gene, Combined): Dot plot based analyses.
Phylogenetic trees (Free, Protein, Gene, Combined): Neighbor-joining and UPGMA phylogenies.
Pattern discovery (Free, Protein, Gene, Combined): Search for sequence match; Motif search; Pattern discovery.
Primer design (Free, Protein, Gene, Combined): Advanced primer design tools; Detailed primer and probe parameters; Graphical display of primers; Generation of primer design output; Support for Standard PCR; Support for Nested PCR; Support for TaqMan PCR; Support for Sequencing primers; Match primer with sequence; Ordering of primers.
Assembly of sequencing data (Free, Protein, Gene, Combined): Advanced contig assembly
