        here - CLC bio
         Contents
1. Figure 17.22: An example of the packed compactness setting.

17.7.2 Editing the contig

When editing contigs, you are typically interested in confirming or changing single bases, and this can be done simply by:

    selecting the base | typing the right base

Some users prefer to use lower-case letters in order to be able to see which bases were altered when they use the results later on. In CLC DNA Workbench all changes are recorded in the history log (see section 8), allowing the user to quickly reconstruct the actions performed in the editing session.

There are three shortcut keys for easily finding the positions where there are conflicts:

• Space bar: Finds the next conflict.
• "." (punctuation mark key): Finds the next conflict.
• "," (comma key): Finds the previous conflict.

In the contig view, you can use Zoom in to zoom to a greater level of detail than in other views (see figure 17.20). This is useful for discerning the trace curves.

CHAPTER 17. SEQUENCING DATA ANALYSES AND ASSEMBLY 300

If you want to replace a residue with a gap, use the Delete key. If you wish to edit a selection of more than one residue:

    right-click the selection | Edit Selection

This will show a warning dialog, but you can choose never to see this dialog again by clicking the checkbox at the bottom of the dialog.

No
2. [Screenshot: annotation table with the columns Name, Type, Region and Qualifiers, and a list of shown annotation types.]

Figure 10.9: A table showing annotations on the sequence.

only wish to see "gene" annotations, de-select the other annotation types so that only "gene" is selected.

Each row in the table is an annotation which is represented with the following information:

• Name.
• Type.
• Region.
• Qualifiers.

The Name, Type and Region for each annotation can be edited simply by double-clicking, typing the change directly, and pressing Enter.

This information corresponds to the information in the dialog when you edit and add annotations (see section 10.3.2).

You can benefit from this table in several ways:

• It provides an intelligible overview of all the annotations on the sequence.
• You can use the filter at the top to search the annotations. Type e.g. "UCP" into the filter and you will find all annotations which have "UCP" in either the name, the type, the region, or the qualifiers. Combined with showing or hiding the annotation types in the Side Panel, this makes it easy t
3. [Screenshot: a history entry showing the modified element, comments and a time stamp.]

Figure 8.1: An element's history.

to your locale settings (see section 5.1).

• User: The user who performed the operation. If you import some data created by another person in a CLC Workbench, that person's name will be shown.
• Parameters: Details about the action performed. This could be the parameters that were chosen for an analysis.
• Origins from: This information is usually shown at the bottom of an element's history. Here, you can see which elements the current element originates from. If you have e.g. created an alignment of three sequences, the three sequences are shown here. Clicking the element selects it in the Navigation Area, and clicking the "history" link opens the element's own history.
• Comments: By clicking Edit you can enter your own comments regarding this entry in the history. These comments are saved.

8.1.1 Sharing data with history

The history of an element is attached to that element, which means that exporting an element in CLC format (.clc) will export the history too. In this way, you can share folders and files with others while preserving the history. If an element's history includes source elements (i.e. if there are elements listed in "Origins from"), they must also be exported in order to see the full history. Otherwise, the history w
4. Figure 9.1: Inputting five sequences to Find Binding Sites and Create Fragments.

• All subfolders are treated as individual batch units. This means that if the subfolder contains several input files, they will be pooled as one batch unit. Nested subfolders (i.e. subfolders within the subfolder) are ignored.
• All files that are not in subfolders are treated as individual batch units.

An example of a batch run is shown in figure 9.2.

[Screenshot: the Find Binding Sites and Create Fragments dialog with the Cloning folder selected in the Navigation Area.]

Figure 9.2: The Cloning folder includes both folders and sequences.

The Cloning folder that is found in the example data (see section 1.6.2) contains two sequences and three folders. If you click Batch, only folders can be added to the list of selected eleme
5. Figure 18.1: Selecting one or more sequences containing the fragments you want to clone.

Note that the vector sequence will be selected when you click Next, as shown in figure 18.2.

Select the cloning vector by clicking the browse button. Once the sequence has been selected, click Finish. The CLC DNA Workbench will now create a sequence list of the fragments and vector sequences and open it in the cloning editor as shown in figure 18.3.

When you save the cloning experiment, it is saved as a Sequence list. See section 10.7 for more information about sequence lists. If you need to open the list later for cloning work, simply switch to the Cloning editor at the bottom of the view.

If you later in the process need additional sequences, you can easily add more sequences to the view. Just:

    right-click anywhere on the empty white area | Add Sequences

CHAPTER 18. CLONING AND CUTTING

[Screenshot: the "Select vector sequence" step with pcDNA4_TO chosen as vector.]

Figure 18.2: Selecting a cloning vector.

18.1.1 Introduction to the cloning editor

In the cloning editor, most of the basic options for viewing, selecting and zooming the sequences are the same as for the standard sequence view. See section 10.1 for an explanation of these options. This means that e.g. known SNPs, exons and other annotations can be displayed on the sequences to guide the 
6. Figure 18.5: HindIII and XhoI sites used to open the vector.

[Screenshot: the Adapt overhangs dialog showing the pcDNA4_TO vector on each side of the ATP8a1 mRNA fragment.]

Figure 18.6: Showing the insertion point of the vector.

This dialog visualizes the details of the insertion. The vector sequence is shown on each side in a faded gray color. In the middle the fragment is displayed. If the overhangs of the sequence and the vector do not match, you can blunt end or fill in the overhangs using the drag handles. Click and drag with the mouse to adjust the overhangs.

Whenever you drag the handles, the status of the insertion point is indicated below:

• The overhangs match.
• The overhangs do not match. In this case, you will not be able to click Finish. Drag the handles to make the overhangs match.

The fragment can be reverse complemented by clicking the Reverse complement fragment button.

When several fragments are used, the order of the fragments can be changed by clicking the move buttons.

There is an option for the result of the cloning: Replace input sequences with result. By default, the construct will be opened in a new view and can be saved separately. By selecting this option, the constr
7. [Screenshot: License Wizard, License Agreement dialog showing the End User License Agreement for CLC bio software.]

Figure 1.13: Read the license agreement carefully.

If the Workbench succeeds in finding an existing license, the next dialog will look as shown in figure 1.14.

[Screenshot: License Wizard, Upgrade a License dialog. The Workbench attempts to find a valid license for a previous version; if a license cannot be located, or if you would like to upgrade a different license, click the "Choose a different License File" button and locate it manually.]
8. [Screenshot: conflict table rows listing read residues and the conflict resolution "vote".]

Figure 17.24: The graphical view is displayed at the top. At the bottom the conflicts are shown in a table. At the conflict at position 637, the user has entered a comment in the table. This comment is now also reflected on the tooltip of the conflict annotation in the graphical view above.

The table has the following columns:

• Reference position: The position of the conflict measured from the starting point of the reference sequence.
• Consensus position: The position of the conflict measured from the starting point of the consensus sequence.
• Consensus residue: The consensus's residue at this position. The residue can be edited in the graphical view, as described above.
• Other residues: Lists the residues of the reads. Inside the brackets, you can see the number of reads having this residue at this position. In the example in figure 17.24, you can see that at position 637 there is a "C" in the top read in the graphical view. The other two reads have a "T". Therefore, the table displays the following text: "C (1), T (2)".
• IUPAC: The ambiguity code for this position. The ambiguity code reflects the residues in the reads, not in the consensus sequence. The IUPAC codes can be found in section I.
• Status: The status can either b
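The "Other residues" and "IUPAC" columns described in the conflict table above can be illustrated with a minimal Python sketch. It is not the Workbench's own implementation: the function name `summarize_conflict` and the exact "C (1), T (2)" output format are assumptions made for illustration, and gaps in reads are not handled. The ambiguity-code table itself is the standard IUPAC nucleotide code.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they cover.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def summarize_conflict(read_residues):
    """Hypothetical helper: return (other-residues text, IUPAC code).

    read_residues holds one base (A, C, G or T) per read covering the position.
    """
    counts = Counter(base.upper() for base in read_residues)
    # One "residue (number of reads)" entry per distinct residue, alphabetically.
    other = ", ".join(f"{base} ({n})" for base, n in sorted(counts.items()))
    # The ambiguity code reflects the residues in the reads, not the consensus.
    code = IUPAC[frozenset(counts)]
    return other, code


# Position 637 in the example: one read has C, two reads have T.
print(summarize_conflict(["C", "T", "T"]))
```

For the example at position 637, the three reads give the text "C (1), T (2)" and the ambiguity code Y (C or T).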
9. [Screenshot: sequence view of pcDNA3-atp8a1 with a forward primer region and a reverse primer region marked.]

Figure 2.39: A forward and a reverse primer region.

Now, you can let CLC DNA Workbench calculate all the possible primer pairs based on the Primer parameters that you have defined:

    Click the Calculate button (right-hand pane) | Modify parameters regarding the combination of the primers (for now, just leave them unchanged) | Calculate

This will open a table showing the possible combinations of primers. To the right, you can specify the information you want to display, e.g. showing Fragment length (see figure 2.40).

[Screenshot: table of standard primers for pcDNA3-atp8a1 with the columns Score, Pair annealing align (Fwd-Rev), Fragment length, Sequence Fwd, Melt. temp. Fwd, Sequence Rev and Melt. temp. Rev.]

Figure 2.40: A list of primers. To the right are the Side 
10. [Screenshot: file dialog for choosing the location, name and file type of the exported graphics.]

Figure 7.11: Location and name for the graphics file.

Format                      Suffix   Type
Portable Network Graphics   .png     bitmap
JPEG                        .jpg     bitmap
Tagged Image File           .tif     bitmap
PostScript                  .ps      vector graphics
Encapsulated PostScript     .eps     vector graphics
Portable Document Format    .pdf     vector graphics
Scalable Vector Graphics    .svg     vector graphics

These formats can be divided into bitmap and vector graphics. The difference between these two categories is described below.

Bitmap images

In a bitmap image, each dot in the image has a specified color. This implies that if you zoom in on the image there will not be enough dots, and if you zoom out there will be too many. In these cases the image viewer has to interpolate the colors to fit what is actually looked at. A bitmap image needs to have a high resolution if you want to zoom in. This format is a good choice for storing images without large shapes (e.g. dot plots). It is also appropriate if you don't need to resize or edit the image after export.

CHAPTER 7. IMPORT/EXPORT OF DATA AND GRAPHICS 127

Vector graphics

A vector graphic is a collection of shapes. Thus what is stored is e.g. information about where a line starts and ends, and the color of the line and its width. This enables a given viewer to decide 
11. For every base, the Workbench calculates the running sum of this value. If the sum drops below zero, it is set to zero. The part of the sequence to be retained after trimming is the region between the first positive value of the running sum and the highest value of the running sum. Everything before and after this region will be trimmed off.

A read will be completely removed if the score never makes it above zero.

At http://www.clcbio.com/files/usermanuals/trim.zip you find an example sequence and an Excel sheet showing the calculations done for this particular sequence to illustrate the procedure described above.

• Trim ambiguous nucleotides: This option trims the sequence ends based on the presence of ambiguous nucleotides (typically N). Note that the automated sequencer generating the data must be set to output ambiguous nucleotides in order for this option to apply. The algorithm takes as input the maximal number of ambiguous nucleotides allowed in the sequence after trimming. If this maximum is set to e.g. 3, the algorithm finds the maximum length region containing 3 or fewer ambiguities and then trims away the ends not included in this region.
• Trim contamination from vectors in UniVec database: If selected, the program will match the sequence reads against all vectors in the UniVec database and remove sequence ends with significant matches (the database is included when you install the CLC DNA Workbench). A list of all the vectors in the
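The two trimming procedures described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Workbench's code: the per-base values are taken as given input (the excerpt does not show how they are derived), the function names are hypothetical, ties for the highest running sum are resolved in favor of the earliest position, and regions are reported as half-open (start, end) index pairs.

```python
def trim_by_running_sum(values):
    """Return the retained region as (start, end), or None if the read is removed.

    values: one number per base (assumed input; positive means good quality).
    """
    running = 0.0
    first_positive = None   # index of the first positive running sum
    best = 0.0              # highest running sum seen so far
    best_index = None       # index where that highest value occurs
    for i, value in enumerate(values):
        # If the sum drops below zero, it is set to zero.
        running = max(0.0, running + value)
        if running > 0 and first_positive is None:
            first_positive = i
        if running > best:
            best, best_index = running, i
    if best_index is None:
        return None         # the score never made it above zero
    return first_positive, best_index + 1


def longest_region_with_max_ambiguous(seq, max_ambiguous, ambiguous="N"):
    """Return (start, end) of the longest region with at most max_ambiguous
    ambiguous nucleotides, found with a sliding window."""
    best = (0, 0)
    left = 0
    count = 0
    for right, base in enumerate(seq):
        if base.upper() in ambiguous:
            count += 1
        # Shrink the window from the left until it is valid again.
        while count > max_ambiguous:
            if seq[left].upper() in ambiguous:
                count -= 1
            left += 1
        if right + 1 - left > best[1] - best[0]:
            best = (left, right + 1)
    return best


print(trim_by_running_sum([-1, 2, 1, -4, 1, 1]))
print(longest_region_with_max_ambiguous("ANNACGTNACGTNNNA", 1))
```

In the first call the running sum is 0, 2, 3, 0, 1, 2, so the retained region runs from the first positive value (index 1) to the highest value (index 2). In the second call, the longest stretch with at most one N is ACGTNACGT.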
12. "in A not B" is found in A but not in B, the spliced alignment will contain a sequence named "in A not B". The first part of this sequence will contain the characters from A, but since no sequence information is available from B, a number of gap characters will be added to the end of the sequence corresponding to the number of residues in B. Note that the function does not require that the individual alignments contain an equal number of sequences.

19.5 Pairwise comparison

For a given set of aligned sequences (see chapter 19) it is possible to make a pairwise comparison in which each pair of sequences is compared to each other. This provides an overview of the diversity among the sequences in the alignment.

In CLC DNA Workbench this is done by creating a comparison table:

    Toolbox in the Menu Bar | Alignments and Trees | Pairwise Comparison

or

    right-click alignment in Navigation Area | Toolbox | Alignments and Trees | Pairwise Comparison

This opens the dialog displayed in figure 19.13.

If an alignment was selected before choosing the Toolbox action, this alignment is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove elements from the Navigation Area. Click Next to adjust parameters.

19.5.1 Pairwise comparison on alignment selection

A pairwise comparison can also be performed for a selected part of an alignment:

    right-click on an alignment selection | Pairwise Comparison

This lead
13. 2001

• Inner melting temperature: This option is only activated when the Nested PCR or TaqMan mode is selected. In Nested PCR mode, it determines the allowed melting temperature interval for the inner/nested pair of primers, and in TaqMan mode it determines the allowed temperature interval for the TaqMan probe.
• Advanced parameters: A number of less commonly used options.
    - Buffer properties: A number of parameters concerning the reaction mixture which influence melting temperatures.
        * Primer concentration: Specifies the concentration of primers and probes in nanomolar (nM).
        * Salt concentration: Specifies the concentration of monovalent cations (Na+, K+ and equivalents) in millimolar (mM).
        * Magnesium concentration: Specifies the concentration of magnesium cations (Mg2+) in millimolar (mM).
        * dNTP concentration: Specifies the concentration of deoxynucleotide triphosphates in millimolar (mM).
        * DMSO concentration: Specifies the concentration of dimethyl sulfoxide in volume percent (vol. %).
    - GC content: Determines the interval of GC content (% C and G nucleotides in the primer) within which primers must lie by setting a maximum and a minimum GC content.
    - Self annealing: Determines the maximum self annealing value of all primers and probes. This determines the amount of base pairing allowed between two copies of the same molecule. The s

CHAPTER 16. PRIMERS 253
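The GC content criterion above amounts to a simple interval check, which can be sketched in Python. This is an illustration only: the function names are hypothetical, and expressing the limits as fractions between 0 and 1 rather than as percentages is an assumption of this sketch.

```python
def gc_content(primer):
    """Fraction of C and G nucleotides in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    if not primer:
        raise ValueError("primer must not be empty")
    return (primer.count("G") + primer.count("C")) / len(primer)


def within_gc_interval(primer, minimum, maximum):
    """True if the primer's GC content lies within [minimum, maximum]."""
    return minimum <= gc_content(primer) <= maximum


# A primer with 2 of 4 bases being G or C passes a 40-60 % interval.
print(gc_content("ATGC"), within_gc_interval("ATGC", 0.4, 0.6))
```

A primer falling outside the interval would simply be rejected before any further scoring.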
14. [Screenshot: GenBank search results table with the columns Accession, Definition and Modification Date, the buttons Download and Open, Download and Save and Open at NCBI, and a total of 245 hits.]

Figure 11.1: The GenBank search view.

As default, CLC DNA Workbench offers one text field where the search parameters can be entered. Click Add search parameters to add more parameters to your search.

Note: The search is an "and" search, meaning that when adding search parameters to your search, you search for both (or all) text strings rather than "any" of the text strings.

You can append a wildcard character by checking the checkbox at the bottom. This means that you only have to enter the first part of the search text, e.g. searching for "genom" will find both "genomic" and "geno
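The combination of an "and" search with an appended wildcard can be illustrated on plain text with a short Python sketch. It only mimics the logic described above on local strings; it is not how the Workbench or NCBI evaluates a query, and the function name `matches_all_terms` is hypothetical.

```python
import re


def matches_all_terms(text, terms, wildcard=True):
    """True if every search term matches some word in the text ("and" search).

    With wildcard=True a term matches any word that starts with it,
    so "genom" matches both "genomic" and "genome".
    """
    words = re.findall(r"\w+", text.lower())

    def term_found(term):
        term = term.lower()
        if wildcard:
            return any(word.startswith(term) for word in words)
        return term in words

    return all(term_found(term) for term in terms)


definition = "Aspergillus niger contig, complete genome"
print(matches_all_terms(definition, ["genom", "niger"]))    # both terms match
print(matches_all_terms(definition, ["genom"], wildcard=False))  # no exact word "genom"
```

Adding a further term can only narrow the result, since every term has to match.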
15. [Screenshot: two enzyme lists showing name, overhang and methylation sensitivity for enzymes such as BglII, SmaI, EcoRI, SalI, EcoRV, PstI, HindIII, XhoI, XbaI, BamHI and ClaI.]

Figure 18.32: Adding or removing enzymes from the Side Panel.

At the top, you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 18.5 for more about creating and modifying enzyme lists.

Below there are two panels:

• To the left, you see all the enzymes that are in the list selected above. If you have not chosen to use an existing enzyme list, this panel shows all the enzymes available.
• To the right, there is a list of the enzymes that will be used.

Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button. If you e.g. wish to use EcoRV and BamHI, select these two enzymes and add them to the right side pa
16. By clicking the Dock icon, the floating Side Panel reappears in the right side of the view. The size of the floating Side Panel can be adjusted by dragging the hatched area in the bottom right.

Chapter 6

Printing

Contents
6.1 Selecting which part of the view to print
6.2 [...]
    6.2.1 Header and footer
6.3 Print preview

CLC DNA Workbench offers different choices of printing the result of your work.

This chapter deals with printing directly from CLC DNA Workbench. Another option for using the graphical output of your work is to export graphics (see chapter 3) in a graphic format, and then import it into a document or a presentation.

All the kinds of data that you can view in the View Area can be printed. The CLC DNA Workbench uses a WYSIWYG principle: What You See Is What You Get. This means that you should use the options in the Side Panel to change how your data, e.g. a sequence, looks on the screen. When you print it, it will look exactly the same way on print as on the screen.

For some of the views, the layout will be slightly changed in order to be printer-friendly.

It is not possible to print elements directly from the Navigation Area. They must first be opened in a view in order to be printed. To print the contents of a view:

    select relevant view | Print in the toolbar

This will
17. Chapter 2

Tutorials

Contents
2.1 Tutorial: Getting started
    2.1.1 Creating a folder
    2.1.2 Import data
2.2 Tutorial: View sequence
2.3 Tutorial: Side Panel Settings
    2.3.1 Saving the settings in the Side Panel
    2.3.2 Applying saved settings
2.4 Tutorial: GenBank search and download
    2.4.1 Searching for matching objects
    2.4.2 Saving the sequence
2.5 Tutorial: Assembly
    2.5.1 Trimming the sequences
    2.5.2 Assembling the sequencing data
    2.5.3 Getting an overview of the contig
    2.5.4 Finding and editing conflicts
    2.5.5 Including regions that have been trimmed off
    2.5.6 Inspecting the traces
    2.5.7 Synonymous substitutions
    2.5.8 Getting an overview of the conflicts
    2.5.9 Documenting your changes
    2.5.10 Using the result for further analyses
2.6 Tutorial: In silico cloning cloning work flow
    2.6.1 Locat
18. Desired temperature difference in melting temperature between outer (primers) and inner (TaqMan) oligos: the scoring function discounts solution sets which deviate greatly from this value. Regarding this, and the minimum difference option mentioned above, please note that to ensure flexibility there is no directionality indicated when setting parameters for melting temperature differences between probes and primers, i.e. it is not specified whether the probes should have a lower or higher Tm. Instead this is determined by the allowed temperature intervals for inner and outer oligos that are set in the primer parameters preference group in the side panel. If a higher Tm of probes is required, choose a Tm interval for probes which has higher values than the interval for outer primers.

The output of the design process is a table of solution sets. Each solution set contains the following: a set of primers which are general to all sequences in the alignment, a TaqMan probe which is specific to the set of included sequences (sequences where selection boxes are checked) and a TaqMan probe which is specific to the set of excluded sequences (marked by [...]). Otherwise, the table is similar to that described above for TaqMan probe prediction on single sequences.

16.10 Analyze primer properties

CLC DNA Workbench can calculate and display the properties of predefined primers and probes:

    select a primer sequence (primers are represented as DNA sequences in the 
19. [Screenshot: the main window with Toolbox entries such as Cloning and Restriction Sites, BLAST Search and Database Search, the Processes tab and the Status Bar.]

Figure 3.1: The user interface consists of the Menu Bar, Toolbar, Status Bar, Navigation Area, Toolbox, and View Area.

3.1 Navigation Area

The Navigation Area is located on the left side of the screen, under the Toolbar (see figure 3.2). It is used for organizing and navigating data. Its behavior is similar to the way files and folders are usually displayed on your computer.

[Screenshot: the CLC_Data location containing Example Data with the folders Cloning vectors, Extra, Nucleotide, Protein and RNA.]

Figure 3.2: The Navigation Area.

CHAPTER 3. USER INTERFACE

3.1.1 Data structure

The data in the Navigation Area is organized into a number of Locations. When the CLC DNA Workbench is started for the first time, there is one location called CLC_Data (unless your computer administrator has configured the installation otherwise).

A location represents a folder on the computer: The data shown under a location in the Navigation Area is stored on the computer in the folder which the location points to.

This is explained visually in figure 3.3.

[Screenshot for figure 3.3: a file browser window showing the CLC_Data folder.]
20. [Screenshot: list of enzymes with overhang and methylation sensitivity, including SacI, SphI, ApaI, FokI, HhaI, NsiI and SacII.]

Figure 18.33: Selecting enzymes.

If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 18.34), or use the view of enzyme lists (see 18.5).

[Screenshot: tooltip for the enzyme SacII showing the recognition site pattern CCGCGG and a list of suppliers.]

Figure 18.34: Showing additional information about an enzyme like recognition sequence or a list of commercial vendors.

At the bottom of the dialog, you can select to save this list of enzymes as a new file. In this way, you can save the selection of enzymes for later use.

When you click Finish, the enzymes are added to the Side Panel and the cut sites are shown on the sequence.
21. The hydrophobicity is calculated by sliding a fixed size window (of an odd number) over the protein sequence. At the central position of the window, the average hydrophobicity of the entire window is plotted (see figure 15.7).

Hydrophobicity scales

Several hydrophobicity scales have been published for various uses. Many of the commonly used hydrophobicity scales are described below.

Kyte-Doolittle scale. The Kyte-Doolittle scale is widely used for detecting hydrophobic regions in proteins. Regions with a positive value are hydrophobic. This scale can be used for identifying both surface-exposed regions as well as transmembrane regions, depending on the window size used. Short window sizes of 5-7 generally work well for predicting putative surface-exposed regions. Large window sizes of 19-21 are well suited for finding transmembrane domains if the values calculated are above 1.6 (Kyte and Doolittle, 1982). These values should be used as a rule of thumb and deviations from the rule may occur.

CHAPTER 15. PROTEIN ANALYSES 242

Engelman scale. The Engelman hydrophobicity scale, also known as the GES scale, is another scale which can be used for prediction of protein hydrophobicity (Engelman et al., 1986). Like the Kyte-Doolittle scale, this scale is useful for predicting transmembrane regions in proteins.

Eisenberg scale. The Eisenberg scale is a normalized consensus hydrophobicity scale which shares many features with the other hydrophobicity scale
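The sliding-window calculation described at the start of this excerpt can be sketched in Python. This is a minimal illustration rather than the Workbench's implementation: the function name `hydrophobicity_profile` is hypothetical, positions are reported as 0-based indices, and positions too close to either end to hold a full window are simply skipped, which is an assumption of this sketch. The scale values are the published Kyte-Doolittle hydropathy values.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein, window=9, scale=KYTE_DOOLITTLE):
    """Return (center_index, average hydrophobicity) for each full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = window // 2
    values = [scale[residue] for residue in protein.upper()]
    profile = []
    # Only positions with a complete window on both sides are evaluated.
    for center in range(half, len(values) - half):
        segment = values[center - half:center + half + 1]
        profile.append((center, sum(segment) / window))
    return profile


# Three hydrophobic isoleucines followed by two charged arginines.
for center, average in hydrophobicity_profile("IIIRR", window=3):
    print(center, round(average, 2))
```

With a window of 3, the average falls from 4.5 over the isoleucines to -1.5 once the window is dominated by arginine, which is the kind of transition the plot makes visible.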
22. Whether sequences can be displayed with this information depends on their origin. Sequences that you have created yourself or imported might not include this information, and you will only be able to see them represented by their name. However, sequences downloaded from databases like GenBank will include this information. To change how sequences are displayed:

    right-click any element or folder in the Navigation Area | Sequence Representation | select format

This will only affect sequence elements, and the display of other types of elements, e.g. alignments, trees and external files, will not be changed. If a sequence does not have this information, there will be no text next to the sequence icon.

Rename element

Renaming a folder or an element in the Navigation Area can be done in three different ways:

    select the element | Edit in the Menu Bar | Rename

or

    select the element | F2

or

    click the element once | wait one second | click the element again

When you can rename the element, you can see that the text is selected and you can move the cursor back and forth in the text. When the editing of the name has finished, press Enter or select another element in the Navigation Area. If you want to discard the changes instead, press the Esc key.

For renaming annotations instead of folders or elements, see section 10.3.3.

3.1.7 Delete elements

Deleting a folder or an element can be done in two ways:

    right-click the el
23. • The Primer solution submenu is used to specify requirements for the match of a PCR primer against the template sequences. These options are described further below. It contains the following options:
    - Perfect match
    - Allow degeneracy
    - Allow mismatches

The work flow when designing alignment-based primers and probes is as follows:

• Use selection boxes to specify groups of included and excluded sequences. To select all the sequences in the alignment, right-click one of the selection boxes and choose Mark All.
• Mark either a single forward primer region, a single reverse primer region or both on the sequence (and perhaps also a TaqMan region). Selections must cover all sequences in the included group. You can also specify that there should be no primers in a region (No Primers Here) or that a whole region should be amplified (Region to Amplify).
• Adjust parameters regarding single primers in the preference panel.
• Click the Calculate button.

16.9.2 Alignment-based design of PCR primers

In this mode, a single PCR primer or a pair of PCR primers is designed. CLC DNA Workbench allows the user to design primers which will specifically amplify a group of included sequences but not amplify the remainder of the sequences (the excluded sequences). The selection boxes are used to indicate the status of a sequence: if the box is checked, the sequence belongs to the included sequences; if not, it belongs to the excluded sequences. To design prim
24. 13.6.1 Pattern discovery search parameters

Various parameters can be set prior to the pattern discovery. The parameters are listed below, and a screen shot of the parameter settings can be seen in figure 13.20.

• Create and search with new model. This will create a new HMM model based on the selected sequences. The found model will be opened after the run and presented in a table view. It can be saved and used later if desired.
• Use existing model. It is possible to use already created models to search for the same pattern in new sequences.
• Minimum pattern length. Here, the minimum length of patterns to search for can be specified.

CHAPTER 13 GENERAL SEQUENCE ANALYSES 221

• Maximum pattern length. Here, the maximum length of patterns to search for can be specified.
• Noise (%). Specify the noise level of the model. This parameter influences the level of degeneracy of patterns in the sequence(s). The noise parameter can be 1, 2, 5 or 10 percent.
• Number of different kinds of patterns to predict. Number of iterations the algorithm goes through. After the first iteration, the pattern positions predicted in the first run are forced to be members of the background. In that way, the algorithm finds new patterns in the second iteration. Patterns marked 'Pattern' have the highest confidence. The maximum number of iterations is 3.
• Include background distribution. For protein sequences it is possible to include information on the back
25. 5.2 Default view preferences

There are five groups of default View settings:

1. Toolbar
2. Side Panel Location
3. New View
4. View Format
5. User Defined View Settings

In general, these are default settings for the user interface.

The Toolbar preferences let you choose the size of the toolbar icons, and you can choose whether to display names below the icons.

The Side Panel Location setting lets you choose between Dock in views and Float in window. When docked in view, view preferences will be located in the right side of the view of e.g. an alignment. When floating in window, the side panel can be placed anywhere on your screen, also outside the workspace, e.g. on a different screen. See section 5.6 for more about floating Side Panels.

The New view setting allows you to choose whether the View preferences are to be shown automatically when opening a new view. If this option is not chosen, you can press Ctrl + U (⌘ + U on Mac) to see the preferences panels of an open view.

The View Format allows you to change the way the elements appear in the Navigation Area. The following text can be used to describe the element:

• Name (this is the default information to be shown).
• Accession (sequences downloaded from databases like GenBank have an accession number).
• Latin name.
• Latin name (accession).
• Common name.
• Common name (accession).

The User Defined View Settings gives you an overview of the diff
26. CLC DNA Workbench

User manual

Manual for CLC DNA Workbench 6.6
Windows, Mac OS X and Linux

February 23, 2012

This software is for research purposes only.

CLC bio
Finlandsgade 10-12
DK-8200 Aarhus N
Denmark

Contents

Introduction

1 Introduction to CLC DNA Workbench
  1.1 Contact information
  1.2 Download and installation
  1.3 System requirements
  1.4 Licenses
  1.5 About CLC Workbenches
  1.6 When the program is installed: Getting started
  1.7 Plugins
  1.8 Network configuration
  1.9 The format of the user manual

2 Tutorials
  2.1 Tutorial: Getting started
  2.2 Tutorial: View sequence
  2.3 Tutorial: Side Panel Settings
  2.4 Tutorial: GenBank search and download
  2.5 Tutorial: Assembly
  2.6 Tutorial: In silico cloning cloning work flow
  2.7 Tutorial: Primer design
  2.8 Tutorial: BLAST search
  2.9 Tutorial: Tips for specialized BLAST searches
  2.10 Tutorial: Align protein sequences
  2.11 Tutorial: Create and modify a phylogenetic tree
  2.12 Tutorial: Find restriction sites

Core Functionalities

3 User interface
  3.1 Navigation Area
  3.2 View Area
  3.3 Zoom and selection in View Area
  3.4 Toolbox and Status Bar
  3.5 Workspace
  3.6 List of shortcuts

4 Searching your data
  4.1 What kind of information can be searched
27. Figure 19.7: Realigning using fixpoints. In the top view, fixpoints have been added to two of the sequences. In the view below, the alignment has been realigned using the fixpoints. The three top sequences are very similar, and therefore they follow the one sequence (number two from the top) that has a fixpoint.

aligned to each other.

Advanced use of fixpoints

Fixpoints with the same names will be aligned to each other, which gives the opportunity for great control over the alignment process. It is only necessary to change any fixpoint names in very special cases.

One example would be three sequences A, B and C where sequences A and B have one copy of a domain while sequence C has two copies of the domain. You can now force sequence A to align to the first copy and sequence B to align to the second copy of the domains in sequence C. This is done by inserting fixpoints in sequence C for each domain, and naming them 'fp1' and 'fp2' (for example). Now, you can insert a fixpoint in each of sequences A and B, naming them 'fp1' and 'fp2', respectively. When the three sequences are then aligned using fixpoints, sequence A will align to the first copy of the domain in sequence C, while sequence B will align to the second copy of the domain in sequence C.

CHAPTER 19 SEQUENCE ALIGNMENT 353

You can name fixpoints by:

right-click the Fixpoint annotation | Edit Annotation
28. Figure 11.3: Open webpages with information about this sequence.

This will open your computer's default browser searching for the sequence that you selected.

11.2.1 Google sequence

The Google search function uses the accession number of the sequence as the search term on http://www.google.com. The resulting web page is equivalent to typing the accession number of the sequence into the search field on http://www.google.com.

11.2.2 NCBI

The NCBI search function searches in GenBank at NCBI (http://www.ncbi.nlm.nih.gov) using an identification number (when you view the sequence as text it is the 'GI' number). Therefore, the sequence file must contain this number in order to look it up at NCBI. All sequences downloaded from NCBI have this number.

11.2.3 PubMed References

The PubMed references search option lets you look up PubMed articles based on references contained in the sequence file (when you view the sequence as text it contains a number of 'PUBMED' lines). Not all sequences have these PubMed references; if they are missing, you will see a dialog and the browser will not open.

11.2.4 UniProt

The UniProt search function searches in the UniProt database (http://www.ebi.uniprot
29. Pressing Ctrl (⌘ on Mac) while you click will refine the existing sorting.

APPENDIX C WORKING WITH TABLES 384

C.1 Filtering tables

The final concept to introduce is Filtering. The table filter has an advanced and a simple mode. The simple mode is the default and is applied simply by typing text or numbers (see an example in figure C.2).

Figure C.2: Typing 'neg' in the filter in simple mode.

Typing 'neg' in the filter will only show the rows where 'neg' is part of the text in any of the columns (also the ones that are not shown). The text does not have to be at the beginning; thus 'ega' would give the same result. This simple filter works fine for fast, textual and non-complicated filtering and searching.

However, if you wish to make use of numerical information or make more complex filters, you can switch to the advanced mode by clicking the Advanced filter button. The advanced filter is structured in a different way. First of all, you can have more than one criterion in the filter. Criteria can be added or removed by clicking the Add or Remove buttons. At the top, you can choose whether all the criteria should be fulfilled (Match all), or if just one of th
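The two filter modes described in item 29 can be sketched in Python. This is an illustration of the idea only, not the Workbench's code: the function names, the example rows and the case-insensitive matching are all assumptions, and the `match_all=False` branch stands in for the "just one of the criteria" option whose label is cut off in the text.

```python
def simple_filter(rows, text):
    """Keep rows where `text` occurs anywhere in the value of any column."""
    needle = text.lower()  # assumption: matching ignores case
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


def advanced_filter(rows, criteria, match_all=True):
    """Keep rows passing every criterion (Match all) or at least one."""
    combine = all if match_all else any
    return [row for row in rows if combine(test(row) for test in criteria)]


# Hypothetical table rows, loosely modelled on the reading frame example.
rows = [
    {"Length": 306, "Strand": "negative", "Start codon": "ATG"},
    {"Length": 800, "Strand": "positive", "Start codon": "ATG"},
    {"Length": 512, "Strand": "negative", "Start codon": "TTG"},
]

# Simple mode: 'neg' and 'ega' select the same rows.
negative_rows = simple_filter(rows, "neg")

# Advanced mode: numerical criterion combined with a textual one.
long_negative = advanced_filter(
    rows,
    [lambda r: r["Length"] > 400, lambda r: r["Strand"] == "negative"],
)
```

Representing each criterion as a small function keeps numerical and textual tests interchangeable, which is what makes combining them with `all` or `any` straightforward.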
30. Protein Analyses | Create Protein Charge Plot

This opens the dialog displayed in figure 15.1.

Figure 15.1: Choosing protein sequences to calculate protein charge.

If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.

You can perform the analysis on several protein sequences at a time. This will result in one output graph showing protein charge graphs for the individual proteins.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

15.1.1 Modifying the layout

Figure 15.2 shows the electrical charges for three proteins. In the Side Panel to the right, you can modify the layout of the graph.

Figure 15.2: View of the protein charge (plot of charge against pH).

See section B in the appendix for information about the graph
31. To install a plug-in, click the Download Plug-ins tab. This will display an overview of the plug-ins that are available for download and installation (see figure 1.24).

Figure 1.24: The plug-ins that are available for download.

Clicking a plug in
32. org) using the accession number. Furthermore, it checks whether the sequence was indeed downloaded from UniProt.

CHAPTER 11 ONLINE DATABASE SEARCH 172

11.2.5 Additional annotation information

When sequences are downloaded from GenBank they often link to additional information on taxonomy, conserved domains etc. If such information is available for a sequence it is possible to access additional accurate online information. If the db_xref identifier line is found as part of the annotation information in the downloaded GenBank file, it is possible to easily look up additional information on the NCBI web site.

To access this feature, simply right-click an annotation and see which databases are available.

Chapter 12

BLAST Search

Contents
12.1 Running BLAST searches
  12.1.1 BLAST at NCBI
  12.1.2 BLAST a partial sequence against NCBI
  12.1.3 BLAST against local data
  12.1.4 BLAST a partial sequence against a local database
12.2 Output from BLAST searches
  12.2.1 Graphical overview for each query sequence
  12.2.2 Overview BLAST table
  12.2.3 BLAST graphics
  12.2.4 BLAST table
12.3 Local BLAST databases
  12.3.1 Make pre
33. type the name in the 'Name' field

19.2 View alignments

Since an alignment is a display of several sequences arranged in rows, the basic options for viewing alignments are the same as for viewing sequences. Therefore we refer to section 10.1 for an explanation of these basic options.

However, there are a number of alignment-specific view options in the Alignment info and the Nucleotide info in the Side Panel to the right of the view. Below is more information on these view options.

Under Translation in the Nucleotide info, there is an extra checkbox: Relative to top sequence. Checking this box will make the reading frames for the translation align with the top sequence so that you can compare the effect of nucleotide differences on the protein level.

The options in the Alignment info relate to each column in the alignment:

• Consensus. Shows a consensus sequence at the bottom of the alignment. The consensus sequence is based on every single position in the alignment and reflects an artificial sequence which resembles the sequence information of the alignment, but only as one single sequence. If all sequences of the alignment are 100 % identical, the consensus sequence will be identical to all sequences found in the alignment. If the sequences of the alignment differ, the consensus sequence will reflect the most common sequences in the alignment. Parameters for adjusting the consensus sequences are described below.
    - Limit. This option deter
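The idea of a consensus sequence from item 33 can be illustrated with a short Python sketch that picks the most common residue in each alignment column. This is an assumption-laden simplification: the function name `consensus` is hypothetical, gaps are treated like any other symbol, ties go to the symbol seen first in the column, and the Limit parameter (whose description is cut off in the text) is not modelled.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Return the most common symbol of every column of an alignment.

    All sequences must already be aligned, i.e. have the same length.
    """
    lengths = {len(sequence) for sequence in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for column in zip(*aligned_sequences):
        # most_common(1) returns the symbol with the highest count.
        symbol, _count = Counter(column).most_common(1)[0]
        result.append(symbol)
    return "".join(result)
```

For identical sequences the result equals each input, matching the statement in the text; for differing sequences each column is decided independently.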
34.   11.2.1 Google sequence
  11.2.2 NCBI
  11.2.3 PubMed References
  11.2.4 UniProt
  11.2.5 Additional annotation information

CLC DNA Workbench offers different ways of searching data on the Internet. You must be online when initiating and performing the following searches.

11.1 GenBank search

This section describes searches for sequences in GenBank, the NCBI Entrez database. The NCBI search view is opened in this way (figure 11.1):

Search | Search for Sequences at NCBI
or Ctrl + B (⌘ + B on Mac)

This opens the following view:

11.1.1 GenBank search options

Conducting a search in the NCBI Database from CLC DNA Workbench corresponds to conducting the search on NCBI's website. When conducting the search from CLC DNA Workbench, the results are available and ready to work with straight away.

You can choose whether you want to search for nucleotide sequences or protein sequences.
35.   211  377  Local Database  BLAST  1 8  Locale setting  105  Location   search in  101   of selection on sequence  92   path to  78   Side Panel  106  Locations   multiple  376  Log of batch processing  138  Logo  sequence  354  3 8  LR reaction  Gateway cloning  326     ma4  file format  395  Mac OS X installation  13  Manage BLAST databases  188  Manipulate sequences  377  380  Manual editing  auditing  105  Manual format  34  Marker  in gel view  343  Maximize size of view  89  Maximum likelihood  3 9  Melting temperature  DMSO concentration  252  dNTP concentration  252  Magnesium concentration  252    410    Melting temperature  252  Cation concentration  252  270  Cation concentration  2 2  Inner  252  Primer concentration  252  270  Primer concentration  2 2  Menu Bar  illustration      MFold  379  mmCIF  file format  395  Mode toolbar  91  Modification date  160  Modify enzyme list  345  Modules  30  Molecular weight  215  Motif list  227  Motif search  221  227  379  Mouse modes  91  Move  content of a view  92  elements in Navigation Area  80  sequences in alignment  358   msf  file format  395  Multiple alignments  304  3 8  Multiplexing  2 9  by name  2 9  Multiselecting  80    Name  160  Navigation Area       create local BLAST database  18 7   illustration      NCBI  167   search sequence in  1 1   search  tutorial  43  NCBI BLAST   add more databases  38   Negatively charged residues  217  Neighbor Joining algorithm  373  Neighbor joining  379  Nested PCR prime
36. Figure C.3: The advanced filter showing open reading frames larger than 400 that are placed on the negative strand.

Both for the simple and the advanced filter, there is a counter at the upper left corner which tells you the number of rows that pass the filter (91 in figure C.2 and 15 in figure C.3).

Appendix D

BLAST databases

Several databases are available at NCBI, which can be selected to narrow down the possible BLAST hits.

D.1 Peptide sequence databases

• nr. Non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding those in env_nr.
• refseq. Protein sequences from NCBI Reference Sequence project http://www.ncbi.nlm.nih.gov/RefSeq/.
• swissprot. Last major release of the SWISS-PROT protein sequence database (no incremental updates).
• pat. Proteins from the Patent division of GenBank.
• pdb. Sequences derived from the 3-dimensional structure records from the Protein Data Bank http://www.rcsb.org/pdb/.
• env_nr. Non-redundant CDS translations from env_nt entries.
• month. All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days.

D.2 Nucleotide sequence databases

• nr. All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1, or 2 HTGS sequences). No longer 'non-redundant' due to computational cost.
• refseq_rna. mRNA sequences from NCBI Reference Sequence Project.
• refseq_genomi
37. All the conflict annotations are preserved, and in the sequence's history, you will find a reference to the original contig. As long as you also save the original contig, you will always be able to go back to it by choosing the Reference contig in the consensus sequence's history (see figure 2.22).

Figure 2.22: The history of the consensus sequence, which has been extracted from the contig. Clicking the blue text 'Reference contig' will find and highlight the name of the saved contig in the Navigation Area. Clicking the blue text 'history' to the right will open the history view of the earlier contig. From there, you can choose other views, such as the Read mapping view, of the contig.

CHAPTER 2 TUTORIALS 92

2.6 Tutorial: In silico cloning cloning work flow

In this tutorial, the goal is to virtually PCR-amplify a gene using primers with restriction sites at the 5' ends, and insert the gene into a multiple cloning site of an expression vector. We start off with a set of primers, a DNA template sequence and an expression vector loaded into the Workbench.

This tutorial will guide you through the following steps:

1. Adding restriction sites to the primers
2. Simulating the effect of PCR by creating the fragment to use for cloning
3. Specify
38.   Assemble   sequences  291   to existing contig  295   to reference sequence  293  Assembly  376   tutorial  45   variance table  303  Atomic composition  217  attB sites  add  319  Audit  105    Backup  123  Base pairs  required for mispriming  257  Batch edit element properties  84  Batch processing  133  log of  138  Bibliography  403  Binding site for primer  2 1  Bioinformatic data  export  122  formats  117  392  bl2seq  see Local BLAST  BLAST  377  against a local Database  1 8  against NCBI  175  contig  301  create database from file system  187  create database from Navigation Area  187  create local database  18   database file format  395  database management  188  graphics output  182  list of databases  386  parameters  1 6  search  174  1 75  sequencing data  assembled  301  specify server URL  109  table output  183  tips for specialized searches  64  tutorial  61  64  URL  109  BLAST database index  187  BLAST DNA sequence  BLASTn  175  BLASTX  175  tBLASTx  175  BLAST Protein sequence    406    BLASTp  176   tBLASTn  176  BLAST result   search in  185  BLAST search   Bioinformatics explained  189  BLOSUM  scoring matrices  208  Bootstrap values  3 4  Borrow floating license  25  BP reaction  Gateway cloning  324  Broken pair coloring  298  Browser import sequence from  119  Bug reporting  28    C G content  145  CDS  translate to protein  149  Chain flexibility  146  Cheap end gaps  349  ChIP Seq analysis  3 0  Chromatogram traces  scale  2 8  cif  file for
39. At the top there is a graphical representation of BLAST hits with tool-tips showing additional information on individual hits. Below is a tabular form of the BLAST results.

12.1 Running BLAST searches

With the CLC DNA Workbench there are two ways of performing BLAST searches. You can either have the BLAST process run on NCBI's BLAST servers (http://www.ncbi.nlm.nih.gov/) or you can perform the BLAST search on your own computer.

The advantage of running the BLAST search on NCBI servers is that you have ready access to the popular, and often very large, BLAST databases without having to download them to your own computer. The advantages of running BLAST on your own computer include that you can use your own sequence collections as BLAST databases, and that running big batch BLAST jobs can be faster and more reliable when done locally.

CHAPTER 12 BLAST SEARCH 175

12.1.1 BLAST at NCBI

When running a BLAST search at the NCBI, the Workbench sends the sequences you select to the NCBI's BLAST servers. When the results are ready, they will be automatically downloaded and displayed in the Workbench. When you enter a large number of sequences for searching with BLAST, the Workbench automatically splits the sequences up into smaller subsets and sends one subset at a time to NCBI. This is to avoid exceeding any internal limits the NCBI places on the number of sequences that can be submitted to them for BLAST
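The splitting of a large query set into smaller subsets, as described in item 39, follows a common batching pattern. The Python sketch below shows that pattern only: the subset size of 10, the function names and the `submit` callable are hypothetical placeholders, since the text does not state the Workbench's actual limits or how it talks to NCBI.

```python
def split_into_subsets(sequences, subset_size):
    """Yield consecutive subsets holding at most `subset_size` sequences."""
    if 1 > subset_size:
        raise ValueError("subset_size must be at least 1")
    for start in range(0, len(sequences), subset_size):
        yield sequences[start:start + subset_size]


def blast_in_subsets(sequences, submit, subset_size=10):
    """Send one subset at a time and collect the results in input order.

    `submit` is a hypothetical callable that takes a list of sequences and
    returns one result per sequence; it stands in for the real submission.
    """
    results = []
    for subset in split_into_subsets(sequences, subset_size):
        results.extend(submit(subset))
    return results
```

Sending the subsets sequentially keeps every single submission below the chosen size while preserving the order of the results.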
40.     - Color box. For Line and Bar plots, the color of the plot can be set by clicking the color box. If a Color bar is chosen, the color box is replaced by a gradient color box as described under Foreground color.

The remaining View preferences for BLAST Graphics are the same as those of alignments. See section 19.2.

Some of the information available in the tooltips is:

• Name of sequence. Here is shown some additional information about the sequence which was found. This line corresponds to the description line in GenBank (if the search was conducted on the nr database).
• Score. This shows the bit score of the local alignment generated through the BLAST search.
• Expect. Also known as the E-value. A low value indicates a homologous sequence. Higher E-values indicate that BLAST found a less homologous sequence.
• Identities. This number shows the number of identical residues or nucleotides in the obtained alignment.
• Gaps. This number shows whether the alignment has gaps or not.
• Strand. This is only valid for nucleotide sequences and shows the direction of the aligned strands. Minus indicates a complementary strand.
• Query. This is the sequence (or part of the sequence) which you have used for the BLAST search.
• Sbjct (subject). This is the sequence found in the database.

The numbers of the query and subject sequences refer to the sequence positions in the submitted and found sequences. If the subject se
41. Figure 16.5: Detailed information mode.

The number of information line groups reflects the chosen length interval for primers and probes. One group is shown for every possible primer length. Within each group, a line is shown for every primer property that is selected from the checkboxes in the primer information preference group. Primer properties are shown at each potential primer starting position and are of two types:

Properties with numerical values are represented by bar plots. A green bar represents the starting point of a primer that meets the set requirement and a red bar represents the starting point of a primer that fails to meet the set requirement.

• G/C content
• Melting temperature
• Self annealing score
• Self end annealing score
• Secondary structure score

Properties with Yes/No values. If a primer meets the set requirement, a green circle will be shown at its starting position, and if it fails to meet the requirement, a red dot is shown at its starting position.

• C/G at 3' end
• C/G at 5' end

Common to both sorts of properties is that mouse-clicking an information point (filled circle or bar) will cause the region covered by the associated primer to be selected on the sequence.

16.4 Output from primer design

The output generated by the primer design algorithm is a table of proposed primers or primer pairs with the accompanying information (see figure 16.6).

CHAPTER 16 PRIMERS 256
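How a numerical property and a Yes/No property could be evaluated at every potential primer starting position, as item 41 describes, is sketched below in Python for G/C content and the C/G at 3' end check. The function names and thresholds are hypothetical, the sequence is assumed to be written 5' to 3', and the other properties (melting temperature, annealing and secondary structure scores) are not covered.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    return sum(base in "GC" for base in primer) / len(primer)


def gc_bars(sequence, primer_length, min_gc, max_gc):
    """Return (start position, G/C content, color) for each candidate primer.

    Positions are 1-based. 'green' marks a primer meeting the requirement,
    'red' one that fails it, mirroring the bar colors in the text.
    """
    bars = []
    for start in range(len(sequence) - primer_length + 1):
        primer = sequence[start:start + primer_length]
        gc = gc_content(primer)
        color = "green" if max_gc >= gc >= min_gc else "red"
        bars.append((start + 1, gc, color))
    return bars


def has_gc_at_3_prime_end(primer):
    """Yes/No property: is the last base (3' end) a G or a C?"""
    return primer.upper()[-1] in "GC"
```

Calling `gc_bars` once per primer length in the chosen interval would give one group of information lines per length, as in the detailed information mode.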
42. If you wish to change which settings should be used per default, open the Preferences dialog (see section 5.2).

CHAPTER 5 USER PREFERENCES AND SETTINGS 111

Figure 5.9: The Sequence layout is expanded.

Figure 5.10: At the top of the Side Panel you can: Expand all groups, Collapse all preferences, Dock/Undock preferences, Help, and Save/Restore preferences.

• Delete Settings. Opens a dialog to select which of the saved settings to delete.
• Apply Saved Settings. This is a submenu containing the settings that you have previously saved. By clicking one of the settings, they will be applied to the current view. You will also see a number of pre-defined view settings in this submenu. They are meant to be examples of how to use the Side Panel and provide quick ways of adjusting the view to common usages. At the bottom of the list of settings you will see CLC Standard Settings, which represent the way the program was set up when you first launched it.
43. Importing parts of the database

Instead of importing the whole database automatically, you can export parts of the database from Vector NTI Explorer and subsequently import into the Workbench. First, export a selection of files as an archive as shown in figure 7.5.

CHAPTER 7 IMPORT/EXPORT OF DATA AND GRAPHICS 121

Figure 7.4: The Vector NTI Data folder containing all imported sequences of the Vector NTI Database.

Figure 7.5: Select the relevant files and ex
44. Navigation Area | Toolbox in the Menu Bar | Primers and Probes | Analyze Primer Properties

Figure 16.14: Calculation dialog shown when designing alignment-based TaqMan probes.

If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove a sequence from the selected elements.

Clicking Next generates the dialog seen in figure 16.15.
45.   Note that the number of mismatches is reported in the output, so you will be able to filter on this afterwards (see below).

Below the match settings, you can adjust Concentrations concerning the reaction mixture. This is used when reporting melting temperatures for the primers.

• Primer concentration. Specifies the concentration of primers and probes in nanomolar (nM).
• Salt concentration. Specifies the concentration of monovalent cations ([Na+], [K+] and equivalents) in millimolar (mM).

16.11.2 Results - binding sites and fragments

Click Next to specify the output options as shown in figure 16.18.

The output options are:

• Add binding site annotations. This will add annotations to the input sequences (see details below).

[Dialog: Find Binding Sites and Create Fragments, with the output options Add binding site annotations, Create binding site table and Create fragment table, Min. fragment length: 100 and Max. fragment length: 2,000.]

Figure 16.18: Output options include reporting of binding sites and fragments.

• Create binding site table. Creates a table of all binding sites. Described in detail below.
• Create fragment table. Shows a table of all fragments that could result from using the primers. Note that you can s
46.   [Dialog: the Template panel shows the primer GGTGGGAGGTCTATATAA above the base pairing template strand CCACCCTCCAGATATATT, with a "dangler" text field at each side of the template.]

Figure 16.15: The parameters for analyzing primer properties.

In the Concentrations panel a number of parameters can be specified concerning the reaction mixture. These influence the melting temperatures.

• Primer concentration. Specifies the concentration of primers and probes in nanomolar (nM).
• Salt concentration. Specifies the concentration of monovalent cations ([Na+], [K+] and equivalents) in millimolar (mM).

In the Template panel the sequences of the chosen primer and the template sequence are shown. The template sequence is by default set to the reverse complement of the primer sequence, i.e. as perfectly base pairing. However, it is possible to edit the template to introduce mismatches which may affect the melting temperature. At each side of the template sequence a text field is shown. Here, the dangling ends of the template sequence can be specified. These may have an important effect on the melting temperature (Bommarito et al., 2000).

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

The result is shown in figure 16.16.
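The relationship between a primer and its default template can be illustrated with a short sketch. It assumes plain Python and unambiguous bases only. The helper names reverse_complement and count_mismatches are hypothetical and are not part of CLC DNA Workbench. The primer is the one shown in the Template panel above.

```python
# Illustrative sketch only: these helpers are hypothetical, not Workbench functions.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(primer: str, template: str) -> int:
    """Count positions where an edited template no longer pairs with the primer.

    Both sequences are written 5'->3' and must have the same length.
    """
    if len(primer) != len(template):
        raise ValueError("primer and template must have the same length")
    perfect = reverse_complement(primer)
    return sum(1 for a, b in zip(perfect, template) if a != b)


primer = "GGTGGGAGGTCTATATAA"
template = reverse_complement(primer)   # perfectly base pairing template
print(template)                         # TTATATAGACCTCCCACC
print(count_mismatches(primer, template))  # 0
```

Read 3'->5', the computed template is CCACCCTCCAGATATATT, which is the strand displayed beneath the primer in the dialog. Editing a base in the template would make count_mismatches return a value above zero.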
47.   Selection Mode, Show/hide Side Panel, Sort folder, Split Horizontally, Split Vertically, Undo, User Preferences, Zoom In Mode, Zoom In (without clicking), Zoom Out Mode, Zoom Out (without clicking), Inverse zoom mode

Windows/Linux: Shift + arrow keys, Ctrl + tab, Ctrl + W, Ctrl + Shift + W, Ctrl + C, Ctrl + X, Delete, Alt + F4, Ctrl + E, Ctrl + G, Space or ., F1, Ctrl + I, Ctrl + M, Ctrl + arrow keys, arrow keys, Ctrl + Shift + N, Ctrl + N, Ctrl + O, Ctrl + V, Ctrl + P, Ctrl + Y, F2, Ctrl + S, Ctrl + F, Ctrl + Shift + F, Ctrl + B, Ctrl + Shift + U, Ctrl + A, Ctrl + 2, Ctrl + U, Ctrl + Shift + R, Ctrl + T, Ctrl + J, Ctrl + Z, Ctrl + K, Ctrl + '+' (plus), '+' (plus), Ctrl + '-' (minus), '-' (minus), press and hold Shift

Mac OS X: Shift + arrow keys, Ctrl + Page Up/Down, ⌘ + W, ⌘ + Shift + W, ⌘ + C, ⌘ + X, Delete or ⌘ + Backspace, ⌘ + Q, ⌘ + E, ⌘ + G, Space or ., F1, ⌘ + I, ⌘ + M, ⌘ + arrow keys, arrow keys, ⌘ + Shift + N, ⌘ + N, ⌘ + O, ⌘ + V, ⌘ + P, ⌘ + Y, ⌘ + S, ⌘ + F, ⌘ + Shift + F, ⌘ + B, ⌘ + Shift + U, ⌘ + A, ⌘ + 2, ⌘ + U, ⌘ + Shift + R, ⌘ + T, ⌘ + J, ⌘ + Z, ⌘ + '+' (plus), '+' (plus), ⌘ + '-' (minus), '-' (minus), press and hold Shift

Combinations of keys and mouse movements are listed below.

† On Linux changing tabs is accomplished using Ctrl + Page Up/Page Down.

CHAPTER 3. USER INTERFACE 97

Action          Windows/Linux   Mac OS X   Mouse movement
Maximize View                              Double-click the tab of the View
Restore View                               Double-click the View title
48.   The protein alignment as it looks when you open it with background color according to the Rasmol color scheme and automatically wrapped.

Now, we are going to modify how this alignment is displayed. For this, we use the settings in the Side Panel to the right. All the settings are organized into groups, which can be expanded/collapsed by clicking the name of the group. The first group is Sequence Layout, which is expanded by default.

First, select No wrap in the Sequence Layout. This means that each sequence in the alignment is kept on the same line. To see more of the alignment, you now have to scroll horizontally.

Next, expand the Annotation Layout group and select Show Annotations. Set the Offset to "More offset" and set the Label to "Stacked".

Expand the Annotation Types group. Here you will see a list of the annotation types that are carried by the sequences in the alignment (see figure 2.6).

Check the "Region" annotation type, and you will see the regions as red annotations on the sequences.

Next, we will change the way the residues are colored. Click the Alignment Info group and under Conservation, check "Background color". This will use a gradient as background color for the residues. You can adjust the coloring by dragging the small arrows above the color box.

2.3.1 Saving the settings in the Side Panel

Now the alignment should look similar to figure 2.7.

At this point, if you just close the view, the changes made to the Side Pane
49.   This opens a dialog where you can alter your choice of sequences which you want to create statistics for. You can also add sequence lists.

Note: You cannot create statistics for DNA and protein sequences at the same time.

When the sequences are selected, click Next.

This opens the dialog displayed in figure 13.15.

The dialog offers to adjust the following parameters:

• Individual statistics layout. If more than one sequence was selected in Step 1, this function generates separate statistics for each sequence.
• Comparative statistics layout. If more than one sequence was selected in Step 1, this function generates statistics with comparisons between the sequences.

CHAPTER 13. GENERAL SEQUENCE ANALYSES 213

[Dialog: Create Sequence Statistics, step 2 "Set parameters", with the Layout options Individual statistics layout and Comparative statistics layout, and the option "Include background distribution of amino acids" based on Homo sapiens (human).]

Figure 13.15: Setting parameters for the sequence statistics.

You can also choose to include Background distribution of amino acids. If this box is ticked, an extra column with amino acid distribution of the chosen species is included in the table output. (The distributions are calculated from UniProt www.uniprot.org version 6.0, dated September 13 2005.)

Click Next if you wish to adjus
50.   We use the 'ATPase protein alignment' located in 'Protein orthologs' in the Example data. To create a phylogenetic tree:

click the 'ATPase protein alignment' in the Navigation Area | Toolbox | Alignments and Trees | Create Tree

A dialog opens where you can confirm your selection of the alignment. Click Next to move to the next step in the dialog where you can choose between the neighbor joining and the UPGMA algorithms for making trees. You also have the option of including a bootstrap analysis of the result. Leave the parameters at their default, and click Finish to start the calculation, which can be seen in the Toolbox under the Processes tab. After a short while a tree appears in the View Area (figure 2.55).

Figure 2.55: After choosing which algorithm should be used, the tree appears in the View Area. The Side panel in the right side of the view allows you to adjust the way the tree is displayed.

2.11.1 Tree layout

Using the Side Panel (in the right side of the view), you can change the way the tree is displayed.

Click Tree Layout and open the Layout drop down menu. Here you can choose be
51.   Figure 7.2: Data stored in the Vector NTI Local Database accessed through Vector NTI Explorer.

File | Import Vector NTI Database

Figure 7.3: Import the whole Vector NTI Database.

This will bring up a dialog letting you choose to import from the default location of the database, or you can specify another location. If the database is installed in the default folder, like e.g. C:\VNTI Database, press Yes. If not, click No and specify the database folder manually.

When the import has finished, the data will be listed in the Navigation Area of the Workbench as shown in figure 7.4.

If something goes wrong during the import process, please report the problem to support@clcbio.com. To circumvent the problem, see the following section on how to import parts of the database. It will take a few more steps, but you will most likely be able to import this way.
52.   i.e. chromosome coordinates. If you export including gaps, the data points in the file no longer correspond to the reference coordinates, because each gap will shift the coordinates.

Clicking Next will present a file dialog letting you specify name and location for the file.

The output format of the file is like this:

Position    Value

7.5 Copy/paste view output

The content of tables, e.g. in reports, folder lists, and sequence lists can be copy/pasted into different programs, where it can be edited. CLC DNA Workbench pastes the data in tabulator separated format which is useful if you use programs like Microsoft Word and Excel. There is a huge number of programs in which the copy/paste can be applied. For simplicity, we include one example of the copy/paste function from a Folder Content view to Microsoft Excel.

First step is to select the desired elements in the view:

click a line in the Folder Content view | hold Shift-button | press arrow down/up key

See figure   16.

[Screenshot: Folder Content view of "Sequences", a table with the columns Name, Description and Length.]
53.   [Screenshot: graphical BLAST output. A tool tip on one hit reads: score = 1567.8 bits (4050), Expect = 0E00, identities = 779/1144 (68%), Positives = 933/1144 (82%), Gaps = 29/1144 (2%).]

Figure 12.8: Default display of the output of a BLAST search for one query sequence. At the top there is a graphical representation of BLAST hits with tool-tips showing additional information on individual hits.

[Screenshot: Multi BLAST table with 6 rows and the columns Query, Number of hits, Lowest E-value and Accession (E-value). Further columns such as Description, Greatest identity, Greatest positive and Greatest hit length can be selected in the Side Panel.]

Figure 12.9: An overview BLAST table summarizing the results for a number of query sequences.

In 
54.   sequence lists and the cloning editor and choose Digest All Sequences with Selected Enzymes and Run on Gel.

Note: When using the right-click options, the sequence will be digested with the enzymes that are selected in the Side Panel. This is explained in section 10.1.2.

The view of the gel is explained in section 18.4.3.

18.4.2 Separate sequences on gel

To separate sequences without restriction enzyme digestion, first create a sequence list of the sequences in question (see section 10.7). Then click the Gel button at the bottom of the view of the sequence list.

For more information about the view of the gel, see the next section.

18.4.3 Gel view

In figure 18.49 you can see a simulation of a gel with its Side Panel to the right. This view will be explained in this section.

CHAPTER 18. CLONING AND CUTTING 342

Figure 18.48: A sequence list shown as a gel.

Figure 18.49: Five lanes showing fragments of five sequences cut with restriction enzymes.

Information on bands / fragments

You can get
55.   the GenBank format. You can add as many qualifier/key lines as you wish by clicking the button. Redundant lines can be removed by clicking the delete icon. The information entered on these lines is shown in the annotation table (see section 10.3.1) and in the yellow box which appears when you place the mouse cursor on the annotation. If you write a hyperlink in the Key text field, like e.g. "www.clcbio.com", it will be recognized as a hyperlink. Clicking the link in the annotation table will open a web browser.

Click OK to add the annotation.

Note: The annotation will be included if you export the sequence in GenBank, Swiss-Prot or CLC format. When exporting in other formats, annotations are not preserved in the exported file.

10.3.3 Edit annotations

To edit an existing annotation from within a sequence view:

right-click the annotation | Edit Annotation

This will show the same dialog as in figure 10.10, with the exception that some of the fields are filled out depending on how much information the annotation contains.

There is another way of quickly editing annotations which is particularly useful when you wish to edit several annotations.

CHAPTER 10. VIEWING AND EDITING SEQUENCES 159

To edit the information, simply double-click and you will be able to edit e.g. the name or the annotation type. If you wish to edit the qualifiers and double-click in this column, you will see the dialog for editing annotations.

Advanced editing o
56.   the color box is replaced by a gradient color box as described under Foreground color.

• G/C content. Calculates the G/C content of a part of the sequence and shows it as a gradient of colors or as a graph below the sequence.

  - Window length. Determines the length of the part of the sequence to calculate. A window length of 9 will calculate the G/C content for the nucleotide in question plus the 4 nucleotides to the left and the 4 nucleotides to the right. A narrow window will focus on small fluctuations in the G/C content level, whereas a wider window will show fluctuations between larger parts of the sequence.

  - Foreground color. Colors the letter using a gradient, where the left side color is used for low levels of G/C content and the right side color is used for high levels of G/C content. The sliders just above the gradient color box can be dragged to highlight relevant levels of G/C content. The colors can be changed by clicking the box. This will show a list of gradients to choose from.

  - Background color. Sets a background color of the residues using a gradient in the same way as described above.

  - Graph. The G/C content level is displayed on a graph. Learn how to export the data behind the graph in section   4.

    * Height. Specifies the height of the graph.
    * Type. The graph can be displayed as Line plot, Bar plot or as a Color bar.
    * Color box. For Line and Bar plots, the color o
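The Window length setting described above can be sketched in a few lines of Python. This is an illustration only, not the Workbench implementation. The function name gc_content_windowed is hypothetical, and the handling of positions near the sequence ends (the window is simply truncated there) is an assumption, since the text does not say how the ends are treated.

```python
def gc_content_windowed(seq: str, window: int = 9) -> list[float]:
    """G/C fraction around each position, using a centred window.

    A window of 9 covers the position itself plus 4 positions on each side.
    Assumption: near the ends the window is truncated to the available bases.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    s = seq.upper()
    fractions = []
    for i in range(len(s)):
        region = s[max(0, i - half): i + half + 1]
        gc = sum(1 for base in region if base in "GC")
        fractions.append(gc / len(region))
    return fractions


values = gc_content_windowed("GGGGAAAAA", window=9)
print(round(values[4], 3))  # centre position sees all 9 bases: 4/9
print(values[0])            # truncated window "GGGGA": 4/5
```

A narrow window (for example 3) makes neighbouring values change abruptly, while a wide window smooths them, which matches the behaviour described for the graph.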
57.  13 are included in the initial seeding.

After initial finding of words (seeding), the BLAST algorithm will extend the (only 3 residues long) alignment in both directions (see figure 12.17). Each time the alignment is extended, the alignment score is increased or decreased. When the alignment score drops below a predefined threshold, the extension of the alignment stops. This ensures that the alignment is not extended to regions where only very poor alignment between the query and hit sequence is possible. If the obtained alignment receives a score above a certain threshold, it will be included in the final BLAST result.

By tweaking the word size W and the neighborhood word threshold T, it is possible to limit the search space. E.g. by increasing T, the number of neighboring words will drop and thus limit the search space as shown in figure 12.18.

This will increase the speed of BLAST significantly but may result in loss of sensitivity. Increasing the word size W will also increase the speed but again with a loss of sensitivity.

CHAPTER 12. BLAST SEARCH 192

Query: 325 SLAALLNKCKTPOGQRLVNQWIKOPLMDKNRIEERLNLVEA 365
Sbjct: 290 TLASVLDCTVTPMGSRMLKRWLHMPVRDTRVLLERQQTIGA 330

Figure 12.17: BLAST aligning in both directions. The initial word match is marked green.

[Plots: two dot plots against Sequence 1, one for T = 12 and one for T = 16.]

Figure 12.18: Each dot represents a word match. Increasing
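The extension step described above can be sketched as follows. This is a simplified illustration, not the BLAST implementation. It assumes an exact seed match, a plain match/mismatch score instead of a substitution matrix, no gaps, and a stop rule expressed as a maximum drop below the best score seen so far. The names extend_one_way, extend_seed and max_drop are hypothetical.

```python
def extend_one_way(a: str, b: str, match=1, mismatch=-1, max_drop=3):
    """Extend an ungapped alignment from the start of a and b.

    Returns (best_score, best_length). Extension stops once the running
    score has fallen more than max_drop below the best score seen.
    """
    score = best = best_len = 0
    for i, (x, y) in enumerate(zip(a, b), start=1):
        score += match if x == y else mismatch
        if score > best:
            best, best_len = score, i
        elif best - score > max_drop:
            break
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, word=3, match=1, **kw):
    """Extend an exact word match at (q_pos, s_pos) in both directions.

    Returns (score, q_begin, q_end, s_begin, s_end) with half-open ends.
    """
    right, r_len = extend_one_way(query[q_pos + word:], subject[s_pos + word:],
                                  match=match, **kw)
    # Leftward extension is done on the reversed prefixes.
    left, l_len = extend_one_way(query[:q_pos][::-1], subject[:s_pos][::-1],
                                 match=match, **kw)
    score = word * match + left + right
    return (score, q_pos - l_len, q_pos + word + r_len,
            s_pos - l_len, s_pos + word + r_len)


query = "AAAGATTACAGGG"
subject = "CCCGATTACATTT"
result = extend_seed(query, subject, 5, 5)   # seed word "TTA"
print(result, query[result[1]:result[2]])
```

In this toy example the seed "TTA" grows to "GATTACA" and stops where the flanking regions no longer match. A final score threshold, as mentioned in the text, would then decide whether the hit is reported.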
58.  2 8
Quality of trace, 289
Quality score of trace, 289
Quality scores, 145
Quick start, 29

Rasmol colors, 144
Reading frame, 234
Realign alignment, 3 8
Reassemble contig, 304
Rebase, restriction enzyme database, 343
Rebuild index, 103
Recognition sequence, insert, 318
Recycle Bin, 83
Redo alignment, 350

INDEX 412

Redo/Undo, 87
Reference sequence, 3 6
References, 403
Region, types, 149
Remove
    annotations, 160
    sequences from alignment, 358
    terminated processes, 93
Rename element, 83
Report program errors, 28
Report, protein, 377
Request new feature, 28
Residue coloring, 144
Restore
    deleted elements, 83
    size of view, 89
Restriction enzymes
    filter, 330, 332, 336, 344
    from certain suppliers, 330, 332, 336, 344
Restriction enzyme list, 343
Restriction enzyme, star activity, 343
Restriction enzymes, 327
    compatible ends, 334
    cutting selection, 331
    isoschizomers, 334
    methylation, 330, 332, 336, 344
    number of cut sites, 329
    overhang, 330, 332, 336, 344
    separate on gel, 341
    sorting, 329
Restriction sites, 327, 3 8
    enzyme database Rebase, 343
    select fragment, 149
    number of, 337
    on sequence, 143, 327
    parameters, 335
    tutorial, 72
Results handling, 136
Reverse complement, 231, 3 8
Reverse complement contig, 297
Reverse sequence, 232
Reverse translation, 243, 378
    Bioinformatics explained, 245
Right-click on Mac, 34
RNA secondary structure, 3 9
RNA translation, 232
RNA-Seq analysis, 376
.rnaml, file format, 395

Saf
59.  3.2.3 Close views

When a view is closed, the View Area remains open as long as there is at least one open view.

A view is closed by:

right-click the tab of the View | Close
or select the view | Ctrl + W
or hold down the Ctrl-button | Click the tab of the view while the button is pressed

By right-clicking a tab, the following close options exist. See figure 3.10.

Figure 3.10: By right-clicking a tab, several close options are available.

• Close. See above.
• Close Tab Area. Closes all tabs in the tab area.
• Close All Views. Closes all tabs, in all tab areas. Leaves an empty workspace.
• Close Other Tabs. Closes all other tabs, in all tab areas, except the one that is selected.

3.2.4 Save changes in a view

When changes are made in a view, the text on the tab appears bold and italic (on Mac it is indicated by an * before the name of the tab). This indicates that the changes are not saved. The Save function may be activated in two ways:

Click the tab of the view you want to save | Save in the toolbar
or Click the tab of the view you want to save | Ctrl + S (⌘ + S on Mac)

If you close a view containing an element that has been change
60.  AND SETTINGS 109

5.4 Advanced preferences

The Advanced settings include the possibility to set up a proxy server. This is described in section 1.8.

5.4.1 Default data location

If you have more than one location in the Navigation Area, you can choose which location should be the default data location. The default location is used when you e.g. import a file without selecting a folder or element in the Navigation Area first. Then the imported element will be placed in the default location.

Note: The default location cannot be removed. You have to select another location as default first.

5.4.2 NCBI BLAST

URL to use for BLAST

It is possible to specify an alternate server URL to use for BLAST searches. The standard URL for the BLAST server at NCBI is: http://blast.ncbi.nlm.nih.gov/Blast.cgi.

Note: Be careful to specify a valid URL, otherwise BLAST will not work.

5.5 Export/import of preferences

The user preferences of the CLC DNA Workbench can be exported to other users of the program, allowing other users to display data with the same preferences as yours. You can also use the export/import preferences function to back up your preferences.

To export preferences, open the Preferences dialog (Ctrl + K (⌘ + ; on Mac)) and do the following:

Export | Select the relevant preferences | Export | Choose location for the exported file | Enter name of file | Save

Note: The format of exported preferences is .cpf. This notation must be submit
61.  BLOSUM

In 1992, 14 years after the PAM matrices were published, the BLOSUM matrices (BLOcks SUbstitution Matrix) were developed and published (Henikoff and Henikoff, 1992).

Henikoff et al. wanted to model more divergent proteins, thus they used locally aligned sequences where none of the aligned sequences share less than 62% identity. This resulted in a scoring matrix called BLOSUM62. In contrast to the PAM matrices the BLOSUM matrices are calculated from alignments without gaps emerging from the BLOCKS database http://blocks.fhcrc.org/.

Table 13.1: The BLOSUM62 matrix. A tabular view of the BLOSUM62 matrix containing all possible substitution scores (Henikoff and Henikoff, 1992).

Sean Eddy recently 
62.  [Screenshot: sequence list table with the columns Accession, Definition, Modification Date and Length.]

Figure 10.16: A sequence list containing multiple sequences can be viewed in either a table or in a graphical sequence list. The graphical view is useful for viewing annotations and the sequence itself, while the table view provides other information like sequence lengths, and the number of sequences in the list (number of Rows reported).

10.7.1 Graphical view of sequence lists

The graphical view of sequence lists is almost identical to the view of single sequences (see section 10.1). The main difference is that you now can see more than one sequence in the same view.

However, you also have a few extra options for sorting, deleting and adding sequences:

• To add extra sequences to the list, right-click an empty (white) space in the view, and select Add Sequences.
• To delete a sequence from the list, right-click the sequence's name and select Delete Sequence.
• To sort the sequences in the list, right-click the name of one of the sequences and select Sort Sequence List by Name or Sort Sequence List by Length.
• To rename a sequence, right-click the name of the sequence and select Rename Sequence.

10.7.2 Sequence list table

Each s
63.  Hence:

click Zoom Out in the Toolbar | click the sequence until you can see the whole sequence

This sequence is circular, which is indicated by << and >> at the beginning and the end of the sequence.

In the following we will show how the same sequence can be displayed in two different views - one linear view and one circular view. First, zoom in to see the residues again by using Zoom In or 100%. Then we make a split view by:

press and hold the Ctrl-button on the keyboard (⌘ on Mac) | click Show as Circular at the bottom of the view

This opens an additional view of the vector with a circular display, as can be seen in figure 2.4.

Figure 2.4: The resulting two views which are split horizontally.

Make a selection on the circular sequence (remember to switch to the Selection tool in the tool bar) and note that this selection is also reflected in the linear view above.

CHAPTER 2. TUTORIALS 41

2.3 Tutorial: Side Panel Settings

This brief tutorial will show you how to use the Side Panel to change th
64.  If you have specified a set of enzymes which you always use, it will probably be a good idea to save the settings in the Side Panel (see section 3.2.7) for future use.

Show enzymes cutting inside/outside selection

Section 18.3.1 describes how to add more enzymes to the list in the Side Panel based on the name of the enzyme, overhang, methylation sensitivity etc. However, you will often find yourself in a situation where you need a more sophisticated and explorative approach.

An illustrative example: you have a selection on a sequence, and you wish to find enzymes cutting within the selection, but not outside. This problem often arises during design of cloning experiments. In this case, you do not know the name of the enzyme, so you want the Workbench to find the enzymes for you:

right-click the selection | Show Enzymes Cutting Inside/Outside Selection

This will display the dialog shown in figure 18.35 where you can specify which enzymes should initially be considered.

[Dialog: Show Enzymes Cutting Inside/Outside Selection, step 1 "Enzymes to be considered in calculation", with the option Use existing enzyme list (Popular enzymes) and two filterable enzyme tables with the columns Name, Overhang, Methylation and Popularity.]
65.  Import    button. Note that there is also another import button at the very bottom of the dialog, but this will import the other settings of the Preferences dialog (see section 5.5).

The dialog asks if you wish to overwrite existing Side Panel settings, or if you wish to merge the imported settings into the existing ones (see figure 5.7).

[Dialog: "How do you want to import?" with the options Merge into existing styles and Overwrite existing styles.]

Figure 5.7: When you import settings, you are asked if you wish to overwrite existing settings or if you wish to merge the new settings into the old ones.

Note: If you choose to overwrite the existing settings, you will lose all the Side Panel settings that you have previously saved.

To avoid confusion of the different import and export options, here is an overview:

• Import and export of bioinformatics data such as sequences, alignments etc. (described in section 7.1.1).
• Graphics export of the views which creates image files in various formats (described in section   3).
• Import and export of Side Panel Settings as described above.
• Import and export of all the Preferences except the Side Panel settings. This is described in the previous section.

5.3 Data preferences

The data preferences contain preferences related to interpretation of data, e.g. linker sequences:

• Predefined primer additions for Gateway cloning (see section 18.2.1).

CHAPTER 5. USER PREFERENCES
66.  File type           Suffix               Import  Export  Description
CLC                 .clc                 X       X       Rich format including all information
Clustal Alignment   .aln                 X       X
GCG Alignment       .msf                 X       X
Nexus               .nxs, .nexus         X       X
Phylip Alignment    .phy                 X       X
Zip export          .zip                         X       Selected files in CLC format
Zip import          .zip, .gzip, .tar    X               Contained files/folder structure

G.1.4 Tree formats

File type           Suffix               Import  Export  Description
CLC                 .clc                 X       X       Rich format including all information
Newick              .nwk                 X       X
Nexus               .nxs, .nexus         X       X
Zip export          .zip                         X       Selected files in CLC format
Zip import          .zip, .gzip, .tar    X               Contained files/folder structure

APPENDIX G. FORMATS FOR IMPORT AND EXPORT 395

G.1.5 Miscellaneous formats

File type           Suffix               Description
BLAST Database      .phr, .nhr           Link to database imported
CLC                 .clc                 Rich format including all information
CSV                 .csv                 All tables
Excel               .xls, .xlsx          All tables and reports
GFF                 .gff                 See http://www.clcbio.com/annotate-with-gff
mmCIF               .cif                 3D structure
PDB                 .pdb                 3D structure
Tab delimited       .txt                 All tables
Text                .txt                 All data in a textual format
Zip export          .zip                 Selected files in CLC format
Zip import          .zip, .gzip, .tar    Contained files/folder structure

Note: The Workbench can import 'external' files, too. This means that all kinds of files can be imported and displayed in the Navigation Area, but the above mentioned formats are the only ones whose contents can be shown in the Workbench.

G.2 List of graphics data formats

Below is a list of form
67. In this dialog, you will be able to specify how this trimming should be performed.

For this data, we wish to use a more stringent trimming, so we set the limit of the quality score trim to 0.02 (see figure 2.12).

Figure 2.12: Specifying how sequences should be trimmed. A stringent trimming of 0.02 is used in this example.

There is no vector contamination in these data, so we only trim for poor quality.

If you place the mouse cursor on the parameters, you will see a brief explanation.

Click Next and choose to Save the results.

When the trimming is performed, the parts of the sequences that are trimmed are actually annotated, not removed (see figure 2.13). By choosing Save, the Trim annotations will be saved directly to the sequences, without opening them for you to view first.

CHAPTER 2  TUTORIALS 46

Figure 2.13: Trimming creates annotations on the regions that will be ignored in the assembly process.

These annotated
68. Figure 2.54: The resulting alignment.

Note: The new alignment is not saved automatically.

To save the alignment, drag the tab of the alignment view into the Navigation Area.

Installing the Additional Alignments plugin gives you access to other alignment algorithms: ClustalW (Windows/Mac/Linux), Muscle (Windows/Mac/Linux), T-Coffee (Mac/Linux), MAFFT (Mac/Linux), and Kalign (Mac/Linux). The Additional Alignments Module can be downloaded from http://www.clcbio.com/plugins. Note that you will need administrative privileges on your system to install it.

2.11 Tutorial: Create and modify a phylogenetic tree

You can make a phylogenetic tree from an existing alignment. See how to create an alignment in the tutorial: Align protein sequences
69. Figure 18.41: Selecting enzymes.

If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 18.52), or use the view of enzyme lists (see 18.5).

Figure 18.42: Showing additional information about an enzyme like recognition sequence or a list of commercial vendors.

CHAPTER 18  CLONING AND CUTTING 337

Number of cut sites

Clicking Next confirms the list of enzymes which will be included in the analysis, and takes you to the dialog shown in figure 18.43.

Restriction Site Analysis dialog, step 3 (Number of cut sites). Display enzymes with: No restricti
70. Figure 3.17: A database search and an alignment calculation are running. Clicking the small icon next to the process allows you to stop, pause and resume processes.

Besides the options to stop, pause and resume processes, there are some extra options for a selected number of the tools running from the Toolbox:

• Show results. If you have chosen to save the results (see section 9.2), you will be able to open the results directly from the process by clicking this option.
• Find results. If you have chosen to save the results (see section 9.2), you will be able to highlight the results in the Navigation Area.
• Show Log Information. This will display a log file showing progress of the process. The log file can also be shown by clicking Show Log in the "handle results" dialog where you choose between saving and opening the results.
• Show Messages. Some analyses will give you a message when processing your data. The messages are the black dialogs shown in the lower left corner of the Workbench that disappear after a few seconds. You can review the messages that have been shown by clicking this option.

The terminated processes can be removed by:

View | Remove Terminated Processes

If you close the program while there are running p
71. The peak is called by changing the residue to an ambiguity character and by adding an annotation at this position.

To call secondary peaks:

select sequence(s) | Toolbox in the Menu Bar | Sequencing Data Analyses | Call Secondary Peaks

This opens a dialog where you can alter your choice of sequences.

When the sequences are selected, click Next.

This opens the dialog displayed in figure 17.26.

The following parameters can be adjusted in the dialog:

• Percent of max peak height for calling. Adjust this value to specify how high the secondary peak must be to be called.
• Use IUPAC code / N for ambiguous nucleotides. When a secondary peak is called, the residue at this position can either be replaced by an N or by an ambiguity character based on the IUPAC codes (see section ).
• Add annotations. In addition to changing the actual sequence, annotations can be added for each base which has been called.

Figure 17.26: Setting parameters for secondary peak calling.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click
72. This chapter first describes how to create the plot and second how to adjust its view.

13.2.1 Create dot plots

A dot plot is a simple, yet intuitive way of comparing two sequences, either DNA or protein, and is probably the oldest way of comparing two sequences (Maizel and Lenk, 1981). A dot plot is a 2 dimensional matrix where each axis of the plot represents one sequence. By sliding a fixed size window over the sequences and marking a sequence match by a dot in the matrix, a diagonal line will emerge if two identical (or very homologous) sequences are plotted against each other. Dot plots can also be used to visually inspect sequences for direct or inverted repeats or regions with low sequence complexity. Various smoothing algorithms can be applied to the dot plot calculation to avoid noisy background of the plot. Moreover, various substitution matrices can be applied in order to take the evolutionary distance of the two sequences into account.

To create a dot plot:

Toolbox | General Sequence Analyses | Create Dot Plot

or Select one or two sequences in the Navigation Area | Toolbox in the Menu Bar | General Sequence Analyses | Create Dot Plot

CHAPTER 13  GENERAL SEQUENCE ANALYSES 202

or Select one or two sequences in the Navigation Area | right-click in the Navigation Area | Toolbox | General Sequence Analyses | Create Dot Plot

This opens the dialog shown in figure 13.3.
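The sliding-window idea described for dot plots can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `dot_plot`, the `window` and `min_matches` parameters, and the plain identity comparison (no smoothing, no substitution matrix) are hypothetical and are not taken from the Workbench.

```python
def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of (i, j) positions that would receive a dot.

    A dot is placed at (i, j) when the window starting at position i in
    seq_a and the window starting at position j in seq_b agree in at
    least `min_matches` positions. Identity scoring only (assumption).
    """
    points = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                points.add((i, j))
    return points


if __name__ == "__main__":
    # Comparing a sequence with itself gives dots along the main diagonal.
    for i, j in sorted(dot_plot("ACGTAC", "ACGTAC")):
        print(i, j)
```

Lowering `min_matches` below `window` tolerates mismatches inside a window, which is one simple way to let near-identical regions still show up as a diagonal.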
73. UniVec database can be found at http://www.ncbi.nlm.nih.gov/VecScreen/replist.html.

  - Hit limit. Specifies how strictly vector contamination is trimmed. Since vector contamination usually occurs at the beginning or end of a sequence, different criteria are applied for terminal and internal matches. A match is considered terminal if it is located within the first 25 bases at either sequence end. Three match categories are defined according to the expected frequency of an alignment with the same score occurring between random sequences. The CLC DNA Workbench uses the same settings as VecScreen (http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html):
    * Weak. Expect 1 random match in 40 queries of length 350 kb.
      Terminal match with Score 16 to 18.
      Internal match with Score 23 to 24.
    * Moderate. Expect 1 random match in 1,000 queries of length 350 kb.
      Terminal match with Score 19 to 23.
      Internal match with Score 25 to 29.
    * Strong. Expect 1 random match in 1,000,000 queries of length 350 kb.
      Terminal match with Score ≥ 24.
      Internal match with Score ≥ 30.
    Note that selecting e.g. Weak will also include matches in the Moderate and Strong categories.

• Trim contamination from saved sequences. This option lets you select your own vector sequences that you know might be the cause of contamination. If you select this option, you will be able to select one or more sequences when you click Next.
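The score thresholds listed for the hit limit can be written down as a small lookup. The sketch below is a hedged illustration only: the function names `classify_match` and `is_trimmed` are hypothetical, the lower bound of the Strong category is assumed to be inclusive (24 and 30), and nothing here reflects how the Workbench or VecScreen is actually implemented.

```python
# Lower score bounds per category, taken from the list above.
# Tuple order: (Weak, Moderate, Strong).
TERMINAL_BOUNDS = (16, 19, 24)
INTERNAL_BOUNDS = (23, 25, 30)
CATEGORY_ORDER = ["Weak", "Moderate", "Strong"]


def classify_match(score, terminal):
    """Return 'Weak', 'Moderate', 'Strong', or None for a vector match."""
    weak, moderate, strong = TERMINAL_BOUNDS if terminal else INTERNAL_BOUNDS
    if score >= strong:
        return "Strong"
    if score >= moderate:
        return "Moderate"
    if score >= weak:
        return "Weak"
    return None


def is_trimmed(score, terminal, hit_limit):
    """A match is trimmed if its category is at least the chosen hit limit."""
    category = classify_match(score, terminal)
    if category is None:
        return False
    return CATEGORY_ORDER.index(category) >= CATEGORY_ORDER.index(hit_limit)


if __name__ == "__main__":
    print(classify_match(17, terminal=True))
    print(is_trimmed(26, terminal=False, hit_limit="Weak"))
```

The `is_trimmed` helper mirrors the note that choosing Weak also includes the Moderate and Strong categories.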
74. Whereas the distance based methods compress all sequence information into a single number, the character based methods attempt to infer the phylogeny based on all the individual characters (nucleotides or amino acids).

Parsimony. In parsimony based methods a number of sites are defined which are informative about the topology of the tree. Based on these, the best topology is found by minimizing the number of substitutions needed to explain the informative sites. Parsimony methods are not based on explicit evolutionary models.

Maximum Likelihood. Maximum likelihood and Bayesian methods (see below) are probabilistic methods of inference. Both have the pleasing properties of using explicit models of molecular evolution and allowing for rigorous statistical inference. However, both approaches are very computer intensive.

CHAPTER 20  PHYLOGENETIC TREES 374

A stochastic model of molecular evolution is used to assign a probability (likelihood) to each phylogeny, given the sequence data of the OTUs. Maximum likelihood inference (Felsenstein, 1981) then consists of finding the tree which assigns the highest probability to the data.

Bayesian inference. The objective of Bayesian phylogenetic inference is not to infer a single "correct" phylogeny, but rather to obtain the full posterior probability distribution of all possible phylogenies. This is obtained by combining the likelihood and the prior probability distribution of evolutionary parameters. The vast
75. a list which contains the sequences present in the cloning editor. The inserted sequence remains on the list of sequences. If the two sequences do not have blunt ends, the ends' overhangs have to match each other. Otherwise a warning is displayed.

• Insert sequence before this sequence
  Insert another sequence before this sequence. The sequence to be inserted can be selected from a list which contains the sequences present in the cloning editor. The inserted sequence remains on the list of sequences. If the two sequences do not have blunt ends, the ends' overhangs have to match each other. Otherwise a warning is displayed.
• Reverse sequence
  Reverses the sequence and replaces the original sequence in the list. This is sometimes useful when working with single stranded sequences. Note that this is not the same as creating the reverse complement (see the following item in the list).
• Reverse complement sequence
  Creates the reverse complement of a sequence and replaces the original sequence in the list. This is useful if the vector and the insert sequences are not oriented the same way.
• Digest Sequence with Selected Enzymes and Run on Gel
  See section 18.4.1.
• Rename sequence
  Renames the sequence.
• Select sequence
  This will select the entire sequence.
• Delete sequence
  This deletes the given sequence from the cloning editor.
• Open copy of sequence
  This will open a c
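The difference between reversing a sequence and creating its reverse complement, as noted in the list above, can be shown with a minimal Python sketch. The function names are illustrative and only the four unambiguous DNA bases are handled, which is an assumption made to keep the example short.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq):
    """Read the same strand backwards; bases are unchanged."""
    return seq[::-1]


def reverse_complement(seq):
    """Give the opposite strand, read in its own 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse("ATGC"))             # CGTA
    print(reverse_complement("ATGC"))  # GCAT
```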
76. a selection on the negative strand | Open selection in New View

By doing that, the sequence will be reversed. This is only possible when the double stranded view option is enabled. It is possible to copy the selection and paste it in a word processing program or an e-mail. To obtain a reverse complement of an entire sequence:

select a sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Reverse Complement

or right-click a sequence in Navigation Area | Toolbox | Nucleotide Analyses | Reverse Complement

This opens the dialog displayed in figure 14.3.

Figure 14.3: Creating a reverse complement sequence.

If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.

Click Next if you wish to adjust how to handle the results (see
77. acid. The hydrophobicity score is then calculated as the sum of the values in a "window", which is a particular range of the sequence. The window length can be set from 5 to 25 residues. The wider the window, the fewer fluctuations in the hydrophobicity scores. (For more about the theory behind hydrophobicity, see 15.2.3.)

In the following we will focus on the different ways that CLC DNA Workbench offers to display the hydrophobicity scores. We use Kyte-Doolittle to explain the display of the scores, but the different options are the same for all the scales. Initially there are three options for displaying the hydrophobicity scores. You can choose one, two or all three options by selecting the boxes (see figure 15.6).

Coloring the letters and their background. When choosing coloring of letters or coloring of their background, the color red is used to indicate high scores of hydrophobicity. A "color slider" allows you to amplify the scores, thereby emphasizing areas with high (or low, blue) levels of hydrophobicity. The color settings mentioned are default settings. By clicking the color bar just below the color slider you get the option of changing color settings.

Graphs along sequences. When selecting graphs, you choose to display the hydrophobicity scores underneath the sequence. This can be done either by a line plot or bar plot, or by coloring

CHAPTER 15  PROTEIN ANALYSES 241
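As a rough illustration of the window calculation described above, the following Python sketch sums Kyte-Doolittle values over a sliding window. The scale values are the published Kyte-Doolittle hydropathy values; the function name `window_scores` and the choice to report one score per window start position are assumptions, since the excerpt does not say which residue a window score is attached to.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(protein, window=5):
    """Sum the scale values in each window of the given length.

    Returns one score per window start position (assumed convention).
    The window length is restricted to 5..25 residues as in the text.
    """
    if not 5 <= window <= 25:
        raise ValueError("window length must be between 5 and 25 residues")
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    for start, score in enumerate(window_scores("MPTMRRTVSEIRSRAEGYEK", 7)):
        print(start, round(score, 1))
```

A wider window averages over more residues, so neighbouring scores change less from one position to the next.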
78. acid could translate into several different codons (only 20 amino acids but 64 different codons). Thus, the program offers a number of choices for determining which codons should be used. These choices are explained in this section.

In order to make a reverse translation:

Reverse Translate

or right-click a protein sequence | Toolbox | Protein Analyses | Reverse translate

This opens the dialog displayed in figure 15.8.

If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. You can translate several protein sequences at a time.

Click Next to adjust the parameters for the translation.

Figure 15.8: Choosing a protein sequence for reverse translation.

15.3.1 Reverse translation parameters

Figure 15.9 shows the choices for making the translation.
79. acid distribution
• Histogram of amino acid distribution
• Annotation table
• Counts of di-peptides
• Frequency of di-peptides

The output of nucleotide sequence statistics includes:

• General statistics:
  - Sequence type
  - Length
  - Organism
  - Name
  - Description
  - Modification Date
  - Weight. This is calculated like this: $\sum_{\text{units in sequence}} \mathrm{weight}(\text{unit}) - \text{links} \times \mathrm{weight}(\mathrm{H_2O})$, where links is the sequence length minus one for linear sequences and sequence length for circular molecules. The units are monophosphates. Both the weight for single- and double-stranded molecules are included. The atomic composition is defined the same way.
• Atomic composition
• Nucleotide distribution table
• Nucleotide distribution histogram
• Annotation table
• Counts of di-nucleotides
• Frequency of di-nucleotides

A short description of the different areas of the statistical output is given in section 13.4.1.

13.4.1 Bioinformatics explained: Protein statistics

Every protein holds specific and individual features which are unique to that particular protein. Features such as isoelectric point or amino acid composition can reveal important information about a novel protein. Many of the features described below are calculated in a simple way.

Molecular weight

The molecular weight is the mass of a protein or molecule. The molecular weight is simply calculated as the sum of the atomic mass of
80. all the atoms in the molecule.

The weight of a protein is usually represented in Daltons (Da).

A calculation of the molecular weight of a protein does not usually include additional posttranslational modifications. For native and unknown proteins it tends to be difficult to assess whether posttranslational modifications such as glycosylations are present on the protein, making a calculation based solely on the amino acid sequence inaccurate. The molecular weight can be determined very accurately by mass spectrometry in a laboratory.

Isoelectric point

The isoelectric point (pI) of a protein is the pH where the protein has no net charge. The pI is calculated from the pKa values for 20 different amino acids. At a pH below the pI, the protein carries a positive charge, whereas if the pH is above the pI the protein carries a negative charge. In other words, pI is high for basic proteins and low for acidic proteins. This information can be used in the laboratory when running electrophoretic gels. Here the proteins can be separated based on their isoelectric point.

Aliphatic index

The aliphatic index of a protein is a measure of the relative volume occupied by aliphatic side chains of the following amino acids: alanine, valine, leucine and isoleucine. An increase in the aliphatic index increases the thermostability of globular proteins. The index is calculated by the following formula:

$\text{Aliphatic index} = X(\mathrm{Ala}) + a \cdot X(\mathrm{Val}) + b \cdot X(\mathrm{Leu}) + b \cdot X(\mathrm{Ile})$
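The aliphatic index formula can be tried out with a short Python sketch. The excerpt above breaks off before it defines the symbols, so the following are assumptions based on Ikai's commonly cited definition: X is the mole percent of each residue, and the coefficients are a = 2.9 and b = 3.9. The function name `aliphatic_index` is illustrative.

```python
def aliphatic_index(protein, a=2.9, b=3.9):
    """Aliphatic index from mole percentages of Ala, Val, Leu and Ile.

    Assumptions (not stated in the excerpt): X is mole percent, and the
    default coefficients a and b follow Ikai's definition.
    """
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        # Share of this residue among all residues, in percent.
        return 100.0 * protein.count(residue) / len(protein)

    return (
        mole_percent("A")
        + a * mole_percent("V")
        + b * (mole_percent("L") + mole_percent("I"))
    )


if __name__ == "__main__":
    print(round(aliphatic_index("MPTMRRTVSEIRSRAEGYEK"), 2))
```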
81. an open source program and anyone can download and change the program code. This has also given rise to a number of BLAST derivatives; WU-BLAST is probably the most commonly used (Altschul and Gish, 1996).

CHAPTER 12  BLAST SEARCH 190

BLAST is highly scalable and comes in a number of different computer platform configurations, which makes usage on both small desktop computers and large computer clusters possible.

12.5.1 Examples of BLAST usage

BLAST can be used for a lot of different purposes. A few of them are mentioned below.

• Looking for species. If you are sequencing DNA from unknown species, BLAST may help identify the correct species or homologous species.
• Looking for domains. If you BLAST a protein sequence (or a translated nucleotide sequence), BLAST will look for known domains in the query sequence.
• Looking at phylogeny. You can use the BLAST web pages to generate a phylogenetic tree of the BLAST result.
• Mapping DNA to a known chromosome. If you are sequencing a gene from a known species but have no idea of the chromosome location, BLAST can help you. BLAST will show you the position of the query sequence in relation to the hit sequences.
• Annotations. BLAST can also be used to map annotations from one organism to another or look for common genes in two related species.

12.5.2 Searching for homology

Most research projects involving sequencing of either DNA or protein have a requirement for obtaining biological informa
82. analysis.

16.7.1 TaqMan output table

In TaqMan mode there are two primers and a probe in a given solution: forward primer (F), reverse primer (R) and a TaqMan probe (TP).

CHAPTER 16  PRIMERS 264

The output table can show primer/probe pair combination parameters for all three combinations of primers and single primer parameters for both primers and the TaqMan probe (see section on Standard PCR for an explanation of the available primer pair and single primer information).

The fragment length in this mode refers to the length of the PCR fragment generated by the primer pair, and this is also the PCR fragment which can be exported.

16.8 Sequencing primers

This mode is used to design primers for DNA sequencing.

In this mode the user can define a number of Forward primer regions and Reverse primer regions where a sequencing primer can start. These are defined by making a selection on the sequence and right-clicking the selection. If areas are known where primers must not bind (e.g. repeat rich areas), one or more No primers here regions can be defined.

No requirements are imposed on the relative position of the regions defined.

After exploring the available primers (see section 16.3) and setting the desired parameter values in the Primer Parameters preference group, the Calculate button will activate the primer design algorithm.

After pressing the Calculate button a dialog will appear (see figure 16.11).
83. Figure 1.11: Read the license agreement carefully.

1.4.3 Import a license from a file

If you are provided a license file instead of a license ID, you will be able to import the file using this option.

When you have clicked Next, you will see the dialog shown in figure 1.12.

Figure 1.12: Selecting a license file.

Click the Choose License File button and browse to find the license file provided by CLC bio. When you have selected the file, click Next.

Accepting the license agreement

Regardless of which option you chose above, you will now see the dialog shown in figure 1.13.

Please read the License agreement carefully before clicking I accept these terms and Finish.

1.4.4 Upgrade license

If you have already used a previous version of CLC DNA Workbench, and you are entitled to upgrading to the new CLC DNA Workbench 6.6, select this option to get a license upgrade.

When you click Next, the workbench will search for a previous installation of CLC DNA Workbench. It will then locate the old license.

CHAPTER 1  INTRODUCTION TO CLC DNA WORKBENCH 22
84. author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work.

SOME RIGHTS RESERVED

See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents.

13.5 Join sequences

CLC DNA Workbench can join several nucleotide or protein sequences into one sequence. This feature can for example be used to construct "supergenes" for phylogenetic inference by joining several disjoint genes into one. Note that when sequences are joined, all their annotations are carried over to the new spliced sequence.

Two (or more) sequences can be joined by:

select sequences to join | Toolbox in the Menu Bar | General Sequence Analyses | Join sequences

or select sequences to join | right-click any selected sequence | Toolbox | General Sequence Analyses | Join sequences

This opens the dialog shown in figure 13.17.

If you have selected some sequences before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences from the selected elements. Clicking Next opens the dialog shown in figure 13.18.

In step 2 you can change the order in which the sequences will be joined. Select a sequence and use the arrows to move the selected sequence up or down.
85. Figure 1.2: Choosing between direct download or download web page.

• Go to license download web page. The workbench will open a Web Browser with the License Download web page when you click Next. From there you will be able to download your license as a file and import it. This option allows you to get a license even though the Workbench does not have direct access to the CLC Licenses Service.

If you select the first option, and it turns out that you do not have internet access from the Workbench (because of a firewall, proxy server etc.), you will be able to click Previous and use the other option instead.

Direct download

Selecting the first option takes you to the dialog shown in figure 1.5.

Figure 1.3: A license has been downloaded.

Progress for getting the license is shown, and when the license is downloaded, you
86. be just above the selected node.

• Set root at this node: defines the root of the tree to be at the selected node.
• Toggle collapse: collapses or expands the branches below the node.
• Change label: allows you to label or to change the existing label of a node.
• Change branch label: allows you to change the existing label of a branch.

You can also relocate leaves and branches in a tree or change the length. It is possible to modify the text on the unit measurement at the bottom of the tree view by right-clicking the text. In this way you can specify a unit, e.g. "years".

Branch lengths are given in terms of expected numbers of substitutions per site.

Note: To drag branches of a tree, you must first click the node one time, and then click the node again, and this time hold the mouse button.

In order to change the representation:

• Rearrange leaves and branches by
  Select a leaf or branch | Move it up and down (Hint: The mouse turns into an arrow pointing up and down)
• Change the length of a branch by
  Select a leaf or branch | Press Ctrl | Move left and right (Hint: The mouse turns into an arrow pointing left and right)

Alter the preferences in the Side Panel for changing the presentation of the tree.

20.2 Bioinformatics explained: phylogenetics

Phylogenetics describes the taxonomical classification of organisms based on their evolutionary history, i.e. their phylogeny. Phylogenetics is therefore an
87. but the user can change to a more detailed mode in the Primer information preference group.

The number of information lines reflects the chosen length interval for primers and probes. In the compact information mode one line is shown for every possible primer length, and each of these lines contains information regarding all possible primers of the given length. At each potential primer starting position, a circular information point is shown which indicates whether the primer fulfills the requirements set in the primer parameters preference group. A green circle indicates a primer which fulfills all criteria and a red circle indicates a primer which fails to meet one or more of the set criteria. For more detailed information, place the mouse cursor over the circle representing the primer of interest. A tool tip will then appear on screen, displaying detailed information about the primer in relation to the set criteria. To locate the primer on the sequence, simply left-click the circle.

The various primer parameters can now be varied to explore their effect, and the view area will dynamically update to reflect this, allowing for a high degree of interactivity in the primer design process.

After having explored the potential primers, the user may have found a satisfactory primer and choose to export this directly from the view area by right-clicking the primer's information point. This does not allow for any design information to e
88. choose where to export to | choose GenBank (.gbk) format | enter the name of the new file | Save

Export of dependent elements

When exporting e.g. an alignment, CLC DNA Workbench can export the alignment including all the sequences that were used to create it. This way, when sending your alignment (with the dependent sequences), your colleagues can reproduce your findings with adjusted parameters, if desired. To export with dependent files:

select the element in Navigation Area | File in Menu Bar | Export with Dependent Elements | enter the name of the new file | choose where to export to | Save

The result is a folder containing the exported file with dependent elements, stored automatically in a folder at the desired location on your disk.

Export history

To export an element's history:

select the element in Navigation Area | Export | select History PDF (.pdf) | choose where to export to | Save

The entire history of the element is then exported in pdf format.

The CLC format

CLC DNA Workbench keeps all bioinformatic data in the CLC format. Compared to other formats, the CLC format contains more information about the object, like its history and comments. The CLC format is also able to hold several elements of different types (e.g. an alignment, a graph and a phylogenetic tree). This means that if you are exporting your data to another CLC Workbench, you can use the CLC format to export several elements in one file, and you will preserve all t
89.  click Set as Parameter Prototype.  Note that the Workbench validates a lot of the input and parameters when running in normal (non-batch) mode. When running in batch, this validation is not performed, and this means that some analyses will fail if combinations of input data and parameters are not right. Therefore, batching should only be used when the batch units are very homogeneous in terms of the type and size of data.  CHAPTER 9. BATCHING AND RESULT HANDLING 136  9.1.4 Running the analysis and organizing the results  At the last dialog before clicking Finish, it is only possible to use the Save option. When a tool is run in batch mode, it will place the result files in the same folder as the input files. In the example shown in figure 9.3, the result of the two single sequences will be placed in the Cloning folder, whereas the results for the Cloning vector library and Processed data runs will be placed inside these folders.  When the batch run is started, there will be one "master" process representing the overall batch job, and there will then be a separate process for each batch unit. The behavior of this is different between Workbench and Server:  • When running the batch job in the Workbench, only one batch unit is run at a time. So when the first batch unit is done, the second will be started, and so on. This is done in order to avoid many parallel analyses that would draw on the same compute resources and slow down the computer.  • Wh
90.  counting instances of each nucleotide and then letting the majority decide the nucleotide in the contig. In case of equality, ACGT are given priority over one another in the stated order.  - Unknown nucleotide (N). The contig will be assigned an "N" character in all positions with conflicts.  - Ambiguity nucleotides (R, Y, etc.). The contig will display an ambiguity nucleotide reflecting the different nucleotides found in the reads. For an overview of ambiguity codes, see Appendix   .  Note that conflicts will always be highlighted no matter which of the options you choose. Furthermore, each conflict will be marked as an annotation on the contig sequence and will be present if the contig sequence is extracted for further analysis. As a result, the details of any experimental heterogeneity can be maintained and used when the results of single-sequence analyses are interpreted. Read more about conflicts in section 17.7.4.  • Create full contigs, including trace data. This will create a contig where all the aligned reads are displayed below the contig sequence. (You can always extract the contig sequence without the reads later on.) For more information on how to use the contigs that are created, see section 17.7.  • Show tabular view of contigs. A contig can be shown in both a graphical and a tabular view. If you select this option, a tabular view of the contig will also be opened. Even if you do not select this option, you can show the tabula
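The three conflict-resolution options described above can be illustrated with a small sketch. This is not the Workbench's implementation; the function name `resolve_conflict`, the mode names and the decision to ignore gap characters are assumptions made for illustration. Only the majority vote with A, C, G, T tie-breaking, the "N" option and the use of ambiguity codes come from the text; the code table is the standard IUPAC one.

```python
from collections import Counter

VALID = frozenset("ACGT")

# Standard IUPAC ambiguity codes, keyed by the set of nucleotides they cover.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def resolve_conflict(column, mode="vote"):
    """Return the contig base for one alignment column of read bases.

    mode is a hypothetical name for the three options in the text:
    "vote", "unknown" (N) or "ambiguity" (R, Y, etc.).
    """
    # Assumption: gaps and other non-ACGT symbols are simply ignored.
    bases = [b.upper() for b in column if b.upper() in VALID]
    if not bases:
        return "N"
    distinct = frozenset(bases)
    if len(distinct) == 1:
        return bases[0]              # no conflict at this position
    if mode == "vote":
        counts = Counter(bases)
        best = max(counts.values())
        for base in "ACGT":          # ties resolved in the order A, C, G, T
            if counts[base] == best:
                return base
    if mode == "unknown":
        return "N"
    if mode == "ambiguity":
        return IUPAC[distinct]
    raise ValueError(f"unknown mode: {mode}")
```

For example, a column with reads G and A is a tie, so voting yields A, while the ambiguity option yields R.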
91.  databases are available from a dedicated BLAST ftp site, ftp://ftp.ncbi.nlm.nih.gov/blast/db/. Moreover, it is possible to download programs/scripts from the same site enabling automatic download of changed BLAST databases. Thus it is possible to schedule a nightly update of changed databases and have the updated BLAST database stored locally or on a shared network drive at all times. Most BLAST databases on the NCBI site are updated on a daily basis to include all recent sequence submissions to GenBank.  A few commercial software packages are available for searching your own data. The advantage of using a commercial program is obvious when BLAST is integrated with the existing tools of these programs. Furthermore, they let you perform BLAST searches and retain annotations on the query sequence (see figure 12.22). It is also much easier to batch download a selection of hit sequences for further inspection.  CHAPTER 12. BLAST SEARCH 197  Figure 12.22: Snippet of alignment view of BLAST results from CLC Main Workbench. Individual alignments are represented directly in a graphical view. The top sequence is the query sequence and is shown with a selection of annotations.  12.5.8 What you cannot get out of BLAST  Don't expect BLAST 
92.  • To the left, you see all the enzymes that are in the list selected above. If you have not chosen to use an existing enzyme list, this panel shows all the enzymes available.  • To the right, there is a list of the enzymes that will be used.  Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button. If you e.g. wish to use EcoRV and BamHI, select these two enzymes and add them to the right side panel.  If you wish to use all the enzymes in the list:  Click in the panel to the left | press Ctrl + A (⌘ + A on Mac) | Add  The enzymes can be sorted by clicking the column headings, i.e. Name, Overhang, Methylation or Popularity. This is particularly useful if you wish to use enzymes which produce e.g. a 3' overhang. In this case, you can sort the list by clicking the Overhang column heading, and all the enzymes producing 3' overhangs will be listed together for easy selection.  When looking for a specific enzyme, it is easier to use the Filter. If you wish to find e.g. HindIII sites, simply type HindIII into the filter, and the list of enzymes will shrink automatically to only include the HindIII enzyme. This can also be used to only show enzymes producing e.g. a 3' overhang, as shown in figure 18.51.  If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 18.52), o
93.  eas   data overview; Secondary peak calling; Multiplexing based on barcode or name  376  APPENDIX A. COMPARISON OF WORKBENCHES AND THE VIEWER  Next-generation Sequencing Data Analysis (columns: Viewer, Protein, DNA, RNA, Main, Genomics): Import of 454, Illumina Genome Analyzer, SOLiD and Helicos data; Reference assembly of human-size genomes; De novo assembly; SNP/DIP detection; Graphical display of large contigs; Support for mixed-data assembly; Paired data support; RNA-Seq analysis; Expression profiling by tags; ChIP-Seq analysis  Expression Analysis (columns: Viewer, Protein, DNA, RNA): Import of Illumina BeadChip, Affymetrix, GEO data; Import of Gene Ontology annotation files; Import of Custom expression data table and Custom annotation files; Multigroup comparisons; Advanced plots: scatter plot, volcano plot, box plot and MA plot; Hierarchical clustering; Statistical analysis on count-based and gaussian data; Annotation tests; Principal component analysis (PCA); Hierarchical clustering and heat maps; Analysis of RNA-Seq/Tag profiling samples  Molecular cloning (columns: Viewer, Protein, DNA, RNA): Advanced molecular cloning; Graphical display of in silico cloning; Advanced sequence manipulation  Database searches (columns: Viewer, Protein, DNA, RNA): GenBank Entrez searches; UniProt searches (Swiss-Prot/TrEMBL); Web-based sequence search using BLAST; BLAST on local database; Cre
94.  existing enzyme list (Popular enzymes)  [Dialog screenshot: the left panel lists the enzymes in the selected list and the right panel lists the enzymes to be used; each panel has a Filter field and the columns Name, Overhang, Methylation and Popularity.]  Figure 18.36: Selecting enzymes.  If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 18.52), or use the view of enzyme lists (see 18.5).  [Screenshot: tooltip for the enzyme SacII showing the recognition site pattern CCGCGG and a list of suppliers.]  Figure 18.37: Showing additional information about an enzyme like recognition sequence or a list of commercial vendors.  Clic
95.  facto standard scoring matrix for a wide range of alignment programs. It is the default matrix in BLAST.  Calculate your own PAM matrix: http://www.bioinformatics.nl/tools/pam.html  BLOKS database: http://blocks.fhcrc.org/  NCBI help site: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Scoring2.html  Creative Commons License  All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form, and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work.  SOME RIGHTS RESERVED  See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents.  13.3 Local complexity plot  In CLC DNA Workbench it is possible to calculate local complexity for both DNA and protein sequences. The local complexity is a measure of the diversity in the composition of amino acids within a given range (window) of the sequence. The K2 algorithm is used for calculating local complexity [Wootton and Federhen, 1993]. To conduct a complexity calculation, do the following:  Select sequences in Navigation Area | Toolbox in Menu Bar | General Sequence Analyses | Create Complexity Plot  This open
96.  formatted BLAST databases available  12.3.2 Download NCBI pre-formatted BLAST databases  12.3.3 Create local BLAST databases  12.4 Manage BLAST databases  12.4.1 Migrating from a previous version of the Workbench  12.5 Bioinformatics explained: BLAST  12.5.1 Examples of BLAST usage  12.5.2 Searching for homology  12.5.3 How does BLAST work?  12.5.4 Which BLAST program should I use?  12.5.5 Which BLAST options should I change?  12.5.6 Explanation of the BLAST output  12.5.7 I want to BLAST against my own sequence database, is this possible?  12.5.8 What you cannot get out of BLAST  12.5.9 Other useful resources  CLC DNA Workbench offers to conduct BLAST searches on protein and DNA sequences. In short, a BLAST search identifies homologous sequences between your input (query) sequence and a database of sequences [McGinnis and Madden, 2004]. BLAST (Basic Local Alignment Search Tool) identifies homologous sequences using a heuristic method which finds short matches between two sequences. After initial match BLAST attempts to s
97.  have selected to export the whole view; if you have chosen to export the visible area only, the graphics file will be on one page with no headers or footers.  7.3.4 Exporting protein reports  It is possible to export a protein report using the normal Export function, which will generate a pdf file with a table of contents:  Click the report in the Navigation Area | Export in the Toolbar | select pdf  You can also choose to export a protein report using the Export graphics function, but in this way you will not get the table of contents.  7.4 Export graph data points to a file  Data points for graphs displayed along the sequence or along an alignment, mapping or BLAST result can be exported to a semicolon-separated text file (csv format). An example of such a graph is shown in figure 7.14. This graph shows the coverage of reads of a read mapping (produced with CLC Genomics Workbench).  To export the data points for the graph, right-click the graph and choose Export Graph to Comma-separated File. Depending on what kind of graph you have selected, different options  CHAPTER 7. IMPORT/EXPORT OF DATA AND GRAPHICS 129
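Because the exported graph data is semicolon-separated rather than comma-separated, the delimiter has to be set explicitly when the file is read back in. The following is a minimal sketch using Python's standard csv module; the column names and values in the example content are hypothetical, and it assumes that the first line holds column names and that all remaining cells are plain numbers.

```python
import csv
import io

def read_graph_points(handle):
    """Parse a semicolon-separated export into (header, rows of floats)."""
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)  # assumed: the first line holds the column names
    rows = [[float(cell) for cell in row] for row in reader if row]
    return header, rows

# Hypothetical file content; a real export may use other column names.
example = io.StringIO("Position;Coverage\n1;8\n2;13\n3;13\n")
header, rows = read_graph_points(example)
```

For a file on disk, the same function can be called with a handle from `open(path, newline="")`.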
98.  in the Menu Bar | Paste  If there is already an element of that name, the pasted element will be renamed by appending a number at the end of the name.  Elements can also be moved instead of copied. This is done with the cut/paste function:  select the files to cut | right-click one of the selected files | Cut | right-click the location to insert files into | Paste  or select the files to cut | Ctrl + X (⌘ + X on Mac) | select where to insert files | Ctrl + V (⌘ + V on Mac)  When you have cut the element, it is "greyed out" until you activate the paste function. If you change your mind, you can revert the cut command by copying another element.  Note that if you move data between locations, the original data is kept. This means that you are essentially doing a copy instead of a move operation.  Move using drag and drop  Using drag and drop in the Navigation Area, as well as in general, is a four-step process:  click the element | click on the element again, and hold left mouse button | drag the element to the desired location | let go of mouse button  This allows you to:  • Move elements between different folders in the Navigation Area.  • Drag from the Navigation Area to the View Area: A new view is opened in an existing View Area if the element is dragged from the Navigation Area and dropped next to the tab(s) in that View Area.  • Drag from the View Area to the Navigation Area: The element, e.g. a sequence, alignment, sea
99.  in the Vector NTI Local Database which can be accessed through Vector NTI Explorer. This is described in the first section below.  • Your data is stored as single files on your computer (just like Word documents etc.). This is described in the second section below.  Import from the Vector NTI Local Database  If your Vector NTI data are stored in a Vector NTI Local Database (as the one shown in figure 7.2), you can import all the data in one step, or you can import selected parts of it.  Importing the entire database in one step  From the Workbench, there is a direct import of the whole database (see figure   3).  [Screenshot: Vector NTI Explorer showing the Local Vector NTI Database with a table of DNA/RNA molecules.]
100.  information about the individual bands by hovering the mouse cursor on the band of interest. This will display a tooltip with the following information:  • Fragment length  • Fragment region on the original sequence  • Enzymes cutting at the left and right ends, respectively  CHAPTER 18. CLONING AND CUTTING 343  For gels comparing whole sequences, you will see the sequence name and the length of the sequence.  Note! You have to be in Selection or Pan mode in order to get this information.  It can be useful to add markers to the gel, which enable you to compare the sizes of the bands. This is done by clicking Show marker ladder in the Side Panel.  Markers can be entered into the text field, separated by commas.  Modifying the layout  The background of the lane and the colors of the bands can be changed in the Side Panel. Click the colored box to display a dialog for picking a color. The slider Scale band spread can be used to adjust the effective time of separation on the gel, i.e. how much the bands will be spread over the lane. In a real electrophoresis experiment this property will be determined by several factors, including time of separation, voltage and gel density.  You can also choose how many lanes should be displayed:  • Sequences in separate lanes. This simulates that a gel is run for each sequence.  • All sequences in one lane. This simulates that one gel is run for all sequences.  You can also modify the layout of the vi
101.  integral part of the science of systematics that aims to establish the phylogeny of organisms based on their characteristics. Furthermore, phylogenetics is central to evolutionary biology as a whole, as it is the condensation of the overall paradigm of how life arose and developed on earth.  20.2.1 The phylogenetic tree  The evolutionary hypothesis of a phylogeny can be graphically represented by a phylogenetic tree.  Figure 20.5 shows a proposed phylogeny for the great apes, Hominidae, taken in part from Purvis [Purvis, 1995]. The tree consists of a number of nodes (also termed vertices) and branches (also termed edges). These nodes can represent either an individual, a species, or a higher grouping and are thus broadly termed taxonomical units. In this case, the terminal nodes (also called leaves or tips of the tree) represent extant species of Hominidae and are the operational taxonomical units (OTUs). The internal nodes, which here represent extinct common ancestors of the great apes, are termed hypothetical taxonomical units since they are not directly observable.  [Figure labels: root node (most recent common ancestor); branches (edges); terminal nodes (leaves, operational taxonomical units): Orangutan, Human, Pygmy chimpanzee, Chimpanzee, Gorilla; internal node (vertex, hypothetical taxonomical unit).]  Figure 20.5: A proposed phylogeny of the great apes (Hominidae). Different components of the tree are marked, see text for description.  The ordering of the
102.  is relative to the overall number of sequence reads.  * Foreground color. Colors the letters using a gradient, where the left side color is used for low coverage and the right side is used for maximum coverage.  * Background color. Colors the background of the letters using a gradient, where the left side color is used for low coverage and the right side is used for maximum coverage.  * Graph. The coverage is displayed as a graph. Learn how to export the data behind the graph in section   4.  * Height. Specifies the height of the graph.  * Type. The graph can be displayed as Line plot, Bar plot or as a Color bar.  * Color box. For Line and Bar plots, the color of the plot can be set by clicking the color box. If a Color bar is chosen, the color box is replaced by a gradient color box as described under Foreground color.  • Residue coloring. There is one additional parameter:  - Sequence colors. This option lets you use different colors for the reads.  * Main. The color of the consensus and reference sequence. Black by default.  * Forward. The color of forward reads (single reads). Green by default.  * Reverse. The color of reverse reads (single reads). Red by default.  * Paired. The color of paired reads. Blue by default. Note that reads from broken pairs are colored according to their Forward/Reverse orientation or as a Non-specific match, but with a darker nuance than ordinary single reads.  * Non-specific matches. When a read would have ma
103.  license server in the dialog. When you restart CLC DNA Workbench, you will be asked for a license as described in section 1.4.  1.4.6 Limited mode  We have created the limited mode to prevent a situation where you are unable to access your data because you do not have a license. When you run in limited mode, a lot of the tools in the Workbench are not available, but you still have access to your data (also when stored in a CLC Bioinformatics Database). When running in limited mode, the functionality is equivalent to the CLC Sequence Viewer (see section A   ).  To get out of the limited mode and run the Workbench normally, restart the Workbench. When you restart, the Workbench will try to find a proper license, and if it does, it will start up normally. If it can't find a license, you will again have the option of running in limited mode.  1.5 About CLC Workbenches  In November 2005 CLC bio released two Workbenches: CLC Free Workbench and CLC Protein Workbench. CLC Protein Workbench is developed from the free version, giving it the well-tested user friendliness and look & feel. However, the CLC Protein Workbench includes a range of more advanced analyses.  In March 2006, CLC DNA Workbench (formerly CLC Gene Workbench) and CLC Main Workbench were added to the product portfolio of CLC bio.  CHAPTER 1. INTRODUCTION TO CLC DNA WORKBENCH 28  Like CLC Protein Workbench, CLC DNA Workbench builds on CLC Free Workbench. It shares some of the advanced produc
104.  matter if it is zoomed in or out, displays minimum 10 nucleotides on each line.  - Fixed wrap. Makes it possible to specify when the sequence should be wrapped. In the text field below, you can choose the number of residues to display on each line.  • Double stranded. Shows both strands of a sequence (only applies to DNA sequences).  • Numbers on sequences. Shows residue positions along the sequence. The starting point can be changed by setting the number in the field below. If you set it to e.g. 101, the first residue will have the position of -100. This can also be done by right-clicking an annotation and choosing Set Numbers Relative to This Annotation.  • Numbers on plus strand. Whether to set the numbers relative to the positive or the negative strand in a nucleotide sequence (only applies to DNA sequences).  • Follow selection. When viewing the same sequence in two separate views, "Follow selection" will automatically scroll the view in order to follow a selection made in the other view.  • Lock numbers. When you scroll vertically, the position numbers remain visible. (Only possible when the sequence is not wrapped.)  • Lock labels. When you scroll horizontally, the label of the sequence remains visible.  • Sequence label. Defines the label to the left of the sequence.  - Name (this is the default information to be shown).  - Accession (sequences downloaded from databases like GenBank have an accession number).  - Latin name.  - Lati
105.  [Tooltip on a sequence read in the Navigation Area: trace of the read, length 560, low quality 88, medium quality 135, high quality 337.]  Figure 17.1: A tooltip displaying information about the quality of the chromatogram.  The qualities are based on the phred scoring system, with scores below 19 counted as low quality, scores between 20 and 39 counted as medium quality, and those 40 and above counted as high quality.  If the trace file does not contain information about quality, only the sequence length will be shown.  To view the trace data, open the sequence read in a standard sequence view.  17.1.1 Scaling traces  The traces can be scaled by dragging the trace vertically, as shown in figure 17.2. The Workbench automatically adjusts the height of the traces to be readable, but if the trace height varies a lot, this manual scaling is very useful.  The height of the area available for showing traces can be adjusted in the Side Panel as described in section 17.1.2.  Figure 17.2: Grab the traces to scale.  17.1.2 Trace settings in the Side Panel  In the Nucleotide info preference group the display of trace data can be selected and unselected. When selected, the trace data information is shown as a plot beneath the sequence. The appearance of the plot can be adjusted using the following options (see figure 17.3):  • Nucleotide trace. For eac
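The quality counts shown in the tooltip can be reproduced from a list of per-base phred scores, as in the sketch below. The function names are hypothetical. The text leaves a score of exactly 19 unassigned ("below 19" versus "between 20 and 39"); this sketch assumes it counts as low quality. The relation between a phred score Q and the base-calling error probability, P = 10^(-Q/10), is the standard phred definition rather than something stated in the manual.

```python
def classify_quality(scores, low_max=19, high_min=40):
    """Count bases per quality class from per-base phred scores.

    Assumption: a score of exactly 19 (not covered by the text)
    is counted as low quality.
    """
    counts = {"low": 0, "medium": 0, "high": 0}
    for q in scores:
        if q >= high_min:
            counts["high"] += 1
        elif q > low_max:
            counts["medium"] += 1
        else:
            counts["low"] += 1
    return counts

def error_probability(q):
    """Standard phred relation: probability that the base call is wrong."""
    return 10 ** (-q / 10)
```

For instance, `classify_quality([10, 25, 45])` puts one base in each class, and a phred score of 20 corresponds to an error probability of 1 in 100.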
106.  new enzyme list  [Dialog screenshot, step 1 "Please choose enzymes": two panels of enzymes (e.g. HindIII, EcoRV, SmaI, EcoRI, XbaI, SalI, PstI, BglII, XhoI, BamHI, KpnI, NcoI, NotI, SacI, NdeI), each with the columns Name, Overhang, Methylation and Popularity, and OK and Cancel buttons.]  Figure 18.50: Choosing enzymes for the new enzyme list.  At the top, you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 18.5 for more about creating and modifying enzyme lists.  Below there are two panels:
107.  nodes determines the tree topology and describes how lineages have diverged over the course of evolution. The branches of the tree represent the amount of evolutionary divergence between two nodes in the tree and can be based on different measurements. A tree is completely specified by its topology and the set of all edge lengths.  The phylogenetic tree in figure 20.5 is rooted at the most recent common ancestor of all Hominidae species and therefore represents a hypothesis of the direction of evolution, e.g. that the common ancestor of gorilla, chimpanzee and man existed before the common ancestor of chimpanzee and man. In contrast, an unrooted tree would represent relationships without assumptions about ancestry.  CHAPTER 20. PHYLOGENETIC TREES 372  20.2.2 Modern usage of phylogenies  Besides evolutionary biology and systematics, the inference of phylogenies is central to other areas of research.  As more and more genetic diversity is being revealed through the completion of multiple genomes, an active area of research within bioinformatics is the development of comparative machine learning algorithms that can simultaneously process data from multiple species [Siepel and Haussler, 2004]. Through the comparative approach, valuable evolutionary information can be obtained about which amino acid substitutions are functionally tolerated by the organism and which are not. This information can be used to identify substitutions that affect protein function  
108.  number of possible trees means that Bayesian phylogenetics must be performed by approximative Monte Carlo-based methods [Larget and Simon, 1999], [Yang and Rannala, 1997].  20.2.4 Interpreting phylogenies  Bootstrap values  A popular way of evaluating the reliability of an inferred phylogenetic tree is bootstrap analysis. The first step in a bootstrap analysis is to re-sample the alignment columns with replacement. I.e., in the re-sampled alignment, a given column in the original alignment may occur two or more times, while some columns may not be represented in the new alignment at all. The re-sampled alignment represents an estimate of how a different set of sequences from the same genes and the same species may have evolved on the same tree.  If a new tree reconstruction on the re-sampled alignment results in a tree similar to the original one, this increases the confidence in the original tree. If, on the other hand, the new tree looks very different, it means that the inferred tree is unreliable. By re-sampling a number of times, it is possible to put reliability weights on each internal branch of the inferred tree. If the data was bootstrapped 100 times, a bootstrap score of 100 means that the corresponding branch occurs in all 100 trees made from re-sampled alignments. Thus, a high bootstrap score is a sign of greater reliability.  Other useful resources  The Tree of Life web project: http://tolweb.org  Joseph Felsenstein's list of phylogeny
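The bootstrap procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Workbench's algorithm: an alignment is represented as a dict mapping sequence names to equal-length strings, and `build_splits` is a hypothetical, caller-supplied function that reconstructs a tree from an alignment and returns its internal branches as a set of hashable values.

```python
import random

def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping each column intact."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}

def bootstrap_support(alignment, build_splits, replicates=100, seed=0):
    """Count, per branch of the original tree, the replicates containing it.

    build_splits is a hypothetical tree-building function supplied by the
    caller; it must return the set of internal branches of the tree.
    """
    rng = random.Random(seed)
    counts = {split: 0 for split in build_splits(alignment)}
    for _ in range(replicates):
        found = build_splits(resample_columns(alignment, rng))
        for split in counts:
            if split in found:
                counts[split] += 1
    return counts
```

With 100 replicates, a count of 100 for a branch corresponds to the bootstrap score of 100 mentioned in the text.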
109.  [Dialog screenshot, step 2 "Set parameters": "Set order of concatenation (top first)" with the entries alignment 2 and alignment 1, and the buttons Previous, Next, Finish and Cancel.]  Figure 19.11: Selecting order of concatenation.  To adjust the order of concatenation, click the name of one of the alignments, and move it up or down using the arrow buttons.  Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The result is seen in figure 19.12.  CHAPTER 19. SEQUENCE ALIGNMENT 361  [Screenshot: joined alignment with the rows "sequence A from alignment 1", "sequence B from alignment 1", "sequence A from alignment 2" and "sequence B from alignment 2".]  Figure 19.12: The joining of the alignments results in one alignment containing rows of sequences corresponding to the number of uniquely named sequences in the joined alignments.  19.4.1 How alignments are joined  Alignments are joined by considering the sequence names in the individual alignments. If two sequences from different alignments have identical names, they are considered to have the same origin and are thus joined. Consider the joining of alignments A and B. If a sequence named "in A and B" is found in both A and B, the spliced alignment will contain a sequence named "in A and B" which represents the characters from A and B joined in direct extension of each other. If a sequence with the name
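The name-based joining described above can be sketched as follows. The representation (one dict per alignment, mapping sequence names to equal-length strings) and the function name are assumptions for illustration. The text is cut off before it explains what happens to a sequence that occurs in only one alignment; padding the missing part with gap characters is an assumption of this sketch, chosen so that all rows of the result keep the same length.

```python
def join_alignments(alignments, gap="-"):
    """Join alignments in the given order, matching sequences by name."""
    # Collect every distinct sequence name, in order of first appearance.
    names = []
    for aln in alignments:
        for name in aln:
            if name not in names:
                names.append(name)

    joined = {name: "" for name in names}
    for aln in alignments:
        width = len(next(iter(aln.values()))) if aln else 0
        for name in names:
            # Assumption: a sequence absent from this alignment is padded
            # with gaps so every row of the result has the same length.
            joined[name] += aln.get(name, gap * width)
    return joined
```

The order of the list passed in corresponds to the order of concatenation chosen in the dialog, and the number of rows in the result equals the number of uniquely named sequences.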
110.  of the residues using a gradient in the same way as described above.  - Graph. Displays the conservation level as a graph at the bottom of the alignment. The bar (default view) shows the conservation of all sequence positions. The height of the graph reflects how conserved that particular position is in the alignment. If one position is 100% conserved, the graph will be shown in full height. Learn how to export the data behind the graph in section 7.4.  * Height. Specifies the height of the graph.  * Type. The type of the graph: Line plot (displays the graph as a line plot), Bar plot (displays the graph as a bar plot) or Colors (displays the graph as a color bar using a gradient like the foreground and background colors).  * Color box. Specifies the color of the graph for line and bar plots, and specifies a gradient for colors.  • Gap fraction. Which fraction of the sequences in the alignment that have gaps. The gap fraction is only relevant if there are gaps in the alignment.  - Foreground color. Colors the letter using a gradient, where the left side color is used if there are relatively few gaps, and the right side color is used if there are relatively many gaps.  - Background color. Sets a background color of the residues using a gradient in the same way as described above.  - Graph. Displays the gap fraction as a graph at the bottom of the alignment. Learn how to export the data behind the graph in section   4.  * Height. Specifies the he
111. on how the data was created. Also, if you have performed an analysis and you want to reproduce the analysis on another element, you can check the history of the analysis, which will give you all the parameters you set.

This chapter will describe how to use the History functionality of CLC DNA Workbench.

8.1 Element history

You can view the history of all elements in the Navigation Area except files that are opened in other programs (e.g. Word and pdf files). The history starts when the element appears for the first time in CLC DNA Workbench. To view the history of an element:

Select the element in the Navigation Area | Show in the Toolbar | History

or, if the element is already open | History at the bottom left part of the view

This opens a view that looks like the one in figure 8.1.

When an element's history is opened, the newest change is listed at the top of the view. The following information is available:

• Title. The action that the user performed.
• Date and time. Date and time for the operation. The date and time are displayed according

CHAPTER 8: HISTORY LOG
112. only the search parameters. This means that you can easily conduct the same search later on when your data has changed.

4.4 Search index

This section has a technical focus and is not relevant if your search works fine.

However, if you experience problems with your search results, for example if you do not get the hits you expect, it might be because of an index error.

The CLC DNA Workbench automatically maintains an index of all data in all locations in the Navigation Area. If this index becomes out of sync with the data, you will experience problems with strange results. In this case, you can rebuild the index:

Right-click the relevant location | Location | Rebuild Index

This will take a while depending on the size of your data. At any time, the process can be stopped in the process area (see section 3.4.1).

Chapter 5

User preferences and settings

Contents
5.1 General preferences
5.2 Default view preferences
5.2.1 Number formatting in tables
5.2.2 Import and export Side Panel settings
5.3 Data preferences
5.4 Advanced preferences
5.4.1 Default data location
5.4.2 NCBI BLAST
5.5 Export/import of preferences
5.5.1 The different options for export and importing
113. Figure 18.3: Cloning editor.

There are essentially three ways of performing cloning in the CLC DNA Workbench. The first is the most straightforward approach, which is based on a simple model of selecting restriction sites for cutting out one or more fragments and defining how to open the vector to insert the fragments. This is described as the cloning work flow below. The second approach is unguided and more flexible and allows you to manually cut, copy, insert and replace parts of the sequences. This approach is described under manual cloning below. Finally, the CLC DNA Workbench also supports Gateway cloning (see section 18.2).

18.1.2 The cloning work flow

The cloning work flow is designed to support restriction cloning work flows through the following steps:

1. Define one or more fragments
2. Define how the vector should be opened
3. Specify orientation and order of the fragment

Defining fragments

First, select the sequence containing the cloning fragment in the list at the top of the view. Next, make sure the restriction enzyme you wish to use is listed in the Side Panel (see section 18.3.1). To specify which part of the sequence should be tre
114. pages. If you set the value to e.g. 2, the printed content will be broken up vertically and split across 2 pages.

Note: It is a good idea to consider adjusting view settings (e.g. Wrap for sequences) in the Side Panel before printing. As explained in the beginning of this chapter, the printed material will look like the view on the screen, and therefore these settings should also be considered when adjusting Page Setup.

CHAPTER 6: PRINTING

Figure 6.6: An example where Fit to pages horizontally is set to 2, and Fit to pages vertically is set to 3.

6.2.1 Header and footer

Click the Header/Footer tab to edit the header and footer text. By clicking in the text field for either Custom header text or Custom footer text you can access the auto formats for header/footer text in Insert at caret position. Click either Date, View name, or User name to include the auto format in the header/footer text.

Click OK when you have adjusted the Page Setup. The settings are saved so that you do not have to adjust them again next time you print. You can also change the Page Setup from the File menu.

6.3 Print preview

The preview is shown in figure 6.7.

Figure 6.7: Print preview.

The Print preview window lets you see the layout of the pages that are printed. Use the arrows in the toolbar to navigate between the pages. Click Print to sho
115. pairwise distances, are the UPGMA and the Neighbor Joining algorithms. Thus, the first step in these analyses is to compute a matrix of pairwise distances between OTUs from their sequence differences. To correct for multiple substitutions it is common to use distances corrected by a model of molecular evolution such as the Jukes-Cantor model (Jukes and Cantor, 1969).

UPGMA. A simple but popular clustering algorithm for distance data is Unweighted Pair Group Method using Arithmetic averages (UPGMA) (Michener and Sokal, 1957), (Sneath and Sokal, 1973). This method works by initially having all sequences in separate clusters and continuously joining these. The tree is constructed by considering all initial clusters as leaf nodes in the tree, and each time two clusters are joined, a node is added to the tree as the parent of the two chosen nodes. The clusters to be joined are chosen as those with minimal pairwise distance. The branch lengths are set corresponding to the distance between clusters, which is calculated as the average distance between pairs of sequences in each cluster.

CHAPTER 20: PHYLOGENETIC TREES

The algorithm assumes that the distance data has the so-called molecular clock property, i.e. the divergence of sequences occurs at the same constant rate at all parts of the tree. This means that the leaves of UPGMA trees all line up at the extant sequences and that a root is estimated as part of the procedure.
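The clustering procedure described for UPGMA can be sketched in Python as follows. This is a minimal teaching example under stated assumptions, not the Workbench's code: the function name `upgma`, the nested-tuple tree representation and the input format (a dictionary of pairwise distances) are all hypothetical choices made for the illustration.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree.

    names: list of sequence labels.
    dist:  {(a, b): distance} with one entry per unordered pair of labels.
    Returns (tree, root_height), where tree is a nested tuple of labels.
    """
    # Every sequence starts in its own cluster: (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    index = {name: i for i, name in enumerate(names)}
    d = {frozenset((index[a], index[b])): v for (a, b), v in dist.items()}
    next_id = len(names)
    height = 0.0
    while len(clusters) > 1:
        # Choose the two clusters with minimal pairwise distance.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # The new parent node sits at half the distance between the clusters.
        height = d[frozenset((i, j))] / 2
        # Size-weighted mean = average over all pairs of member sequences.
        for k in clusters:
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, height


tree, root_height = upgma(
    ["A", "B", "C"],
    {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0},
)
```

Because every join places the parent node at half the inter-cluster distance, all leaves end up at the same distance from the root, which is the molecular clock assumption mentioned above.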
116. pertaining to oligo pairs such as e.g. the oligo pair annealing score. The ideal score for a solution is 100 and solutions are thus ranked in descending order. Each parameter is assigned an ideal value and a tolerance. Consider for example oligo self annealing: here the ideal value of the annealing score is 0 and the tolerance corresponds to the maximum value specified in the side panel. The contribution to the final score is determined by how much the parameter deviates from the ideal value and is scaled by the specified tolerance. Hence, a large deviation from the ideal and a small tolerance will give a large deduction in the final score, and a small deviation from the ideal and a high tolerance will give a small deduction in the final score.

16.2 Setting parameters for primers and probes

The primer specific view options and settings are found in the Primer parameters preference group in the Side Panel to the right of the view (see figure 16.3).

Figure 16.3: The two groups of primer parameters (in the program, the Primer information group is listed below the other group).

CHAPTER 16: PRIMERS

16.2.1 Primer Parameters

In this preference group a number of criteria can be set, which the selected primers must meet. All the crit
117. primer which fulfills all criteria and a red primer indicates a primer which fails to meet one or more of the set criteria. For more detailed information, place the mouse cursor over the circle representing the primer of interest. A tool tip will then appear on screen displaying detailed information about the primer in relation to the set criteria. To locate the primer on the sequence, simply left-click the circle using the mouse.

The various primer parameters can now be varied to explore their effect, and the view area will dynamically update to reflect this. If e.g. the allowed melting temperature interval is widened, more green circles will appear, indicating that more primers now fulfill the set requirements, and if e.g. a requirement for 3' G/C content is selected, red circles will appear at the starting points of the primers which fail to meet this requirement.

16.3.2 Detailed information mode

In this mode a very detailed account is given of the properties of all the available primers. When a region is chosen, primer information will appear in groups of lines beneath it (see figure 16.5).
118. primers as a criterion in the design process (see above). The central part of the dialog contains parameters pertaining to primer pairs. Here five parameters can be set:

• Maximum percentage point difference in G/C content: if this is set at e.g. 5 points, a pair of primers with 45% and 49% G/C nucleotides, respectively, will be allowed, whereas a pair of primers with 45% and 51% G/C nucleotides, respectively, will not be included.
• Maximal difference in melting temperature of primers in a pair: the number of degrees Celsius that primers in a pair are allowed to differ.
• Max hydrogen bonds between pairs: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in a primer pair.
• Max hydrogen bonds between pair ends: the maximum number of hydrogen bonds allowed in the consecutive ends of the forward and the reverse primer in a primer pair.
• Maximum length of amplicon: determines the maximum length of the PCR fragment.

16.5.2 Standard PCR output table

If only a single region is selected the following columns of information are available:

• Sequence: the primer's sequence.
• Score: measures how much the properties of the primer (or primer pair) deviate from the optimal solution in terms of the chosen parameters and tolerances. The higher the score, the better the solution. The scale is from 0 to 100.
• Region: the interval of the template sequence covered by the pri
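The first two primer pair criteria listed for the Standard PCR dialog can be illustrated with a short Python sketch that reproduces the 45%/49% and 45%/51% example from the text. The function name `pair_allowed`, the default limits and the choice to treat a difference exactly equal to the limit as allowed are assumptions for illustration; the manual does not state how the boundary case is handled.

```python
def pair_allowed(gc_fwd, gc_rev, tm_fwd, tm_rev,
                 max_gc_diff=5.0, max_tm_diff=5.0):
    """Check a primer pair against two pair-level criteria.

    gc_fwd, gc_rev: G/C content of each primer in percent.
    tm_fwd, tm_rev: melting temperature of each primer in degrees Celsius.
    A difference equal to the limit is accepted (assumption).
    """
    gc_ok = abs(gc_fwd - gc_rev) <= max_gc_diff
    tm_ok = abs(tm_fwd - tm_rev) <= max_tm_diff
    return gc_ok and tm_ok


allowed = pair_allowed(45, 49, 55.0, 56.0)   # 4 points apart
rejected = pair_allowed(45, 51, 55.0, 56.0)  # 6 points apart
```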
119. Figure 16.6: Proposed primers.

In the preference panel of the table, it is possible to customize which columns are shown in the table. See the sections below on the different reaction types for a description of the available information.

The columns in the output table can be sorted by the present information. For example, the user can choose to sort the available primers by their score (default) or by their self annealing score, simply by right-clicking the column header.

The output table interacts with the accompanying primer editor such that when a proposed combination of primers and probes is selected in the table the primers and probe
120. regions can be defined.

It is required that the Forward primer region is located upstream of the Forward inner primer region, that the Forward inner primer region is located upstream of the Reverse inner primer region, and that the Reverse inner primer region is located upstream of the Reverse primer region.

In Nested PCR mode the Inner melting temperature menu in the Primer parameters panel is activated, allowing the user to set a separate melting temperature interval for the inner and outer primer pairs.

After exploring the available primers (see section 16.3) and setting the desired parameter values in the Primer parameters preference group, the Calculate button will activate the primer design algorithm.

After pressing the Calculate button a dialog will appear (see figure 16.9).

The top and bottom parts of this dialog are identical to the Standard PCR dialog for designing primer pairs described above.

The central part of the dialog contains parameters pertaining to primer pairs and the comparison between the outer and the inner pair. Here five options can be set:

• Maximum percentage point difference in G/C content (described above under Standard PCR): this criterion is applied to both primer pairs independently.
121. restriction enzyme analysis and functionalities for managing lists of restriction enzymes.

First, after a brief introduction, restriction cloning and general vector design is explained. Next, we describe how to do Gateway Cloning¹. Finally, the general restriction site analyses are described.

¹ Gateway is a registered trademark of Invitrogen Corporation

CHAPTER 18: CLONING AND CUTTING

18.1 Molecular cloning

Molecular cloning is a very important tool in the quest to understand gene function and regulation. Through molecular cloning it is possible to study individual genes in a controlled environment. Using molecular cloning it is possible to build complete libraries of fragments of DNA inserted into appropriate cloning vectors.

The in silico cloning process in CLC DNA Workbench begins with the selection of sequences to be used:

Toolbox | Cloning and Restriction Sites | Cloning

This will open a dialog where you can select the sequences containing the fragments you want to clone (figure 18.1).
122. searching. The size of the Subset created in the CLC software depends both on the number and size of the sequences.

To start a BLAST job to search your sequences against databases held at the NCBI:

Toolbox | BLAST | NCBI BLAST

Alternatively, use the keyboard shortcut: Ctrl+Shift+B for Windows and ⌘+Shift+B on Mac OS.

This opens the dialog seen in figure 12.2.

Figure 12.2: Choose one or more sequences to conduct a BLAST search with.

Select one or more sequences of the same type (either DNA or protein) and click Next.

In this dialog, you choose which type of BLAST search to conduct, and which database to search against. See figure 12.3. The databases at the NCBI listed in the dropdown box will correspond to the query sequence type you have (DNA or protein) and the type of BLAST search you have chosen to run. A complete list of these databases can be found in Appendix D. Here you can also read how to add additional databases available at the NCBI to the list provided in the dropdown menu.
123. second challenge is to find the optimal alignment given a scoring function. For pairs of sequences this can be done by dynamic programming algorithms, but for more than three sequences this approach demands too much computer time and memory to be feasible.

A commonly used approach is therefore to do progressive alignment (Feng and Doolittle, 1987), where multiple alignments are built through the successive construction of pairwise alignments. These algorithms provide a good compromise between time spent and the quality of the resulting alignment.

Presently, the most exciting development in multiple alignment methodology is the construction of statistical alignment algorithms (Hein, 2001), (Hein et al., 2000). These algorithms employ a scoring function which incorporates the underlying phylogeny and use an explicit stochastic model of molecular evolution, which makes it possible to compare different solutions in a statistically rigorous way. The optimization step, however, still relies on dynamic programming, and practical use of these algorithms thus awaits further developments.

Creative Commons License

All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of t
124. section 9.2). If not, click Finish.

This will open a new view in the View Area displaying the reverse complement of the selected sequence. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog.

CHAPTER 14: NUCLEOTIDE ANALYSES

14.4 Reverse sequence

CLC DNA Workbench is able to create the reverse of a nucleotide sequence. By doing that, a new sequence is created which also has all the annotations reversed since they now occupy the opposite strand of their previous location.

Note: This is not the same as a reverse complement. If you wish to create the reverse complement, please refer to section 14.3.

Select a sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Reverse Sequence

This opens the dialog displayed in figure 14.4.

Figure 14.4: Reversing a sequence.

If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elem
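The difference between the reverse of a sequence (section 14.4) and its reverse complement (section 14.3) is easy to see in a small Python sketch. The helper names `reverse` and `reverse_complement` are hypothetical, and the example handles only the four unambiguous DNA bases; it does not model the annotation handling that the Workbench performs.

```python
# Translation table mapping each base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(sequence):
    """Return the sequence read backwards, without complementing."""
    return sequence[::-1]


def reverse_complement(sequence):
    """Return the complement of the sequence, read backwards."""
    return sequence.translate(COMPLEMENT)[::-1]


rev = reverse("ATGC")
revcomp = reverse_complement("ATGC")
```

For the input ATGC the reverse is CGTA while the reverse complement is GCAT, so the two operations generally give different sequences.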
125. selected.

  - Treat ambiguous characters as wildcards in sequence. If you search for e.g. ATG, you will find both ATG and ATN. If you have large regions of Ns, this option should not be selected.

  Note that if you enter a position instead of a sequence, it will automatically switch to position search.

• Annotation search. Searches the annotations on the sequence. The search is performed both on the labels of the annotations and on the text appearing in the tooltip that you see when you keep the mouse cursor fixed. If the search term is found, the part of the sequence corresponding to the matching annotation is selected. Below this option you can choose to search for translations as well. Sequences annotated with coding regions often have the translation specified, which can lead to undesired results.

• Position search. Finds a specific position on the sequence. In order to find an interval, e.g. from position 500 to 570, enter "500..570" in the search field. This will make a selection from position 500 to 570 (both included). Notice the two periods (..) between the start and end number (see section 10.3.2). If you enter positions including thousands separators like 123,345, the comma will just be ignored and it would be equivalent to entering 123345.

CHAPTER 10: VIEWING AND EDITING SEQUENCES

• Include negative strand. When searching the sequence for nucleotides or amino acids, you can search on both strands.

• Name search. Searches fo
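The position search syntax described above (two periods between start and end, thousands separators ignored) can be mimicked with a short Python function. This is a hedged sketch of the input format only: the function name `parse_position` is hypothetical, and returning a single position as an interval of length one is a choice made for the example, not documented behavior.

```python
def parse_position(text):
    """Parse a position search term into an inclusive (start, end) pair.

    "500..570" -> (500, 570); "123,345" -> (123345, 123345).
    Commas used as thousands separators are ignored.
    """
    cleaned = text.replace(",", "").replace(" ", "")
    if ".." in cleaned:
        start, end = cleaned.split("..", 1)
        return int(start), int(end)
    position = int(cleaned)
    return position, position


interval = parse_position("500..570")
single = parse_position("123,345")
```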
126. sequence, they will be shown in the contig view as well. This can be very convenient e.g. for Primer design.

If you wish to BLAST the consensus sequence, simply select the whole contig for your BLAST search. It will automatically extract the consensus sequence and perform the BLAST search.

In order to preserve the history of the changes you have made to the contig, the contig itself should be saved from the contig view, using either the save button or by dragging it to the Navigation Area.

17.7.6 Extract parts of a contig

Sometimes it is useful to extract part of a contig for in-depth analysis. This could be the case if you have performed an assembly of several genes and you want to look at a particular gene or region in isolation.

This is possible through the right-click menu of the reference or consensus sequence:

Select on the reference or consensus sequence the part of the contig to extract | Right-click | Extract from Selection

This will present the dialog shown in figure 17.23.

The purpose of this dialog is to let you specify what kind of reads you want to include. Per default all reads are included. The options are:

Paired status
Include intact paired reads. When paired reads are placed within the paired distance specified, they will fall into this category. Per default, these reads are colored in blue.
127. show a print dialog (see figure 6.1).

In this dialog, you can:

• Select which part of the view you want to print.
• Adjust Page Setup.
• See a print Preview window.

These three options are described in the three following sections.

Figure 6.1: The Print dialog.

6.1 Selecting which part of the view to print

In the print dialog you can choose to:

• Print visible area, or
• Print whole view

These options are available for all views that can be zoomed in and out. Figure 6.2 shows a view of a circular sequence which is zoomed in so that you can only see a part of it.

Figure 6.2: A circular sequence as it looks on the screen.

When selecting Print visible area, your print will reflect the part of the sequence that is visible in the view. The result from printing the view from figure 6.2 and choosing Print visible area can be seen in figure 6.3.

Figure 6.3: A print of the sequence selecting Print visible area.

On the other hand, i
128. software: http://evolution.genetics.washington.edu/phylip/software.html

Creative Commons License

All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work.

SOME RIGHTS RESERVED

See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents.

Part IV

Appendix

Appendix A

Comparison of workbenches and the viewer

Below we list a number of functionalities that differ between CLC Workbenches and the CLC Sequence Viewer:

• CLC Sequence Viewer
• CLC Protein Workbench
• CLC DNA Workbench
• CLC RNA Workbench
• CLC Main Workbench
• CLC Genomics Workbench

Data handling   Viewer   Protein
Add multiple locations to Navigation Area
Share data on network drive
Search all your data

Assembly of sequencing data   Viewer   Protein
Advanced contig assembly
Importing and viewing trace data
Trim sequences
Assemble without use of reference sequence
Map to reference sequence
Assemble to existing contig
Viewing and edit contigs
Tabular view of an assembled contig
129. Figure 3.3: In this example the location called "CLC_Data" points to the folder at C:\Documents and settings\clcuser\CLC_Data.

Adding locations

Per default, there is one location in the Navigation Area called CLC_Data. It points to the following folder:

• On Windows: C:\Documents and settings\<username>\CLC_Data
• On Mac: ~/CLC_Data
• On Linux: /homefolder/CLC_Data

You can easily add more locations to the Navigation Area:

File | New | Location

This will bring up a dialog where you can navigate to the folder you wish to use as your new location (see figure 3.4).

When you click Open, the new location is added to the Navigation Area as shown in figure 3.5.

The name of the new location will be the name of the folder selected for the location. To see where the folder is located on your computer, place your mouse cursor on the location icon for a second. This will show the path to the location.

Sharing da
130. the N-terminal amino acid, thus overall protein stability (Bachmair et al., 1986; Gonda et al., 1989; Tobias et al., 1991). The importance of the N-terminal residues is generally known as the "N-end rule". The N-end rule, and consequently the N-terminal amino acid, simply determines the half-life of proteins. The estimated half-life of proteins has been investigated in mammals, yeast and E. coli (see Table 13.2). If leucine is found N-terminally in mammalian proteins, the estimated half-life is 5.5 hours.

Extinction coefficient

This measure indicates how much light is absorbed by a protein at a particular wavelength. The extinction coefficient is measured by UV spectrophotometry, but can also be calculated. The amino acid composition is important when calculating the extinction coefficient. The extinction coefficient is calculated from the absorbance of cysteine, tyrosine and tryptophan using the following equation:

\[ \mathrm{Ext}(\mathrm{Protein}) = \mathrm{count}(\mathrm{Cystine}) \cdot \mathrm{Ext}(\mathrm{Cystine}) + \mathrm{count}(\mathrm{Tyr}) \cdot \mathrm{Ext}(\mathrm{Tyr}) + \mathrm{count}(\mathrm{Trp}) \cdot \mathrm{Ext}(\mathrm{Trp}) \]

CHAPTER 13: GENERAL SEQUENCE ANALYSES

where Ext is the extinction coefficient of the amino acid in question. At 280 nm the extinction coefficients are: Cys 120, Tyr 1280 and Trp 5690.

This equation is only valid under the following conditions:

• pH 6.5
• 6.0 M guanidium hydrochloride
• 0.02 M phosphate buffer

The extinction coefficient values of the three important amino acids at different wavelengths are found in (Gill and von Hippel,
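The extinction coefficient equation can be evaluated directly from a protein sequence, as in the following Python sketch. It uses the 280 nm values quoted in the text (120, 1280 and 5690). The function name `extinction_coefficient` is hypothetical, and the number of cystines is passed in explicitly as an assumption, because it cannot be read from the sequence alone: it depends on how many cysteine residues actually form disulfide bonds.

```python
# Extinction coefficients at 280 nm as quoted in the text.
EXT_CYSTINE = 120
EXT_TYR = 1280
EXT_TRP = 5690


def extinction_coefficient(sequence, n_cystine=0):
    """Estimate the extinction coefficient of a protein at 280 nm.

    sequence:  protein sequence in one-letter code.
    n_cystine: number of cystines (disulfide-bonded cysteine pairs),
               supplied by the caller (assumption, see lead-in).
    """
    seq = sequence.upper()
    return (n_cystine * EXT_CYSTINE
            + seq.count("Y") * EXT_TYR
            + seq.count("W") * EXT_TRP)


# One tryptophan, two tyrosines and one cystine.
ext = extinction_coefficient("MWYYCCK", n_cystine=1)
```

For this toy sequence the sum is 120 + 2 × 1280 + 5690 = 8370.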
131. the algorithm will find a match if AC occurs in the beginning of the sequence.

The symbol $ restricts the search to the end of your sequence. For example, if you search through a sequence with the regular expression GT$, the algorithm will find a match if GT occurs in the end of the sequence.

Examples:

The expression [ACG][^AC]G{2} matches all strings of length 4, where the first character is A, C or G, the second is any character except A and C, and the third and fourth characters are G. The expression G.[^A]$ matches all strings of length 3 in the end of your sequence, where the first character is G, the second any character and the third any character except A.

13.7.4 Create motif list

CLC DNA Workbench offers advanced and versatile options to create lists of sequence patterns or known motifs represented either by a literal string or a regular expression.

A motif list is created from the Toolbox:

Toolbox | General Sequence Analyses | Create Motif List

This will open an empty list where you can add motifs by clicking the Add button at the bottom of the view. This will open a dialog shown in figure 13.28.

Figure 13.28: Entering a new motif in the list.

In this dialog, you can enter the following information:

• Nam
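The example expressions above can be tried out with any regular expression engine that supports character classes, quantifiers and anchors. The sketch below uses Python's standard `re` module as a stand-in; it is an assumption that these particular constructs behave the same way in the Workbench's own engine, and the helper name `find_motif` is hypothetical.

```python
import re


def find_motif(pattern, sequence):
    """Return (start, matched_text) for every non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]


# First character A, C or G; second anything but A or C; then two Gs.
length_four = find_motif(r"[ACG][^AC]G{2}", "ATGGCCGG")

# G, any character, then anything but A, anchored at the sequence end.
at_end = find_motif(r"G.[^A]$", "ACGTT")

# GT only counts as a match when it ends the sequence.
gt_end = find_motif(r"GT$", "GTAAGT")
```

Note that the GT at the start of GTAAGT is not reported by the last search, because the $ anchor restricts the match to the end of the sequence.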
132. the locations defined in the BLAST database manager (see section 12.4).

• Add the location where your BLAST databases are stored using the BLAST database manager (see section 12.4). See figure 12.14.

12.3.2 Download NCBI pre-formatted BLAST databases

Many popular pre-formatted databases are available for download from the NCBI. You can download any of the databases available from the list at ftp://ftp.ncbi.nih.gov/blast/db/ from within your CLC DNA Workbench.

You must be connected to the internet to use this tool.

If you choose:

Toolbox | BLAST | Download BLAST Databases

a window like the one in figure 12.11 pops up showing you the list of databases available for download.

Figure 12.11: Choose from pre-formatted BLAST databases at the NCBI available for download.

In this window, you can see the names of the databases, the date they were made available for download on the NCBI site, the size of the files associated with that database, and a brief description of each database. You can also see whether the database has any dependencies. This aspect is described below.

You can also specify which of your database locations you would like to store the files in. Please see the Manage BLAST Databases section for more on this (section 12.4).

There are two very important things to note if you wish to tak
133. the option of choosing a translation table, the start codons to use, minimum ORF length as well as a few other parameters. These choices are explained in this section.

To find open reading frames:

select a nucleotide sequence | Toolbox in the Menu Bar | Nucleotide Analyses | Find Open Reading Frames

or right-click a nucleotide sequence | Toolbox | Nucleotide Analyses | Find Open Reading Frames

This opens the dialog displayed in figure 14.7.

If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.

If you want to adjust the parameters for finding open reading frames, click Next.

14.6.1 Open reading frame parameters

This opens the dialog displayed in figure 14.8.

The adjustable parameters for the search are:

CHAPTER 14 NUCLEOTIDE ANALYSES 235

• Start codon:

Figure 14.7: Create Reading Frame dialog.
134. the plot. Dot plots are one of the oldest methods for comparing two sequences (Maizel and Lenk, 1981).

The scores that are drawn on the plot are affected by several issues:

• Scoring matrix for distance correction. Scoring matrices (BLOSUM and PAM) contain substitution scores for every combination of two amino acids. Thus, these matrices can only be used for dot plots of protein sequences.

• Window size. A single-residue comparison (bit-by-bit comparison, window size = 1) in dot plots will undoubtedly result in a noisy background of the plot. You can imagine that there are many chance matches in the comparison if you only have four possible residues, as in nucleotide sequences. Therefore you can set a window size, which smooths the dot plot. Instead of comparing single residues, it compares subsequences of the length set as window size. The score is then calculated with respect to aligning the subsequences.

• Threshold. The dot plot shows the calculated scores with a colored threshold. Hence you can better recognize the most important similarities.

Examples and interpretations of dot plots

In contrast to simple sequence alignments, dot plots can be a very useful tool for spotting various evolutionary events which may have happened to the sequences of interest.

CHAPTER 13 GENERAL SEQUENCE ANALYSES 205

Below are some examples of dot plots where sequence insertions, low complexity regions, inverted repeats etc. can be identified visually.

Simil
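The effect of the window size and threshold described in snippet 134 can be illustrated with a small sketch. It is not the Workbench's implementation: it assumes simple identity scoring (one point per identical residue in the window) instead of a BLOSUM or PAM matrix, and the function name `dot_plot` is hypothetical.

```python
def dot_plot(seq_a, seq_b, window=1, threshold=None):
    """Return (i, j, score) for every window pair scoring at least `threshold`.

    i and j are 0-based start positions of the windows in seq_a and seq_b.
    Scoring is plain identity counting, which is an assumption of this sketch.
    """
    if threshold is None:
        threshold = window  # by default only perfect window matches are kept
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            score = sum(a == b for a, b in
                        zip(seq_a[i:i + window], seq_b[j:j + window]))
            if score >= threshold:
                dots.append((i, j, score))
    return dots


if __name__ == "__main__":
    s = "ACGTACGT"
    print(len(dot_plot(s, s, window=1)))  # 16 dots, including off-diagonal noise
    print(len(dot_plot(s, s, window=3)))  # 10 dots, only the real repeats remain
```

With a window of 1, every chance identity between the four nucleotides produces a dot; a window of 3 keeps only the main diagonal and the true repeat.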
135. the processes of character substitution, insertion and deletion.

The input to multiple alignment algorithms is a number of homologous sequences, i.e. sequences that share a common ancestor and most often also share molecular function. The generated alignment is a table (see figure 19.16) where each row corresponds to an input sequence and each column corresponds to a position in the alignment. An individual column in this table represents residues that have all diverged from a common ancestral residue. Gaps in the table (commonly represented by a '-') represent positions where residues have been inserted or deleted and thus do not have ancestral counterparts in all sequences.

19.6.1 Use of multiple alignments

Once a multiple alignment is constructed, it can form the basis for a number of analyses:

• The phylogenetic relationship of the sequences can be investigated by tree building methods based on the alignment.

• Annotation of functional domains, which may only be known for a subset of the sequences, can be transferred to aligned positions in other un-annotated sequences.

• Conserved regions in the alignment can be found which are prime candidates for holding functionally important sites.

• Comparative bioinformatical analysis can be performed to identify functionally important regions.

19.6.2 Constructing multiple alignments

Whereas the optimal solution to the pairwise alignment problem can be found in reasonable time, the problem of cons
136. the server, you can borrow a license.

Borrowing a license means that you take one of the floating licenses available on the server and borrow it for a specified amount of time. During this time period, there will be one less floating license available on the server.

At the point where you wish to borrow a license, you have to be connected to the license server. The procedure for borrowing is this:

1. Click Help | License Manager to display the dialog shown in figure 1.22.

2. Use the checkboxes to select the license(s) that you wish to borrow.

3. Select how long you wish to borrow the license, and click Borrow Licenses.

4. You can now go offline and work with CLC DNA Workbench.

5. When the borrow time period has elapsed, you have to connect to the license server again to use CLC DNA Workbench.

6. When the borrow time period has elapsed, the license server will make the floating license available for other users.

Note that the time period is not the period of time that you actually use the Workbench.

Note: When your organization's license server is installed, license borrowing can be turned off. In that case, you will not be able to borrow licenses.

CHAPTER 1 INTRODUCTION TO CLC DNA WORKBENCH 26

No license available

If all the licenses on the server are in use, you will see a dialog as shown in figure 1.20 when you start the Workbench.
137. the threshold of T limits the search space significantly.

12.5.4 Which BLAST program should I use?

Depending on the nature of the sequence, it is possible to use different BLAST programs for the database search. There are five versions of the BLAST program: blastn, blastp, blastx, tblastn, tblastx.

Option    Query Type   DB Type      Comparison              Note
blastn    Nucleotide   Nucleotide   Nucleotide-Nucleotide
blastp    Protein      Protein      Protein-Protein
tblastn   Protein      Nucleotide   Protein-Protein         The database is translated into protein
blastx    Nucleotide   Protein      Protein-Protein         The queries are translated into protein
tblastx   Nucleotide   Nucleotide   Protein-Protein         The queries and database are translated into protein

The most commonly used method is to BLAST a nucleotide sequence against a nucleotide database (blastn) or a protein sequence against a protein database (blastp). But often another BLAST program will produce more interesting hits. E.g. if a nucleotide sequence is translated before the search, it is more likely to find better and more accurate hits than just a blastn search. One of the reasons for this is that protein sequences are evolutionarily more conserved than nucleotide sequences. Another good reason for translating the query sequence before the search is that you get protein hits which are likely to be annotated. Thus you can directly see the protein function of the sequenced gene.

CHAPTER 12 BLAST SEARCH 193
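The choice among the five BLAST programs in snippet 137 amounts to a small lookup on query type and database type. The following sketch encodes that table; the function name `choose_blast_program` and its `translated` flag are illustrative assumptions, not part of the Workbench or of the BLAST tools.

```python
def choose_blast_program(query_type, db_type, translated=False):
    """Pick a BLAST program from the query and database types.

    query_type and db_type are "nucleotide" or "protein". `translated` only
    matters for nucleotide against nucleotide, where it selects tblastx
    (both sides translated into protein) instead of blastn.
    """
    key = (query_type.lower(), db_type.lower())
    if key == ("nucleotide", "nucleotide"):
        return "tblastx" if translated else "blastn"
    table = {
        ("protein", "protein"): "blastp",
        ("protein", "nucleotide"): "tblastn",   # database is translated
        ("nucleotide", "protein"): "blastx",    # query is translated
    }
    if key not in table:
        raise ValueError(f"unknown sequence types: {key}")
    return table[key]


if __name__ == "__main__":
    print(choose_blast_program("nucleotide", "protein"))  # blastx
```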
138. titles can be edited simply by clicking with the mouse. These changes will be saved when you Save the graph, whereas the changes in the Side Panel need to be saved explicitly (see section 5.6).

For more information about the graph view, please see section B.

Appendix C

Working with tables

Tables are used in a lot of places in the CLC DNA Workbench. The contents of the tables are of course different depending on the context, but there are some general features for all tables that will be explained in the following.

Figure C.1 shows an example of a typical table. This is the table result of Find Open Reading Frames. We will use this table as an example in the following to illustrate the concepts that are relevant for all kinds of tables.

Figure C.1: A table showing open reading frames.

First of all, the columns of the table are listed in the Side Panel to the right of the table. By clicking the checkboxes you can hide/show the columns in the table.

Furthermore, you can sort the table by clicking on the column headers
139. to a license server. Check this option if you wish to use the license server.

• Automatically detect license server. By checking this option you do not have to enter more information to connect to the server.

• Manually specify license server. There can be technical limitations which mean that the license server cannot be detected automatically, and in this case you need to specify more options manually:

CHAPTER 1 INTRODUCTION TO CLC DNA WORKBENCH 25

Figure 1.19: Connecting to a license server.

  - Host name. Enter the address for the license server.

  - Port. Specify which port to use.

• Disable license borrowing on this computer. If you do not want users of the computer to borrow a license (see section 1.4.5), you can check this option.

Borrow a license

A floating license can only be used when you are connected to the license server. If you wish to use the CLC DNA Workbench when you are not connected to
140. to obtain a license for your workbench.

Request an Evaluation License: Choose this option if you would like to try out the application for 30 days. Please note that only a single 30-day evaluation license will be allowed for each computer.

Download a License: Choose this option if you have a License Order ID and would like to download a License.

Import a License from a File: Choose this option if you have a License File on your computer and would like to import it.

Upgrade a license from an older Workbench: Choose this option if you have an older version of this workbench with a commercial license and would like to upgrade your license.

Configure License Server Connection: Choose this option if your company or institution is using a central CLC License Server. This option also enables you to disable a license server connection.

If you experience any problems, please contact The CLC Support Team.

Figure 1.1: The license assistant showing you the options for getting started.

• Configure license server connection. If your organization has a license server, select this option to connect to the server.

Select an appropriate option and click Next.

If for some reason you don't have access to getting a license, you can click the Limited Mode button (see section 1.4.6).

1.4.1 Request an evaluation license

We offer a fully functional demo version of CLC DN
141. to use EcoRV and BamHI, select these two enzymes and add them to the right side panel.

If you wish to use all the enzymes in the list:

Click in the panel to the left | press Ctrl + A (⌘ + A on Mac) | Add

The enzymes can be sorted by clicking the column headings, i.e. Name, Overhang, Methylation or Popularity. This is particularly useful if you wish to use enzymes which produce e.g. a 3' overhang. In this case, you can sort the list by clicking the Overhang column heading, and all the enzymes producing 3' overhangs will be listed together for easy selection.

When looking for a specific enzyme, it is easier to use the Filter. If you wish to find e.g. HindIII sites, simply type HindIII into the filter, and the list of enzymes will shrink automatically to only include the HindIII enzyme. This can also be used to only show enzymes producing e.g. a 3' overhang, as shown in figure 18.51.
142. treated as gap extensions and any gaps past 10 are free.

  - End gaps as any other. Gaps at the ends of sequences are treated like gaps in any other place in the sequences.

When aligning a long sequence with a short partial sequence, it is ideal to use free end gaps, since this will be the best approximation to the situation. The many gaps inserted at the ends are not due to evolutionary events, but rather to partial data.

Many homologous proteins have quite different ends, often with large insertions or deletions. This confuses alignment algorithms, but using the Cheap end gaps option, large gaps will generally be tolerated at the sequence ends, improving the overall alignment. This is the default setting of the algorithm.

Finally, treating end gaps like any other gaps is the best option when you know that there are no biologically distinct effects at the ends of the sequences.

Figures 19.3 and 19.4 illustrate the differences between the different gap scores at the sequence ends.

19.1.2 Fast or accurate alignment algorithm

CLC DNA Workbench has two algorithms for calculating alignments:

• Fast (less accurate). This allows for use of an optimized alignment algorithm which is very fast. The fast option is particularly useful for data sets with very long sequences.

• Slow (very accurate). This is the recommended choice unless you find the processing time too long.

CHAPTER 19 SEQUENCE ALIGNMENT 350
143. two options:

• Open. This will open the result of the analysis in a view. This is the default setting.

• Save. This means that the result will not be opened but saved to a folder in the Navigation Area. If you select this option, click Next and you will see one more step where you can specify where to save the results (see figure 9.6). In this step, you also have the option of creating a new folder or adding a location by clicking the buttons at the top of the dialog.

CHAPTER 9 BATCHING AND RESULT HANDLING 138

Figure 9.6: Specify a folder for the results of the analysis.

9.2.1 Table outputs

Some analyses also generate a table with results, and for these analyses the last step looks like figure 9.7.

Figure 9.7: Analyses which al
144. Figure 2.15: An overview of the contig with the coverage graph.

This overview can be an aid in determining whether coverage is satisfactory, and if not, which regions a new sequencing effort should focus on. Next, we go into the details of the contig.

2.5.4 Finding and editing conflicts

Click Zoom to 100% to zoom in on the residues at the beginning of the contig. Click the Find Conflict button at the top of the Side Panel or press the Space key to find the first position where there is disagreement between the reads (see figure 2.16).

In this example, the first read has a "T" (marked with a light pink background color), whereas the second line has a gap. In order to determine which of the reads we should trust, we assess the quality of the read at this position.

A quick look at the regularity of the peaks of read "Rev2" compared to "Rev3" indicates that we should trust the "Rev2" read. In addition,
145. view.

CHAPTER 15 PROTEIN ANALYSES 239

15.2 Hydrophobicity

CLC DNA Workbench can calculate the hydrophobicity of protein sequences in different ways, using different algorithms (see section 15.2.3). Furthermore, hydrophobicity of sequences can be displayed as hydrophobicity plots and as graphs along sequences. In addition, CLC DNA Workbench can calculate hydrophobicity for several sequences at the same time, and for alignments.

15.2.1 Hydrophobicity plot

Displaying the hydrophobicity for a protein sequence in a plot is done in the following way:

select a protein sequence in Navigation Area | Toolbox in the Menu Bar | Protein Analyses | Create Hydrophobicity Plot

This opens a dialog. The first step allows you to add or remove sequences. Clicking Next takes you through to Step 2, which is displayed in figure 15.3.

Figure 15.3: Step two in the Hydrophobicity Plot allows you to choose hydrophobicity scale and the window size. (The dialog lists the scales Kyte-Doolittle, Eisenberg, Engelman, Hopp-Woods, Janin, Rose and Cornette, and a window size given as a number of residues, which must be odd; the value shown is 11.)

The Window size is the width of the window where the hydrophobicity is calculated. The wider the window, the less volatile the graph. You can choose from a numbe
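The window-based calculation described in snippet 145 can be sketched as a sliding average. This is an illustration only: it assumes the plotted value is the mean of the Kyte-Doolittle values in a window centered on each residue, which the snippet does not state, and the function name `hydrophobicity_profile` is hypothetical. The scale values are the published Kyte-Doolittle hydropathy indices.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein, window=11):
    """Return (position, mean hydropathy) pairs, positions 1-based.

    Only positions with a full window around them are reported.
    Averaging over the window is an assumption of this sketch.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = window // 2
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    profile = []
    for center in range(half, len(values) - half):
        segment = values[center - half:center + half + 1]
        profile.append((center + 1, sum(segment) / window))
    return profile


if __name__ == "__main__":
    for position, value in hydrophobicity_profile("MKTLLILAVVAAALAQS", window=11):
        print(position, round(value, 2))
```

A wider window averages over more residues, which is why the resulting graph is less volatile.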
146. will display additional information at the right side of the dialog. This will also display a button: Download and Install.

Click the plug-in and press Download and Install. A dialog displaying progress is now shown, and the plug-in is downloaded and installed.

If the plug-in is not shown on the server, and you have it on your computer (e.g. if you have downloaded it from our web site), you can install it by clicking the Install from File button at the bottom of the dialog. This will open a dialog where you can browse for the plug-in. The plug-in file should be a file of the type ".cpa".

When you close the dialog, you will be asked whether you wish to restart the CLC DNA Workbench. The plug-in will not be ready for use before you have restarted.

1.7.2 Uninstalling plug-ins

Plug-ins are uninstalled using the plug-in manager:

Help in the Menu Bar | Plug-ins and Resources

or Plug-ins in the Toolbar

This will open the dialog shown in figure 1.25.

The installed plug-ins are shown in this dialog. To uninstall:

CHAPTER 1 INTRODUCTION TO CLC DNA WORKBENCH 32
147. Figure 2.35: Five lines of dots representing primer suggestions. There is a line for each primer length: 18 bp through to 22 bp.

2.1.2 Examining the primer suggestions

Each line consists of a number of dots, each representing the starting point of a possible primer. E.g. the first dot on the first line (primers of length 18) represents a primer starting at the dot's position and with a length of 18 nucleotides (shown as the white area in figure 2.36).

Figure 2.36: The first dot on line one represents the starting point of a primer that will anneal to the highlighted region.

Position the mouse cursor over a dot. A box will appear, providing data about this primer. Clicking the dot will select the region where that primer would anneal (see figure 2.37).

CHAPTER 2 TUTORIALS 59

Figure 2.37: Clicking the dot will select the corresponding primer region. Hovering the cursor over the dot will bring up an information box containing details about that primer.

Note that some of the dots are colored red. This indicates that the primer represented by this dot d
148. 4.3 Advanced Search
4.4 Search index

5 User preferences and settings
5.1 General preferences
5.2 Default view preferences
5.3 Data preferences
5.4 Advanced preferences
5.5 Export/import of preferences
5.6 View settings for the Side Panel

6 Printing
6.1 Selecting which part of the view to print
6.2 Page setup
6.3 Print preview

7 Import/export of data and graphics
7.1 Bioinformatic data formats
7.2 External files
7.3 Export graphics to files
7.4 Export graph data points to a file
7.5 Copy/paste view output

8 History log
8.1 Element history

9 Batching and result handling
9.1 Batch processing
9.2 How to handle results of analyses

III Bioinformatics

10 Viewing and editing sequences
10.1 View sequence
149. Figure 7.12: Parameters for bitmap formats: size of the graphics file.

You can adjust the size (the resolution) of the file to four standard sizes:

• Screen resolution

• Low resolution

• Medium resolution

• High resolution

CHAPTER 7 IMPORT/EXPORT OF DATA AND GRAPHICS 128

The actual size in pixels is displayed in parentheses. An estimate of the memory usage for exporting the file is also shown. If the image is to be used on computer screens only, a low resolution is sufficient. If the image is going to be used on printed material, a higher resolution is necessary to produce a good result.

Parameters for vector formats

For pdf format, clicking Next will display the dialog shown in figure 7.13 (this is only the case if the graphics is using more than one page).

Figure 7.13: Page setup parameters for vector formats.

The settings for the page setup are shown, and clicking the Page Setup button will display a dialog where these settings can be adjusted. This dialog is described in section 6.2.

The page setup is only available if you
150. Figure 19.2: Adjusting alignment algorithm parameters.

CHAPTER 19 SEQUENCE ALIGNMENT 349

19.1.1 Gap costs

The alignment algorithm has three parameters concerning gap costs: Gap open cost, Gap extension cost and End gap cost. The precision of these parameters is to one place of decimal.

• Gap open cost. The price for introducing gaps in an alignment.

• Gap extension cost. The price for every extension past the initial gap.

If you expect a lot of small gaps in your alignment, the Gap open cost should equal the Gap extension cost. On the other hand, if you expect few but large gaps, the Gap open cost should be set significantly higher than the Gap extension cost.

However, for most alignments it is a good idea to make the Gap open cost quite a bit higher than the Gap extension cost. The default values are 10.0 and 1.0 for the two parameters, respectively.

• End gap cost. The price of gaps at the beginning or the end of the alignment. One of the advantages of the CLC DNA Workbench alignment method is that it provides flexibility in the treatment of gaps at the ends of the sequences. There are three possibilities:

  - Free end gaps. Any number of gaps can be inserted in the ends of the sequences without any cost.

  - Cheap end gaps. All end gaps are
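The gap cost parameters in snippet 150 can be made concrete with a short sketch. It reflects one reading of the text rather than the Workbench's actual scoring code: a gap of length n costs the open cost plus the extension cost for each of the n - 1 positions past the first, free end gaps cost nothing, and cheap end gaps are charged as extensions for at most 10 positions (the "past 10 are free" rule quoted in snippet 142). The function name `gap_cost` and the `end_gap` argument are hypothetical.

```python
def gap_cost(length, gap_open=10.0, gap_extension=1.0, end_gap=None):
    """Cost of a single gap of `length` positions.

    end_gap is None for an internal gap (or "End gaps as any other"),
    "free" for free end gaps, or "cheap" for cheap end gaps.
    Defaults are the 10.0 / 1.0 values given in the manual text.
    """
    if length <= 0 or end_gap == "free":
        return 0.0
    if end_gap == "cheap":
        # Charged as extensions only; positions past the tenth are free.
        return gap_extension * min(length, 10)
    return gap_open + gap_extension * (length - 1)


if __name__ == "__main__":
    # Five separate 1-residue gaps versus one 5-residue gap, default costs.
    print(5 * gap_cost(1))  # 50.0
    print(gap_cost(5))      # 14.0
```

With the default costs, one long gap is much cheaper than several short ones, which is why equal open and extension costs are suggested when many small gaps are expected.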
151. 12.5.5 Which BLAST options should I change?

The NCBI BLAST web pages and the BLAST command line tool offer a number of different options which can be changed in order to obtain the best possible result. Changing these parameters can have a great impact on the search result. It is not the scope of this document to comment on all of the options available, but merely the options which can be changed with a direct impact on the search result.

The E-value

The expect value (E-value) can be changed in order to limit the number of hits to the most significant ones. The lower the E-value, the better the hit. The E-value is dependent on the length of the query sequence and the size of the database. For example, an alignment obtaining an E-value of 0.05 means that there is a 5 in 100 chance of the alignment occurring by chance alone.

E-values are very dependent on the query sequence length and the database size. Short identical sequences may have a high E-value and may be regarded as "false positive" hits. This is often seen if one searches for short primer regions, small domain regions etc. The default threshold for the E-value on the BLAST web page is 10. Increasing this value will most likely generate more hits. Below are some rules of thumb which can be used as a guide but should be considered with common sense.

• E-value < 10e-100: Identical sequences. You will get long alignments across the entire query and hit sequence.

• 10e-100 < E-value < 10e-50: Almost id
152. 13 A horizontal split screen. The two views split the View Area.

Maximize/Restore size of view

The Maximize/Restore View function allows you to see a view in maximized mode, meaning a mode where no other views nor the Navigation Area is shown.

Maximizing a view can be done in the following ways:

select view | Ctrl + M
or select view | View | Maximize/restore View
or select view | right-click the tab | View | Maximize/restore View
or double-click the tab of view

The following restores the size of the view:

Ctrl + M
or View | Maximize/restore View
or double-click title of view

CHAPTER 3 USER INTERFACE 90

Figure 3.14: A vertical split screen.
153. Figure 2.46: Placement of translated nucleotide sequence hits on the Human beta globin.

Figure 2.47: Human beta globin exon view.

• Use BLASTx

• Use the protein sequence, AAA16334, as database

Using the genomic sequence as query, the mapping of the protein sequence to the exons is visually very clear, as shown in figure 2.48.

In theory you could use the chromosome sequence as query, but the performance would not be optimal: it would take a long time, and the computer might run out of memory.

In this example, you have used well-annotated sequences where you could have searched for the name of the gene instead of using BLAST. However, there are other situations where you

CHAPTER 2 TUTORIALS 67
154. 1989).

Knowing the extinction coefficient, the absorbance (optical density) can be calculated using the following formula:

\[ \text{Absorbance(Protein)} = \frac{\text{Ext(Protein)}}{\text{Molecular weight}} \]

Two values are reported. The first value is computed assuming that all cysteine residues appear as half cystines, meaning they form di-sulfide bridges to other cysteines. The second number assumes that no di-sulfide bonds are formed.

Atomic composition

Amino acids are indeed very simple compounds. All 20 amino acids consist of combinations of only five different atoms. The atoms which can be found in these simple structures are: Carbon, Nitrogen, Hydrogen, Sulfur, Oxygen. The atomic composition of a protein can for example be used to calculate the precise molecular weight of the entire protein.

Total number of negatively charged residues (Asp + Glu)

At neutral pH, the fraction of negatively charged residues provides information about the location of the protein. Intracellular proteins tend to have a higher fraction of negatively charged residues than extracellular proteins.

Total number of positively charged residues (Arg + Lys)

At neutral pH, nuclear proteins have a high relative percentage of positively charged amino acids. Nuclear proteins often bind to the negatively charged DNA, which may regulate gene expression or help to fold the DNA. Nuclear proteins often have a low percentage of aromatic residues (Andrade et al., 1998).

Amino acid distribution

Am
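Two of the statistics in snippet 154 are simple enough to sketch directly. The code below assumes that the garbled formula reads Absorbance = Ext(Protein) / Molecular weight, and it counts charged residues exactly as the headings state (Asp + Glu, Arg + Lys). The function names are hypothetical, and the extinction coefficient and molecular weight are taken as given inputs, since the snippet does not show how they are computed.

```python
def charged_residue_counts(protein):
    """Return (negative, positive) counts: Asp + Glu and Arg + Lys."""
    seq = protein.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


def absorbance(extinction_coefficient, molecular_weight):
    """Absorbance (optical density) as extinction coefficient / molecular weight."""
    if molecular_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return extinction_coefficient / molecular_weight


if __name__ == "__main__":
    print(charged_residue_counts("MDEKRRA"))  # (2, 3)
    print(absorbance(10000.0, 20000.0))       # 0.5, with made-up input values
```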
155. 2.1.1 Creating a folder    When CLC DNA Workbench is started, there is one element in the Navigation Area called CLC Data. This element is a Location. A location points to a folder on your computer where your data for use with CLC DNA Workbench is stored.    If you have downloaded the example data, this will be placed as a folder in CLC Data.    [Figure: main window of CLC DNA Workbench 3.0 with the Navigation Area and the Toolbox]    Figure 2.1: The user interface as it looks when you start the program for the first time. (Windows version of CLC DNA Workbench. The interface is similar for Mac and Linux.)    The data in the location can be organized into folders. Create a folder:    File | New | Folder, or Ctrl + Shift + N (⌘ + Shift + N on Mac)    Name the folder "My folder" and press Enter.    2.1.2 Import data    Next, we want to import a sequence called HU
156. 3  GENERAL SEQUENCE ANALYSES 224    At the top, select a motif list by clicking the Browse button. When the motif list is selected, its motifs are listed in the panel on the left-hand side of the dialog. The right-hand side panel contains the motifs that will be listed in the Side Panel when you click Finish.    13.7.2 Motif search from the Toolbox    The dynamic motifs described in section 13.7.1 provide a quick way of routinely scanning a sequence for commonly used motifs, but in some cases a more systematic approach is needed. The motif search in the Toolbox provides an option to search for motifs with a user-specified similarity to the target sequence, and furthermore the motifs found can be displayed in an overview table. This is particularly useful when searching for motifs on many sequences.    To start the Toolbox motif search:    Toolbox | General Sequence Analyses | Motif Search    Use the arrows to add or remove sequences or sequence lists from the selected elements.    You can perform the analysis on several DNA or several protein sequences at a time. If the analysis is performed on several sequences at a time, the method will search for patterns in the sequences and create an overview table of the motifs found in all sequences.    Click Next to adjust parameters (see figure 13.26).    Motif Search dialog:    1. Select one or more sequences of same type    2. Set parameters    Motif parameters: Simple motif, Java regular expressi
157. 4    Some enzymes cut the sequence twice for each recognition site, and in this case the two cut positions are surrounded by parentheses.    Table of restriction fragments    The restriction map can be shown as a table of fragments produced by cutting the sequence with the enzymes:    Click the Fragments button at the bottom of the view. The table is shown in figure 18.47.    Each row in the table represents a fragment. If more than one enzyme cuts in the same region, or if an enzyme's recognition site is cut by another enzyme, there will be a fragment for each of the possible cut combinations. Furthermore, if this is the case, you will see the names of the other enzymes in the Conflicting Enzymes column.    The following information is available for each fragment:    • Sequence. The name of the sequence, which is relevant if you have performed restriction map analysis on more than one sequence.    • Length. The length of the fragment. If there are overhangs of the fragment, these are included in the length (both 3' and 5' overhangs).    • Region. The fragment's region on the original sequence.    CHAPTER 18  CLONING AND CUTTING 340    [Figure: Restriction Fragment table with the columns Sequence, Length, Region, Overhangs, Left end, Right end and Conflicting enzymes]    Figure 18.47: The result of the restriction ana
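To make the Length and Region columns described above concrete, here is a minimal Python sketch that turns a list of cut positions on a linear sequence into fragment rows. It is a simplification under stated assumptions: the function name is hypothetical, each cut is modelled as a single position after which the sequence is split, and overhangs and conflicting-enzyme combinations are ignored.

```python
def linear_fragments(seq_length, cut_positions):
    """Return (region_start, region_end, length) for each fragment.

    Positions are 1-based; a cut at position p splits the sequence
    between residue p and residue p + 1. Duplicate cuts and cuts
    outside the sequence are ignored.
    """
    cuts = sorted({p for p in cut_positions if 0 < p < seq_length})
    bounds = [0] + cuts + [seq_length]
    return [(start + 1, end, end - start)
            for start, end in zip(bounds, bounds[1:])]
```

For example, `linear_fragments(200, [151, 100])` yields three rows covering regions 1..100, 101..151 and 152..200.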
158. 41    409    Chain Flexibility  242  Cornette  146  242  Eisenberg  146  242  Emini  146   Engelman  GES   146  241  Hopp Woods  146  242  Janin  146  242   Karplus and Schulz  146  Kolaskar Tongaonkar  146  242  Kyte Doolittle  146  241  Rose  242   Surface Probability  242  Welling  146  242    ID  license  19  Illumina Genome Analyzer  376  Import  bioinformatic data  118  119  existing data  38  FASTA data  38  from a web page  119  list of formats  392  preferences  109  raw sequence  119  Side Panel Settings  107  using copy paste  119  In silico PCR  2 1  Index for searching  103  Infer Phylogenetic Tree  366  Information point  primer design  250  Insert  gaps  35 7  Insert restriction site  318  Installation  12  Invert sequence  232  Isoelectric point  215  Isoschizomers  334  IUPAC codes  nucleotides  398    Join  alignments  359  sequences  218  Jpg format  export  126    Keywords  160    Label  of sequence  142  Landscape  Print orientation  115    INDEX    Lasergene sequence   file format  393  Latin name   batch edit  84  Length  160  License  15   ID  19   starting without a license  27  License server  24  License server  access offline  25  Limited mode  27  Links  from annotations  158  Linux   installation  14   installation with RPM package  15  List of restriction enzymes  343  List of sequences  163  Load enzyme list  330  Local BLAST  178  Local BLAST Database  187  Local BLAST database management  188  Local BLAST Databases  185  Local complexity plot
159. 41  W CKU DNA ee te ke ER eR EERE EERE RESET EERE OG 150  10 3 Working with annotations        2 0    2 ee ee 152  10 4 Element information 2 25 s 2  64b44 be eee aw    bo aw ee ELES ee we 160  10 5 View as text   eh ee ee ee ee Se eS oe eae ee eee ow we ae ee    161  10 6 Creating anew sequence       1    a a a a a a a ra 162  10 7 Sequence Lists        0    ce ra 163  11 Online database search 167  11 1 GenBank search       2    a a a 16 7  11 2 Sequence web info cc ese  amp  Bee Sw ee ew ewe we BEES we SEE E 170  12 BLAST Search 173  12L Ruming ELI ass s et ieee eee te ee eee ee ee ee eX 174  12 2 Output from BLAST searcheS    4 iin dace we Gud db td ae wou GSEs 180  12 3 Local BLAST databases 6444424 Sa wb ee ee Oe Se WO we Oe ee 185  12 4 Manage BLAST databases    4 5244  8 ke wee Ee ee ES SE HES wu A 188  12 5 Bioinformatics explained  BLAST          00000 eee ee ee ee 189  13 General sequence analyses 199  13 1 Shuffle sequence  2t ieee a a kit beeGae ee Eee Lee eRe HO 199  13 2 WOUDIOG     lt   eo tues baw eee repor dd wo a ee a 201    13 3 Localcomplexity plot gia cuacevwiaceie ahaa we baw EEE ces 211    CONTENTS    13 4  13 5  13 6  ie    Sequence statistics         Join sequences           Pattern Discovery            Motif Search               14 Nucleotide analyses    14 1  14 2  14 3  14 4  14 5  14 6    Convert DNA to RNA         Convert RNA to DNA           Reverse complements of sequences          0 0 00 0 eee eee ee ee    Reverse sequence            Translat
160. 5 6 View settings for the Side Panel         2 2 2 eee ee ee ee 110  5 6 1 Floating Side Panel sussa be ol wee ae ewe ee we ee a 112    The first three sections in this chapter deal with the general preferences that can be set for CLC  DNA Workbench using the Preferences dialog  The next section explains how the settings in the  Side Panel can be saved and applied to other views  Finally  you can learn how to import and  export the preferences     The Preferences dialog offers opportunities for changing the default settings for different features  of the program   The Preferences dialog is opened in one of the following ways and can be seen in figure 5 1     Edit   Preferences  1 55     or Ctrl   K  36    on Mac     5 1 General preferences    The General preferences include     104    CHAPTER 5  USER PREFERENCES AND SETTINGS 105          EB Preferences  e        Undo limit    500    EE   Audit Support    Enable audit of manual sequence modifications    Number of hits  normal search   50  Number of hits  NCBI Uniprot   50    Style  English  United States     Show all dialogs with  Never show this dialog again       Show Dialogs                    Help     Jf OK     X Cancel     Export     Import            Figure 5 1  Preferences include General preferences  View preferences  Colors preferences  and  Advanced settings     Undo Limit  As default the undo limit is set to 500  By writing a higher number in this field   more actions can be undone  Undo applies to all changes made 
161. 6      Feng and Doolittle  1987  Feng  D  F  and Doolittle  R  F   1987   Progressive sequence align   ment as a prerequisite to correct phylogenetic trees  J Mol Evol  25 4  351 360      Forsberg et al   2001  Forsberg  R   Oleksiewicz  M  B   Petersen  A  M   Hein  J   Botner  A   and  Storgaard  T   2001   A molecular clock dates the common ancestor of European type porcine  reproductive and respiratory syndrome virus at more than 10 years before the emergence of  disease  Virology  289 2  1 74 179      Gill and von Hippel  1989  Gill  S  C  and von Hippel  P  H   1989   Calculation of protein  extinction coefficients from amino acid sequence data  Anal Biochem  182 2  319 326      Gonda et al   1989  Gonda  D  K   Bachmair  A   Wunning  l   Tobias  J  W   Lane  W  S    and Varshavsky  A   1989   Universality and structure of the N end rule  J Biol Chem   264 28  16700 16712      Guindon and Gascuel  2003  Guindon  S  and Gascuel  O   2003   A Simple  Fast  and Accu   rate Algorithm to Estimate Large Phylogenies by Maximum Likelihood  Systematic Biology   52 5  696 704      Hasegawa et al   1985  Hasegawa  M   Kishino  H   and Yano  T   1985   Dating of the human   ape splitting by a molecular clock of mitochondrial DNA  Journal of Molecular Evolution   22 2  160 174      Hein  2001  Hein  J   2001   An algorithm for statistical alignment of sequences related by a  binary tree  In Pacific Symposium on Biocomputing  page 179      Hein et al   2000  Hein  J   Wiuf  C   Knuds
162. 6     20 1 20         l  tlA CTTTTCAAGG AGTATTTCCT ATGAACGAGT TAGACGGCAT  evgA CATTGCAAAG GGAATAATCT ATGAACGCAA TAATTATTGA  ypdl CATTTTCAGG ATAACTTTCT ATGAAAGTAA ACTTAATACT  nrB GAAAAGAAAT CGAGGCAAAA ATGAGCAAAG TCAGACTCGC  hmpA TGCAAAAAAA GGAAGACCAT ATGCTTGACG CTCAAACCAT  narQ TTTTTGTGGA GAAGACGCGT GTGATTGTTA AACGACCCGT  gtf GTTATTAAGG ATATGTTCAT ATGTTTTTCA AAAAGAACCT  intS TACCCACCGG ATTTTTACCC ATGCTCACCG TTAAGCAGAT  yidF AATCAAAATG GAATAAAATC ATGCTACCAT CTATTTCAAT  dsdX ATCACAGGGG AAGGTGAGAT ATGCACTCTC AAATCTGGGT  sunB ACATCCAGTG AGAGAGACCG ATGCATCCGA TGCTGAACAT    Consensus AATTTAAAGG AGAATTACCT ATGAACGCAA TAATAAACAT    [Figure: sequence logo and conservation graph]    Figure 19.8: Ungapped sequence alignment of eleven E. coli sequences defining a start codon. The start codons start at position 1. Below the alignment is shown the corresponding sequence logo. As seen, a GTG start codon and the usual ATG start codons are present in the alignment. This can also be visualized in the logo at position 1.    Calculation of sequence logos    A comprehensive walk-through of the calculation of the information content in sequence logos is beyond the scope of this document but can be found in the original paper by Schneider and Stephens (1990). Nevertheless, the conservation of every position is defined as Rseq, which is the difference between the maximal entropy (Smax) and the observed entropy for the residue distributi
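The definition of Rseq given above (maximal entropy minus observed entropy) can be illustrated with a short Python sketch. It assumes the maximal entropy is log2 of the alphabet size (4 for nucleotides) and leaves out the small-sample correction discussed in the original paper; the function name is hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content (in bits) of one alignment column.

    Computed as S_max - S_obs, where S_max = log2(alphabet_size) and
    S_obs is the Shannon entropy of the observed residue frequencies.
    No small-sample correction is applied.
    """
    counts = Counter(column)
    total = len(column)
    s_obs = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - s_obs


# Position 1 of the eleven start codons in the alignment: ten A (ATG) and one G (GTG).
start_position = "AAAAAGAAAAA"
bits = column_information(start_position)
```

A fully conserved column gives the maximum of 2 bits for nucleotides, and a column with all four nucleotides at equal frequency gives 0 bits; the start-codon column falls in between because of the single GTG.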
163. 601   4  R  133  401 che  133   1615 Fragment length  133  1601 zer    Other Fragments     Fwd  primer Melt  temp   C  Rev  primer Melt  temp    C  Giff  Melt  temp     Select All  Deselect All    Figure 16 21  A table showing all possible fragments of the specified size     CHAPTER 16  PRIMERS 2 5    The table first lists the names of the forward and reverse primers  then the length of the fragment  and the region  The last column tells if there are other possible fragments fulfilling the length  criteria on this sequence  This information can be used to check for competing products in the  PCR  In the Side Panel you can show information about melting temperature for the primers as  well as the difference between melting temperatures     You can use this table to browse the fragment regions  If you make a split view of the table and  the sequence  see section 3 2 6   you can browse through the fragment regions by clicking in the  table  This will cause the sequence view to jump to the start position of the fragment     There are some additional options in the fragment table  First  you can annotate the fragment on  the original sequence  This is done by right clicking  Ctrl click on Mac  the fragment and choose  Annotate Fragment as shown in figure 16 22     Rows  7 Fragments Filter          1d Rey Fragment length Region v Other F     imer 3 primer 2 1486 1575  3062   imer  primer 5   HindIII ee 5 151  1615      imer 6 primer 5 Open Fragment 51 151  1601      imer 6  Ecok pr
164. [Figure: BLAST hit rows (for example 1DXL_B, Chain B, Hemoglobin) and the buttons Download and Open, Download and Save, Open at NCBI and Open Structure]    Figure 12.10: Display of the output of a BLAST search in the tabular view. The hits can be sorted by the different columns, simply by clicking the column heading.    Figure 12.10 is an example of a BLAST Table.    The BLAST Table includes the following information:    • Query sequence. The sequence which was used for the search.    • Hit. The name of the sequences found in the BLAST search.    • Id. GenBank ID.    • Description. Text from NCBI describing the sequence.    • E-value. Measure of quality of the match. Higher E-values indicate that BLAST found a less homologous sequence.    • Score. This shows the score of the local alignment generated through the BLAST search.    • Bit score. This shows the bit score of the local alignment generated through the BLAST search. Bit scores are normalized, which means that the bit scores from different alignments can be compared, even if different scoring matrices have been used.    • Hit start. Shows the start position in the hit sequence.    • Hit end. Shows the end position in the hit sequence.    • Hit length. The length of the hit.    • Query start. Shows the start position in the query sequence.    • Query end. Shows the end position in the query sequence.    • Overlap. Displays a percentage value for the overlap of the query sequence and hit sequence. Only the length of the loca
165. 8al primers    Filter  All a         Show column A    Score     Pair annealing align  Fwd  Rev  Fragment length    Sequence Fwd Melt  temp  Fwd Sequence Rev Melt  temp  Rev Fis  core                Pair annealing  Fwd Rev   Open Primer s  Fwd  Rev            Pair annealing align  Fwd Rev         Save Primer s  Fwd  Rev    Pair end annealing  Fwd Rev        Mark Primer Annotation on Sequence    o F  Z  Fragment length  Fwd  Rev   pen Fragment 48 572 GGAACTGAGAATAGAGGA 49 566       GGTGGGAGGTCTATATAA  57 873   It Ul  AGGAGATAAGAGTCAAGG    Op Y    Figure 2 41  The options available in the right click menu  Here   Mark primer annotation on  sequence    has been chosen  resulting in two annotations on the sequence above  labeled  Oligo               Sequence Fwd as       Save Fragment          Workbench user manual  linked to on this webpage  http    www clcbio com download     2 8 Tutorial  BLAST search    BLAST is an invaluable tool in bioinformatics  It has become central to identification of  homologues and similar sequences  and can also be used for many other different purposes   This tutorial takes you through the steps of running a blast search in CLC Workbenches  If  you plan to use blast for your research  we highly recommend that you read further about it   Understanding how blast works is key to setting up meaningful and efficient searches     Suppose you are working with the ATP8a1 protein sequence which is a phospholipid transporting  ATPase expressed in the adult ho
166. 92) and PAM (Dayhoff and Schwartz, 1978).    CHAPTER 13  GENERAL SEQUENCE ANALYSES 209    [Figure: dot plot]    Figure 13.12: The dot plot showing a low-complexity region in the sequence. The sequence is artificial, and low-complexity regions do not always show as a square.    Different scoring matrices    PAM    The first PAM matrix (Point Accepted Mutation) was published in 1978 by Dayhoff et al. The PAM matrix was built through a global alignment of related sequences all having sequence similarity above 85% (Dayhoff and Schwartz, 1978). A PAM matrix shows the probability that any given amino acid will mutate into another in a given time interval. As an example, PAM1 gives that one amino acid out of 100 will mutate in a given time interval. At the other end of the scale, a PAM256 matrix gives the probability of 256 mutations in 100 amino acids (see figure 13.13).    There are some limitations to the PAM matrices which make the BLOSUM matrices somewhat more attractive. The dataset on which the initial PAM matrices were built is very old by now, and the PAM matrices assume that all amino acids mutate at the same rate; this is not a correct assumption.
167. [Figure: list of enzymes with compatible ends]    Figure 18.39: Enzymes with compatible ends.    At the top you can choose whether the enzymes considered should have an exact match or not. Since a number of restriction enzymes have ambiguous cut patterns, there will be variations in the resulting overhangs. Choosing All matches, you cannot be 100% sure that the overhang will match, and you will need to inspect the sequence further afterwards.    We advise trying Exact match first, and using All matches as an alternative if a satisfactory result cannot be achieved.    At the bottom of the dialog, the list of enzymes producing compatible overhangs is shown. Use the arrows to add enzymes which will be displayed on the sequence when you press Finish.    When you have added the relevant enzymes, click Finish, and the enzymes will be added to the Side Panel and their cut sites displayed on the sequence.    18.3.2 Restriction site analysis from the Toolbox    Besides the dynamic restriction sites, you can do a more elaborate restriction map analysis with more output formats using the Toolbox:    Toolbox | Cloning and Restriction Sites | Restriction Site Analysis    This will display the dialog shown in figure 18.40.    [Figure: Restriction Site Analysis dialog, step 1, Select DNA/RNA sequence(s)]
168. A Workbench to all users, free of charge.    Each user is entitled to a 30-day demo of CLC DNA Workbench. If you need more time for evaluating, another two weeks of demo can be requested.    We use the concept of "quid pro quo". The last two weeks of free demo time given to you are therefore accompanied by a short-form questionnaire where you have the opportunity to give us feedback about the program.    The 30-day demo is offered for each major release of CLC DNA Workbench. You will therefore have the opportunity to try the next major version when it is released. (If you purchase CLC DNA Workbench, the first year of updates is included.)    When you select to request an evaluation license, you will see the dialog shown in figure 1.2.    In this dialog, there are two options:    • Direct download. The workbench will attempt to contact the online CLC Licenses Service and download the license directly. This method requires internet access from the workbench.    CHAPTER 1  INTRODUCTION TO CLC DNA WORKBENCH 1 7    [Figure: License Wizard dialog, Request an evaluation license, with the options Direct Download and Go to License Download web page]    • Go to License Download web page. The workbench will open a Web Browser with the License Download web page. From there you will
169. A and subsequently a translation to proteins occur. This is of course simplified, but is in general what is happening in order to have a steady production of proteins needed for the survival of the cell. In bioinformatics analysis of proteins it is sometimes useful to know the ancestral DNA sequence in order to find the genomic localization of the gene. Thus, the translation of proteins back to DNA/RNA is of particular interest, and is called reverse translation or back-translation.    The Genetic Code    In 1968 the Nobel Prize in Medicine was awarded to Robert W. Holley, Har Gobind Khorana and Marshall W. Nirenberg for their interpretation of the Genetic Code (http://nobelprize.org/medicine/laureates/1968/). The Genetic Code represents translations of all 64 different codons into 20 different amino acids. Therefore it is no problem to translate a DNA/RNA sequence into a specific protein. But due to the degeneracy of the genetic code, several codons may code for only one specific amino acid. This can be seen in the table below. After the discovery of the genetic code it has been concluded that different organisms (and organelles) have genetic codes which are different from the "standard genetic code". Moreover, the amino acid alphabet is no longer limited to 20 amino acids. The 21st amino acid, selenocysteine, is encoded by a 'UGA' codon, which is normally a stop codon. The discrimination of a selenocysteine over a stop codon is carried out by the translati
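The consequence of degeneracy for back-translation can be shown with a small Python sketch that counts how many different coding sequences could encode a given protein. It assumes the standard genetic code, in which the 20 amino acids share 61 sense codons; the dictionary and function names are illustrative only.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_back_translations(protein):
    """Count the DNA sequences that translate to the given protein."""
    total = 1
    for amino_acid in protein.upper():
        total *= CODONS_PER_AMINO_ACID[amino_acid]
    return total
```

Even a three-residue peptide such as Leu-Ser-Arg has 6 x 6 x 6 = 216 possible coding sequences, which is why a back-translated sequence is only one of many candidates for the ancestral DNA.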
170. A4 TO sequence   View   Split Horizontally           CHAPTER 2  TUTORIALS 99    Note that this can also be achieved by simply dragging the pcDNA4_TO sequence into the lower  part of the open view     Switch to the Circular      view at the bottom of the view     Zoom in  5  on the multiple cloning site downstream of the green CMV promoter annotation   You should now have a view similar to the one shown in figure 2 24                          sob ATPBal mRNA     Pstl    Sequence settings E  jAtp8a1 hi P  Pstl Smal PstI  Atp8a1    EcoRV  Bgill EcoRI  EcoRV v Sequence layout  st COR amH Stl BamHI Spacing    No spacing X  ATP8a1 MRNA No wrap     Auto wrap  Fixed wrap   MOBEA   SHeOY       poonas TO       a                gt  Annotation layout   gt  Annotation types   gt  Restriction sites     gt  Motifs     gt  Find    b Text Format             uz   amp  Ep Of  Y    Figure 2 24  Check cut sites        co OB      By looking at the enzymes we can see that both Hindlll and Xhol cut in the multi cloning site of  the vector and not in the AtpSa1 gene  Note that you can add more enzymes to the list in the  Side Panel by clicking Manage Enzymes under the Restriction Sites group     Close both views and open the ATP8al fwd primer sequence  When it opens  double click the  name of the sequence to make a selection of the full sequence  If you do not see the whole  sequence turn purple  please make sure you have the Selection Tool chosen  and not one of the  other tools available from the t
171. APTER 2  TUTORIALS 10    It is possible to add and remove sequences from Selected Elements list  Since we had already  selected the eight proteins  just click Next to adjust parameters for the alignment     Clicking Next opens the dialog shown in figure 2 53        a  q Create Alignment   1  Select sequences of same MES amete  type       2  Set parameters    Gap settings  Gap open cost  10  Gap extension cost  1    End gap cost  As any other w    Alignment     Fast  less accurate   Slow  very accurate   Redo alignments    Use Fixpoints             PERES         Previous   gt  Next XX Cancel    Figure 2 53  The alignment dialog displaying the available parameters which can be adjusted           Leave the parameters at their default settings  An explanation of the parameters can be found  by clicking the help button        Alternatively  a tooltip is displayed by holding the mouse cursor  on the parameters     Click Finish to start the alignment process which is shown in the Toolbox under the Processes  tab  When the program is finished calculating it displays the alignment  see fig  2 54      FEE ATPase protei             o     _  Alignment Settin Hans k     ig  conta nperi  pieredies eee sorttssrro mS  p39524 M       BORET PPKRKPGEDO THE            HOGER     a    504295 MARBWDNKGN AKRISRDEDE BEEAGESwic RTEONPRECE   2 Every 10 residues    P57792 MA T   PARR S s GRRRKR 22 Uwe ee eee ee eS  a ee Se No wrap  Q9sx33 Miles           c TKRRRR                             E  Consensus
172. ATSB1_ HUMAN    e i   i M  2G3 AT11B HUMAN   gt  ee ee eee IMC  SI9BJATIIA HUMAN mm ee IM MM Eu IMM  JB49 AT HC  HUMAN     e 4 1 1 a ee  DA23 AT8B3 HUMAN ce           si t   0423 AT8B3_HUMAN        SMOATPIA HUMAN e e e e e O e e DD    DO2CA4lA TMA ULIRAARI E  Peg ii             amp  OO          i         4 Th t  PREY    Figure 2 44  Output of a BLAST search  By holding the mouse pointer over the lines you can get  information about the sequence                             Try placing your mouse cursor over a potential homologous sequence  You will see that a context  box appears containing information about the sequence and the match scores obtained from the  BLAST algorithm     The lines in the BLAST view are the actual sequences which are downloaded  This means that  you can zoom in and see the actual alignment     Zoom in in the Tool Bar  45    Click in the BLAST view a number of times until you   see the residues  Now we will focus our attention on sequence 09Y200   the BLAST hit that is at the top of the  list  To download the full sequence     right click the line representing sequence Q9Y2Q0   Download Full Hit Sequence   from NCBI  This opens the sequence  However  the sequence is not saved yet  Drag and drop the sequence  into the Navigation Area to save it  This homologous sequence is now stored in the CLC DNA  Workbench and you can use it to gain information about the query sequence by using the various  tools of the workbench  e g  by studying its textual informat
173. ATTTAAAC AGATGGTGTT TGCTTATTCC Min    PERH3BC O TTCTAGGGAG CAGTTTAGAT GGAAGGTATC TGCTTGTICC b Advanced parameters    Consensus TTCTAGGGAG NNNTTTANAN NGANGGTNTN TGCTTNTTCC Mode    mor TTCTAGGGAG cos TT Modo oGdeGGTals TTTeTICG Omara        TaqMan       PERH2BD O CCCATGGAAT GCGGA  AGA GTTTGATTGT TTTACCCTCC Primer solution  PERH3BC O CCCATGGAGT GCTGACAAGA GTTTGGTTAT TTTACTCTCC   gt   Perfect match  Consensus CCCATGGANT GCNGACAAGA GTTTGHTTHT TTTACHCTCC    mare QNTO GteGhcaMGh GTM To  TTTAGSCICG ww    tes PES LO  Figure 16 12  The initial view of an alignment used for primer design     CHAPTER 16  PRIMERS 266    16 9 1 Specific options for alignment based primer and probe design    Compared to the primer view of a single sequence the most notable difference is that the  alignment primer view has no available graphical information  Furthermore  the selection boxes  found to the left of the names in the alignment play an important role in specifying the oligo  design process  This is elaborated below  The Primer Parameters group in the Side Panel has the  same options for specifying primer requirements  but differs by the following  see figure 16 12      e In the Mode submenu which specifies the reaction types the following options are found         Standard PCR  Used when the objective is to design primers  or primer pairs  for PCR  amplification of a single DNA fragment         TaqMan  Used when the objective is to design a primer pair and a probe set for  TaqMan quantitative PCR 
174. Aspartic acid  3 50 3 00  3 10  0 90 0 62  0 60  9 20  E Glutamic acid  3 50 3 00  1 80  0 74 0 62  0 70  8 20  F Phenylalanine 2 80  2 50 4 40 1 19 0 88 0 50 3 70  G Glycine  0 40 0 00 0 00 0 48 0 72 0 30 1 00  H Histidine  3 20  0 50 0 50  0 40 0 78  0 10  3 00    Isoleucine 4 50  1 80 4 80 1 38 0 88 0 70 3 10  K Lysine  3 90 3 00  3 10  1 50 0 52  1 80  8 80  L Leucine 3 80  1 80 5 70 1 06 0 85 0 50 2 80  M Methionine 1 90  1 30 4 20 0 64 0 85 0 40 3 40  N Asparagine  3 50 0 20  0 50  0 78 0 63  0 50  4 80  P Proline  1 60 0 00  2 20 0 12 0 64  0 30  0 20  Q Glutamine  3 50 0 20  2 80  0 85 0 62  0 70  4 10  R Arginine  4 50 3 00 1 40  2 53 0 64  1 40  12 3  S Serine  0 80 0 30  0 50  0 18 0 66  0 10 0 60  T Threonine  0 70  0 40  1 90  0 05 0 70  0 20 1 20  V Valine 4 20  1 50 4 70 1 08 0 86 0 60 2 60  W Tryptophan  0 90  3 40 1 00 0 81 0 85 0 30 1 90  Y Tyrosine  1 30  2 30 3 20 0 26 0 76  0 40  0 70    Table 15 1  Hydrophobicity scales  This table shows seven different hydrophobicity scales which  are generally used for prediction of e g  transmembrane regions and antigenicity     work for commercial purposes  You may not alter  transform  nor build upon this work     SOME RIGHTS RESERVED    See http   creativecommons org licenses by nc nd 2 5  for more information on  how to use the contents     15 3 Reverse translation from protein into DNA    A protein sequence can be back translated into DNA using CLC DNA Workbench  Due to  degeneracy of the genetic code every amino
175. C DNA Workbench, but are opened by other applications, e.g. pdf files, Microsoft Word files, Open Office spreadsheet files, or links to programs and web pages etc.    This chapter first deals with importing and exporting data in bioinformatic data formats and as external files. Next comes an explanation of how to export graph data points to a file, and how to export graphics.    7.1 Bioinformatic data formats    The different bioinformatic data formats are imported in the same way; therefore, the following description of data import is an example which illustrates the general steps to be followed, regardless of which format you are handling.    CHAPTER 7  IMPORT/EXPORT OF DATA AND GRAPHICS 118    7.1.1 Import of bioinformatic data    CLC DNA Workbench has support for a wide range of bioinformatic data such as sequences, alignments etc. See a full list of the data formats in section G.1.    The CLC DNA Workbench offers a lot of possibilities to handle bioinformatic data. Read the next sections to get information on how to import different file formats or to import data from a Vector NTI database.    Import using the import dialog    To start the import using the import dialog:    click Import in the Toolbar    This will show a dialog similar to figure 7.1 (depending on which platform you use). You can change which kind of file types should be shown by selecting a file format in the Files of type box.    [Figure: Import dialog with the fields File name and Files of type]
176. [Figure: sequence view with the CMV promoter primer motif shown as an arrow]    Figure 13.23: Showing dynamic motifs on the sequence.    This case shows the CMV promoter primer sequence, which is one of the pre-defined motifs in CLC DNA Workbench. The motif is per default shown as a faded arrow with no text. The direction of the arrow indicates the strand of the motif.    Placing the mouse cursor on the arrow will display additional information about the motif, as illustrated in figure 13.24.    [Figure: tooltip showing motif CGCAAATGGGCGGTAGGCGTG, list index 5, name CMV, type Simple, description CMV promoter primer]    Figure 13.24: Showing dynamic motifs on the sequence.    To add Labels to the motif, select the Flag or Stacked option. They will put the name of the motif as a flag above the sequence. The stacked option will stack the labels when there is more than one motif so that all labels are shown.    Below the labels option there are two options for controlling the way the sequence should be searched for motifs:    • Include reverse motifs. This will also find motifs on the negative strand (only available for nucleotide sequences).    • Exclude matches in N-regions for simple motifs. The motif search handles ambiguous characters in the way that two residues are different if they do not have any residues in common. For example: For nucleotides, N matches any
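The matching rule stated above (two residues differ only if they have no residues in common) can be sketched in Python using the standard IUPAC nucleotide ambiguity codes. The function names are hypothetical, and this is not the Workbench's own search code; it only illustrates why stretches of N match every simple motif.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def residues_match(a, b):
    """Two residues match if the sets of bases they represent overlap."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def find_motif(sequence, motif):
    """Return 0-based start positions where the motif matches the sequence."""
    sequence, motif = sequence.upper(), motif.upper()
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        if all(residues_match(s, m) for s, m in zip(window, motif)):
            hits.append(start)
    return hits
```

With this rule, `find_motif("ACGTNNNN", "ACG")` reports the true match at position 0 and also matches inside the run of N, which is the kind of hit the option above is meant to suppress.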
177. Table 13.2: Estimated half-life. Half-life of proteins where the N-terminal residue is listed in the first column and the half-life in the subsequent columns for mammals, yeast and E. coli.

Amino acid   Mammalian    Yeast        E. coli
Ala (A)      4.4 hours    >20 hours    >10 hours
Cys (C)      1.2 hours    >20 hours    >10 hours
Asp (D)      1.1 hours    3 min        >10 hours
Glu (E)      1 hour       30 min       >10 hours
Phe (F)      1.1 hours    3 min        2 min
Gly (G)      30 hours     >20 hours    >10 hours
His (H)      3.5 hours    10 min       >10 hours
Ile (I)      20 hours     30 min       >10 hours
Lys (K)      1.3 hours    3 min        2 min
Leu (L)      5.5 hours    3 min        2 min
Met (M)      30 hours     >20 hours    >10 hours
Asn (N)      1.4 hours    3 min        >10 hours
Pro (P)      >20 hours    >20 hours    ?
Gln (Q)      0.8 hour     10 min       >10 hours
Arg (R)      1 hour       2 min        2 min
Ser (S)      1.9 hours    >20 hours    >10 hours
Thr (T)      7.2 hours    >20 hours    >10 hours
Val (V)      100 hours    >20 hours    >10 hours
Trp (W)      2.8 hours    3 min        2 min
Tyr (Y)      2.8 hours    10 min       2 min

X(Ala), X(Val), X(Ile) and X(Leu) are the amino acid compositional fractions. The constants a and b are the relative volume of valine (a = 2.9) and leucine/isoleucine (b = 3.9) side chains compared to the side chain of alanine [Ikai, 1980].

Estimated half-life

The half-life of a protein is the time it takes for the protein pool of that particular protein to be reduced to half. The half-life of proteins is highly dependent on the presence of
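The constants a and b above belong to the aliphatic index of Ikai (1980). As a rough sketch, and assuming the commonly stated form X(Ala) + a * X(Val) + b * (X(Ile) + X(Leu)) with each X expressed as a mole percentage, the calculation could look as follows in Python. The formula itself is not shown in this excerpt, so treat both the formula and the function name `aliphatic_index` as assumptions made for illustration.

```python
from collections import Counter

# Relative side-chain volumes compared to alanine, as given in the text.
A_VAL = 2.9       # valine
B_LEU_ILE = 3.9   # leucine / isoleucine


def aliphatic_index(protein):
    """Aliphatic index, assuming mole-percent fractions (an assumption)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    counts = Counter(protein)

    def mole_percent(residue):
        # Counter returns 0 for residues that do not occur.
        return 100.0 * counts[residue] / len(protein)

    return (mole_percent("A")
            + A_VAL * mole_percent("V")
            + B_LEU_ILE * (mole_percent("I") + mole_percent("L")))


# One residue each of Ala, Val, Ile and Leu: every fraction is 25 %.
print(aliphatic_index("AVIL"))
```

A sequence with no Ala, Val, Ile or Leu residues gives an index of 0 under this definition.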
178. Database

This opens the dialog seen in figure 12.12.

Figure 12.12: Add sequences for the BLAST database.

Select sequences or sequence lists you wish to include in your database and click Next.

In the next dialog, shown in figure 12.13, you provide the following information:

• Name. The name of the BLAST database. This name will be used when running BLAST searches and also as the base file name for the BLAST database files.

• Description. You can add more details to describe the contents of the database.

• Location. You can select the location to save the BLAST database files to. You can add or change the locations in this list using the Manage BLAST Databases tool, see section 12.4.
179. Figure 2.11: NCBI search view.

Click Start search to commence the search in NCBI.

2.4.1 Searching for matching objects

When the search is complete, the list of hits is shown. If the desired complete human hemoglobin DNA sequence is found, the sequence can be viewed by double-clicking it in the list of hits from the search. If the desired sequence is not shown, you can click the 'More' button below th
180. Figure 10.3 shows an artificial sequence with all the different kinds of regions.

Figure 10.3:
Region 1: A single residue.
Region 2: A range of residues including both endpoints.
Region 3: A range of residues starting somewhere before 30 and continuing up to and including 40.
Region 4: A single residue somewhere between 50 and 60 inclusive.
Region 5: A range of residues beginning somewhere between 70 and 80 inclusive and ending at 90 inclusive.
Region 6: A range of residues beginning somewhere between 100 and 110 inclusive and ending somewhere between 120 and 130 inclusive.
Region 7: A site between residues 140 and 141.
Region 8: A site between two residues somewhere between 150 and 160 inclusive.
Region 9: A region that covers ranges from 170 to 180 inclusive and 190 to 200 inclusive.
Region 10: A region on negative strand that covers ranges from 210 to 220 inclusive.
Region 11: A region on negative strand that covers ranges
181. Finish.

This will start the secondary peak calling. A detailed history entry will be added to the history specifying all the changes made to the sequence.

Chapter 18

Cloning and cutting

Contents
18.1 Molecular cloning
  18.1.1 Introduction to the cloning editor
  18.1.2 The cloning work flow
  18.1.3 Manual cloning
  18.1.4 Insert restriction site
18.2 Gateway cloning
  18.2.1 Add attB sites
  18.2.2 Create entry clones (BP)
  18.2.3 Create expression clones (LR)
18.3 Restriction site analysis
  18.3.1 Dynamic restriction sites
  18.3.2 Restriction site analysis from the Toolbox
18.4 Gel electrophoresis
  18.4.1 Separate fragments of sequences on gel
  18.4.2 Separate sequences on gel
  18.4.3 Gel view
18.5 Restriction enzyme lists
  18.5.1 Create enzyme list
  18.5.2 View and modify enzyme list

CLC DNA Workbench offers graphically advanced in silico cloning and design of vectors for various purposes together with
182. Forward primer region, a Reverse primer region, or both. These are defined by making a selection on the sequence and right-clicking the selection. It is also possible to define a Region to amplify, in which case a forward and a reverse primer region are automatically placed so as to ensure that the designated region will be included in the PCR fragment. If areas are known where primers must not bind (e.g. repeat-rich areas), one or more No primers here regions can be defined.

If two regions are defined, it is required that at least a part of the Forward primer region is located upstream of the Reverse primer region.

After exploring the available primers (see section 16.3) and setting the desired parameter values in the Primer Parameters preference group, the Calculate button will activate the primer design algorithm.

When a single primer region is defined

If only a single region is defined, only single primers will be suggested by the algorithm.

After pressing the Calculate button a dialog will appear (see figure 16.7).
183. Figure 19.5: The top figure shows the original alignment. In the bottom panel a single sequence with four inserted X's is aligned to the original alignment. This introduces gaps in all sequences of the original alignment. All other positions in the original alignment are fixed.

This feature is useful if you wish to add extra sequences to an existing alignment, in which case you just select the alignment and the extra sequences and choose not to redo the alignment.

It is also useful if you have created an alignment where the gaps are not placed correctly. In this case, you can realign the alignment with different gap cost parameters.

19.1.4 Fixpoints

With fixpoints, you can get full control over the alignment algorithm. The fixpoints are points on the sequences that are forced to align to each other.

Fixpoints are added to sequences or alignments before clicking 'Create alignment'. To add a fixpoint, open the sequence or alignment and:

Select the region you want to use as a fixpoint | right-click the selection | Set alignment fixpoint here

This will add an annotation labeled 'Fixpoint' to the sequence s
184. Figure 7.16: Selected elements in a Folder Content view.

When the elements are selected, do the following to copy the selected elements:

right-click one of the selected elements | Edit | Copy

Then:

right-click in the cell A1 | Paste

The outcome might appear unorganized, but with a few operations the structure of the view in CLC DNA Workbench can be reproduced (except the icons, which are replaced by file references in Excel).

Note that all tables can also be Exported directly in Excel format.

Chapter 8

History log

Contents
8.1 Element history
  8.1.1 Sharing data with history

CLC DNA Workbench keeps a log of all operations you make in the program. If e.g. you rename a sequence, align sequences, create a phylogenetic tree or translate a sequence, you can always go back and check what you have done. In this way, you are able to document and reproduce previous operations.

This can be useful in several situations: It can be used for documentation purposes, where you can specify exactly how your data has been created and modified. It can also be useful if you return to a project after some time and want to refresh your memory
185. Figure 18.35: Choosing enzymes to be considered.

At the top, you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 18.5 for more about creating and modifying enzyme lists.

Below there are two panels:

• To the left, you see all the enzymes that are in the list selected above. If you have not chosen to use an existing enzyme list, this panel shows all the enzym
186. In the example above, if we want to group the reads according to sample ID and gene name, these two parts should be checked as shown in figure 17.4.

Figure 17.4: Splitting up the name at every underscore (_) and using the sample ID and gene name for grouping.

In the middle of the dialog there is a preview panel listing:

• Sequence name. This is the name of the first sequence that has been chosen. It is shown here in the dialog in order to give you a sample of what the names in the list look like.

• Resulting group. The name of the group that this sequence would belong to if you proceed with the current settings.

• Number of sequences. The number of sequences chosen in the first step.

• Number of groups. The number of groups that would be produced when you proceed with the current settings.

This preview cannot be changed. It is shown to guide you when finding the appropriate settings.

Click Next if you wish to adjust how to handle the results (see section 9.2). If n
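The grouping described above (split each name at every underscore, then combine the checked parts into a group name) can be illustrated with a short Python sketch. This is only an approximation of the idea, not the Workbench's code: the function `group_by_name_parts`, the read names, and the choice of field positions 1 (gene name) and 3 (sample ID) are assumptions modelled on the preview in figure 17.4.

```python
from collections import defaultdict


def group_by_name_parts(names, separator="_", parts=(1, 3)):
    """Group names by the concatenation of the selected fields.

    `parts` holds 0-based field positions; names with too few fields
    raise IndexError in this simple sketch.
    """
    groups = defaultdict(list)
    for name in names:
        fields = name.split(separator)
        key = "".join(fields[i] for i in parts)
        groups[key].append(name)
    return dict(groups)


# Hypothetical read names following a well_gene_direction_sample_date layout.
reads = [
    "A02_Asp_F_016_2007-01-10",
    "A03_Asp_R_016_2007-01-10",
    "B01_Asp_F_017_2007-01-11",
]

for group, members in group_by_name_parts(reads).items():
    print(group, len(members))
```

With these names, the forward and reverse reads of sample 016 end up in one group and the read of sample 017 in another, which mirrors the "Resulting group" and "Number of groups" fields of the preview.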
187. Figure 1.15: A license has been downloaded.

The progress of the license download is shown, and when the license is downloaded, you will be able to click Next.

Go to license download web page

Selecting the second option, Go to license download web page, opens the license web page as shown in figure 1.16.

Figure 1.16: The license web page where you can download a license.

Click the Request Evaluation License button, and you will be able to save the license on your computer, e.g. on the Desktop.

Back in the Workbench window, you will now see the dialog shown in figure 1.17.

Click the Choose License File button and browse to find the license file you saved before (e.g. on your Desktop). When you have selected the file, click Next.

Figure 1.17: Importing the license downloaded from the web site.

Accepting the license agreement

Reg
188. MDINUC.fsa (FASTA format) from our own Desktop into the new 'My folder'. (This file is chosen for demonstration purposes only; you may have another file on your desktop, which you can use to follow this tutorial.) You can import all kinds of files.

In order to import the HUMDINUC.fsa file:

Select 'My folder' | Import in the Toolbar | navigate to HUMDINUC.fsa on the desktop | Select

The sequence is imported into the folder that was selected in the Navigation Area before you clicked Import. Double-click the sequence in the Navigation Area to view it. The final result looks like figure 2.2.
189. select the part of the sequence you want to delete | right-click the selection | Edit Selection | Delete the text in the dialog | Replace

The selection shown in the dialog will be replaced by the text you enter. If you delete the text, the selection will be replaced by an empty text, i.e. deleted.

To delete entire columns:

select the part of the alignment you want to delete | right-click the selection | Delete columns

The selection may cover one or more sequences, but the Delete columns function will always apply to the entire alignment.

19.3.4 Copy annotations to other sequences

Annotations on one sequence can be transferred to other sequences in the alignment:

right-click the annotation | Copy Annotation to other Sequences

This will display a dialog listing all the sequences in the alignment. Next to each sequence is a checkbox which is used for selecting which sequences the annotation should be copied to. Click Copy to copy the annotation.

If you wish to copy all annotations on the sequence, click Copy All Annotations to other Sequences.

19.3.5 Move sequences up and down

Sequences can be moved up and down in the alignment:

drag the name of the sequence up or down

When you move the mouse pointer over the label, the pointer will turn into a vertical arrow indicating that the sequence can be moved.

The sequences can also be sorted automatically to save you the time of moving the sequences around. To sort the sequenc
190. Figure 12.7: Examples of parameters that can be set before submitting a BLAST search.

See section 12.1.1 for information about these limitations.

There is one setting available for local BLAST jobs that is not relevant for remote searches at the NCBI:

• Number of processors. You can specify the number of processors which should be used if your Workbench is installed on a multi-processor system.

12.1.4 BLAST a partial sequence against a local database

You can search a database using only a part of a sequence directly from the sequence view:

select the region that you wish to BLAST | right-click the selection | BLAST Selection Against Local Database

This will go directly to the dialog shown in figure 12.6 and the rest of the options are the same as when performing a BLAST search with a full sequence.

12.2 Output from BLAST searches

The output of a BLAST search is similar whether you have chosen to run your search locally or at the NCBI. If a single query sequence was used, then the results will show the hits found in that database with that single sequence. If more than one sequence was used to query a database, the default view of the results is a summary table, showing the des
191. Figure 12.13: Providing a name and description for the database, and the location to save the files to.

Click Finish to create the BLAST database. Once the process is complete, the new database will be available in the Manage BLAST Databases dialog (see section 12.4) and when running local BLAST (see section 12.1.3).

12.4 Manage BLAST databases

The BLAST databases available as targets for running local BLAST searches (see section 12.1.3) can be managed through the Manage BLAST Databases dialog (see figure 12.14):

Toolbox | BLAST | Manage BLAST Databases

Figure 12.14: Overview of available BLAST databases.

At the top of the dialog, there is a list of the BLAST database locations. These locations are folders where the Workbench will look for vali
192. APPENDIX A  COMPARISON OF WORKBENCHES AND THE VIEWER

Table columns: Viewer, Protein, DNA, RNA, Main, Genomics

Sequence alignment
• Multiple sequence alignments (Two algorithms)
• Advanced re-alignment and fix-point alignment options
• Advanced alignment editing options
• Join multiple alignments into one
• Consensus sequence determination and management
• Conservation score along sequences
• Sequence logo graphs along alignments
• Gap fraction graphs
• Copy annotations between sequences in alignments
• Pairwise comparison

RNA secondary structure
• Advanced prediction of RNA secondary structure
• Integrated use of base pairing constraints
• Graphical view and editing of secondary structure
• Info about energy contributions of structure elements
• Prediction of multiple sub-optimal structures
• Evaluate structure hypothesis
• Structure scanning
• Partition function

Dot plots
• Dot plot based analyses

Phylogenetic trees
• Neighbor-joining and UPGMA phylogenies
• Maximum likelihood phylogeny of nucleotides

Pattern discovery
• Search for sequence match
• Motif search for basic patterns
• Motif search with regular expressions
• Motif search with ProSite patterns
• Pattern discovery

P
193. Output from the contig
  17.7.6 Extract parts of a contig
  17.7.7 Variance table
17.8 Reassemble contig
17.9 Secondary peak calling

CLC DNA Workbench lets you import, trim and assemble DNA sequence reads from automated sequencing machines. A number of different formats are supported (see section 7.1.1). This chapter first explains how to trim sequence reads. Next follows a description of how to assemble reads into contigs both with and without a reference sequence. In the final section, the options for viewing and editing contigs are explained.

17.1 Importing and viewing trace data

A number of different binary trace data formats can be imported into the program, including Standard Chromatogram Format (.SCF), ABI sequencer data files (.ABI and .AB1), PHRED output files (.PHD) and PHRAP output files (.ACE) (see section 7.1.1).

After import, the sequence reads and their trace data are saved as DNA sequences. This means that all analyses which apply to DNA sequences can be performed on the sequence reads, including e.g. BLAST and open reading frame prediction.

You can see additional information about the quality of the traces by holding the mouse cursor on the imported sequence. This will display a tool tip as shown in figure 17.1.
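When preparing a folder of reads for import, it can help to check which files carry one of the trace extensions listed above. The following Python sketch does that with a simple lookup table; the function name `trace_format` and the file names are hypothetical, and the check looks only at the file extension, not at the file contents.

```python
from pathlib import Path

# Extensions and format names taken from the list in the text above.
TRACE_FORMATS = {
    ".scf": "Standard Chromatogram Format",
    ".abi": "ABI sequencer data file",
    ".ab1": "ABI sequencer data file",
    ".phd": "PHRED output file",
    ".ace": "PHRAP output file",
}


def trace_format(filename):
    """Return the trace format name for a file, or None if unrecognized."""
    return TRACE_FORMATS.get(Path(filename).suffix.lower())


# Hypothetical file names, used only to show the lookup.
for name in ["read01.AB1", "contigs.ace", "notes.txt"]:
    print(name, "->", trace_format(name))
```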
194. Panel showing the available choices of information to display.

Clicking a primer pair in the table will make a corresponding selection on the sequence in the view above. At this point, you can either settle on a specific primer pair or save the table for later. If you want to use e.g. the first primer pair for your experiment, right-click this primer pair in the table and save the primers.

You can also mark the position of the primers on the sequence by selecting Mark primer annotation on sequence in the right-click menu (see figure 2.41).

This tutorial has shown some of the many options of the primer design functionalities of CLC DNA Workbench. You can read much more using the program's Help function or in the CLC DNA
195. Figure 12.5: Choose one or more sequences to conduct a BLAST search.

Figure 12.6: Choose a BLAST program and a target database.

• Sequences. When you choose this option, you can use sequence data from the Navigation Area as database by clicking the Browse and select icon. A temporary BLAST database will be created from these sequences and used for the BLAST search. It is deleted afterwards. If you want to be able to click in the BLAST result to retrieve the hit sequences from the BLAST database at a later point, you should not use this option; create a BLAST database first (see section 12.3.3).

• BLAST Database. Select a database already available in one of your designated BLAST database folders. Read more in section 12.4.

When a database or a set of sequences has been selected, click Next.

This opens the dialog seen in figure 12.7.
196. Figure 19.3: The first 50 positions of two different alignments of seven calpastatin sequences. The top alignment is made with cheap end gaps, while the bottom alignment is made with end gaps having the same price as any other gaps. In this case it seems that the latter scoring scheme gives the best result.

Figure 19.4: The alignment of the coding sequence of bovine myoglobin with the full mRNA of human gamma globin. The top alignment is made with free end gaps, while the bottom alignment is made with end gaps treated as any other. The yellow annotation is the coding sequence in both seq
197. Figure 14.5: Choosing sequences for translation.

If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.

Clicking Next generates the dialog seen in figure 14.6.

Figure 14.6: Choosing +1 and +3 reading frames, and the standard translation table.

Here you have the following options:

Reading frames. If you wish to translate the whole sequence, you must specify the reading frame for the translation. If you select e.g. two reading frames, two protein sequences are generated.

Translate coding regions. You can choose to translate regions marked by a CDS or ORF annotation. This will generate a protein sequence for each CDS or ORF annotation on the sequence.

Genetic code translation table. Lets you specify the genetic code for the translation. The translation tables are occasionally updated from NCBI. The tables are not available
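To make the reading frame option concrete, here is a minimal Python sketch of translating a nucleotide sequence in a chosen frame with the standard genetic code (NCBI translation table 1). It is an illustration under stated assumptions, not the Workbench's implementation: the function name `translate` is hypothetical, negative frames are assumed to be read from the reverse complement starting at its first base, and codons containing anything other than A, C, G or T are written as X.

```python
from itertools import product

# Standard genetic code (NCBI table 1) in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(sequence, frame=1):
    """Translate in frame +1, +2, +3 or -1, -2, -3 (assumed convention)."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    sequence = sequence.upper()
    if frame < 0:
        # Negative frames read the reverse complement strand.
        sequence = sequence.translate(COMPLEMENT)[::-1]
    start = abs(frame) - 1
    codons = (sequence[i:i + 3] for i in range(start, len(sequence) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Selecting two reading frames yields two protein sequences.
for frame in (1, 3):
    print(frame, translate("ATGGCCTAA", frame))
```

Stop codons appear as `*` in the output; a real tool would typically let you choose whether to stop at them or translate through.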
198. Reverse zoom function    Shift    Shift    Click in view
Select multiple elements    Ctrl     ⌘        Click elements
Select multiple elements    Shift    Shift    Click elements

Elements in this context refers to elements and folders in the Navigation Area, selections on sequences, and rows in tables.

Chapter 4

Searching your data

Contents
4.1 What kind of information can be searched?
4.2 Quick search
  4.2.1 Quick search results
  4.2.2 Special search expressions
  4.2.3 Quick search history
4.3 Advanced search
4.4 Search index

There are two ways of doing text-based searches of your data, as described in this chapter:

• Quick search directly from the search field in the Navigation Area.

• Advanced search which makes it easy to make more specific searches.

In most cases, quick search will find what you need, but if you need to be more specific in your search criteria, the advanced search is preferable.

4.1 What kind of information can be searched?

Below is a list of the different kinds of information that you can search for (applies to both quick search and the advanced search).

• Name. The name of a sequence, an alignment or any other kind of element. The name is what is displayed in the Navigation Area by default.

• Length. The length of the sequenc
199. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. This will start the trimming process. Views of each trimmed sequence will be shown, and you can inspect the result by looking at the "Trim" annotations (they are colored red as default). If there are no trim annotations, the sequence has not been trimmed.

17.4 Assemble sequences

This section describes how to assemble a number of sequence reads into a contig without the use of a reference sequence (a known sequence that can be used for comparison with the other sequences, see section 17.5). To perform the assembly:

select sequences to assemble | Toolbox in the Menu Bar | Sequencing Data Analyses | Assemble Sequences

This opens a dialog where you can alter your choice of sequences which you want to assemble. You can also add sequence lists.

Note: You can assemble a maximum of 2000 sequences at a time. To assemble more sequences, you need the CLC Genomics Workbench (see http://www.clcbio.com/genomics).

When the sequences are selected, click Next. This will show the dialog in figure 17.16.

[Screenshot: Assemble Sequences dialog, step "Set assembly parameters", with options including Trim sequence ends before assembly, Minimum aligned read length (50) and Alignment stringency (Medium)]
200. [Screenshot: preview of the primer additions, showing the attB1 site, a Shine-Dalgarno sequence and the sequence of interest]

Figure 18.18: A Shine-Dalgarno sequence has been inserted.

[Screenshot: Add attB Sites dialog, step "Set primer parameters". Primer extensions: Forward primer extension 20, Reverse primer extension 20]

Figure 18.19: Specifying the length of the template-specific part of the primers.

Besides the main output, which is a copy of the input sequence(s) now including attB sites and primer additions, you can get a list of primers as output. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

The attB sites, the primer additions and the primer regions are annotated in the final result as shown in figure 18.21.

There will be one output sequence for each sequence you have selected for adding attB sites. Save the resulting sequence, as it will be the input to the next part of the Gateway cloning work flow (see section 18.2.2). When you open the sequence again, you may need to switch on the relevant annotation types to show the sites and primer additions as illustrated in figure 18.21.

CHAPTER 18. CLONING AND CUTTING

[Screenshot: Add attB Sites dialog, result handling step]
201. T search to meet your requirements.

[Screenshot: BLAST at NCBI dialog, step 3 "Set BLAST parameters". Limit by Entrez query: all organisms; Filter low complexity; Mask lower case; Expect: 10; Word size: 3; Matrix: BLOSUM62; Gap cost: Existence 11, Extension 1; Max number of hit sequences: 100]

Figure 12.4: Parameters that can be set before submitting a BLAST search.

When choosing BLASTx or tBLASTx to conduct a search, you get the option of selecting a translation table for the genetic code. The standard genetic code is set as default. This setting is particularly useful when working with organisms or organelles that have a genetic code different from the standard genetic code.

The following description of BLAST search parameters is based on information from http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml.

• Limit by Entrez query: BLAST searches can be limited to the results of an Entrez query against the database chosen. This can be used to limit searches to subsets of entries in the BLAST databases. Any terms can be entered that would normally be allowed in an Entrez search session. More information about Entrez queries can be found at http://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options. The syntax described there is the same as woul
202. [BLAST alignment listing: Score = 224 bits (113), Expect = 6e-56, Identities = 161/161 (100%), Gaps = 0/161 (0%), Strand = Plus/Plus; query and subject sequence lines omitted]

Figure 12.21: Alignment view of BLAST results. Individual alignments are represented together with BLAST scores and more.

12.5.7 I want to BLAST against my own sequence database, is this possible?

It is possible to download the entire BLAST program package and use it on your own computer, institution computer cluster or similar. This is preferred if you want to search in proprietary sequences or sequences unavailable in the public databases stored at NCBI. The downloadable BLAST package can either be installed as a web-based tool or as a command line tool. It is available for a wide range of different operating systems.

The BLAST package can be downloaded free of charge from the following location: http://www.ncbi.nlm.nih.gov/BLAST/download.shtml

Pre-formatted
203. [Screenshot: a protein sequence (ATP8a1) shown with hydrophobicity scores displayed in several ways, including graphs below the sequence]

Figure 15.6: The different ways of displaying the hydrophobicity scores, using the Kyte-Doolittle scale.

The latter option offers you the same possibilities of amplifying the scores as applies for coloring of letters. The different ways to display the scores when choosing "graphs" are displayed in figure 15.6. Notice that you can choose the height of the graphs underneath the sequence.

15.2.3 Bioinformatics explained: Protein hydrophobicity

Calculation of hydrophobicity is important to the identification of various protein features. This can be membrane-spanning regions, antigenic sites, exposed loops or buried residues. Usually, these calculations are shown as a plot along the protein sequence, making it easy to identify the location of potential protein features.

[Screenshot: protein sequence Q6H1U7 with a hydrophobicity graph below it]

Figure 15.7: Plot of hydrophobicity along the amino acid sequence. Hydrophobic regions on the sequence have higher numbers according to the graph below the sequence; furthermore, hydrophobic regions are colored on the sequence. Red indicates regions with high hydrophobicity and blue indicates regions with low hydrophobicity
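A hydrophobicity plot of the kind described above is typically a sliding-window average of per-residue scores. The following Python sketch uses the published Kyte-Doolittle values; the window length of 9 and the function name are assumptions made for illustration, not settings taken from the Workbench.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean Kyte-Doolittle score of every full window.

    The result has len(protein) - window + 1 values; higher values
    indicate more hydrophobic stretches. The window length is an
    assumed default, not a value from the manual.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# A hydrophobic stretch followed by a charged stretch.
print(hydropathy_profile("IIIRRR", window=3))
```

Plotting the returned values against sequence position gives a curve in which high values mark candidate hydrophobic regions, such as membrane-spanning segments.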
204. [Screenshot: mapped reads with a graph displayed along them]

Figure 7.14: A graph displayed along the mapped reads. Right-click the graph to export the data points to a file.

will be shown. If the graph is covering a set of aligned sequences with a main sequence, such as read mappings and BLAST results, the dialog shown in figure 7.15 will be displayed. These kinds of graphs are located under Alignment info in the Side Panel. In all other cases, a normal file dialog will be shown letting you specify name and location for the file.

[Screenshot: Export Graphics dialog, "Output options" step, with the option Export excluding gaps]

Figure 7.15: Choosing to include data points with gaps.

In this dialog, select whether you wish to include positions where the main sequence (the reference sequence for read mappings and the query sequence for BLAST results) has gaps. If you are exporting e.g. coverage information from a read mapping, you would probably want to exclude gaps, if you want the positions in the exported file to match the reference
205. [Screenshot: primer design view of a sequence, with reaction type options in the Side Panel: Standard PCR, TaqMan, Nested PCR, Sequencing]

Figure 16.1: The initial view of the sequence used for primer design.

16.1.1 General concept

The concept of the primer view is that the user first chooses the desired reaction type for the session in the Primer Parameters preference group, e.g. Standard PCR. Reflecting the choice of reaction type, it is now possible to select one or more regions on the sequence and to use the right-click mouse menu to designate these as primer or probe regions (see figure 16.2).

CHAPTER 16. PRIMERS

[Screenshot: right-click menu on a selected region, with entries: Forward primer region here, Reverse primer region here, Region to amplify, No primers here, Copy Selection, Open Selection in New View, Edit Selection, Delete Selection, Add Annotation, Add Enzymes Cutting Selection to Panel, Insert Restriction Site After Selection, Insert Restriction Site Before Selection, Base Pair Constraint, Set Numbers Relative to This Selection, BLAST Selection Against NCBI, BLAST Selection Against Local Database]

Figure 16.2: Right-click menu allowing you to specify regions for the primer design.

When a region is chosen, graphical information about the properties of all possible primers in this region will appear in lines beneath it. By default, information is shown using a compact mode
206. The created object can also be saved and exported as a text file.

See figure 16.23.

[Screenshot: primer order listing four primers by name, position and sequence]

Figure 16.23: A primer order for 4 primers.

Chapter 17

Sequencing data analyses and Assembly

Contents
17.1 Importing and viewing trace data
17.1.1 [title garbled in source]
17.1.2 Trace settings in the Side Panel
17.2 Multiplexing
17.2.1 Sort sequences by name
17.2.2 Process tagged sequences
17.3 Trim sequences
17.3.1 Manual trimming
17.3.2 Automatic trimming
17.4 Assemble sequences
17.5 Assemble to reference sequence
17.6 Add sequences to an existing contig
17.7 View and edit contigs
17.7.1 View settings in the Side Panel
17.7.2 Editing the contig
17.7.3 [title garbled in source]
17.7.4 [title garbled in source]
17.7.5
207. Utils wprintgc cgi mode c

Codon usage database: http://www.kazusa.or.jp/codon/

Wikipedia on the genetic code: http://en.wikipedia.org/wiki/Genetic_code

Creative Commons License

All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work.

SOME RIGHTS RESERVED

See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents.

Chapter 16

Primers

Contents
16.1 Primer design - an introduction
16.1.1 General concept
16.1.2 Scoring primers
16.2 Setting parameters for primers and probes
16.2.1 Primer Parameters
16.3 Graphical display of primer information
16.3.1 Compact information mode
16.3.2 Detailed information mode
16.4 Output from primer design
16.4.1 Saving primers
208. V model (Yang, 1994a) models. All models are time-reversible. The JC and K80 models assume equal base frequencies, and the HKY and GTR models allow the frequencies of the four bases to differ (they will be estimated by the observed frequencies of the bases in the alignment). In the JC model all substitutions are assumed to occur at equal rates; in the K80 and HKY models transition and transversion rates are allowed to differ. The GTR model is the general time-reversible model and allows all substitutions to occur at different rates. In case of the K80 and HKY models the user may set a transition/transversion ratio value which will be used as starting value or fixed, depending on the level of estimation chosen by the user (see below). For the substitution rate matrices describing the substitution models we use the parametrization of Yang (Yang, 1994a).

• Rate variation: In CLC DNA Workbench substitution rates may be allowed to differ among the individual nucleotide sites in the alignment by selecting the include rate variation box. When selected, the discrete gamma model of Yang (Yang, 1994b) is used to model rate variation among sites. The number of categories used in the discretization of the gamma distribution, as well as the gamma distribution parameter, may be adjusted by the user (as the gamma distribution is restricted to have mean 1, there is only one parameter in the distribution).

• Estimation: Estimation is done according to the maximum likelihood p
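The relationship between the JC, K80 and HKY models described above can be sketched as one unscaled rate matrix with two inputs: the base frequencies and a transition/transversion rate ratio. The Python below uses the common textbook HKY85 form; the names are hypothetical, the matrix is not normalized, and it is not claimed to match the exact parametrization of Yang (1994a) used by the Workbench.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


def hky_rate_matrix(freqs, kappa):
    """Build an unscaled HKY-style rate matrix as a dict keyed by (from, to).

    Off-diagonal rate x -> y is freqs[y], multiplied by kappa for
    transitions. Each diagonal entry makes its row sum to zero.
    Equal frequencies with kappa = 1 gives JC; equal frequencies with
    kappa != 1 gives K80.
    """
    q = {}
    for x in BASES:
        for y in BASES:
            if x != y:
                q[(x, y)] = freqs[y] * (kappa if is_transition(x, y) else 1.0)
        q[(x, x)] = -sum(q[(x, y)] for y in BASES if y != x)
    return q


equal = {b: 0.25 for b in BASES}
jc = hky_rate_matrix(equal, kappa=1.0)   # all substitutions equally likely
k80 = hky_rate_matrix(equal, kappa=2.0)  # transitions twice as fast
print(jc[("A", "C")], k80[("A", "G")], k80[("A", "C")])
```

The GTR model would replace the single kappa with a separate exchangeability for each pair of bases, which is why it is described as allowing all substitutions to occur at different rates.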
209. [Screenshot: table view of a folder of cloning vectors (e.g. pBR322, pBR325), with columns such as Name, Modified, Modified by, Description, Length and Linear, a "Show column" panel in the Side Panel, and a Move to Recycle Bin button]

Figure 3.6: Viewing the elements in a folder.

Sorting the elements in a view does not affect the ordering of the elements in the Navigation Area.

Note: The view only displays one
210. able for all views that can be zoomed in and out. In figure 7.8 is a view of a circular sequence which is zoomed in so that you can only see a part of it.

[Screenshot: zoomed-in view of the circular sequence AY738615]

Figure 7.8: A circular sequence as it looks on the screen.

When selecting Export visible area, the exported file will only contain the part of the sequence that is visible in the view. The result from exporting the view from figure 7.8 and choosing Export visible area can be seen in figure 7.9.

Figure 7.9: The exported graphics file when selecting Export visible area.

On the other hand, if you select Export whole view, you will get a result that looks like figure 7.10. This means that the graphics file will also include the part of the sequence which is not visible when you have zoomed in.

Click Next when you have chosen which part of the view to export.

7.3.2 Save location and file formats

In this step, you can choose name and save location for the graphics file (see figure 7.11).

CLC DNA Workbench supports the following file formats for graphics export:

CHAPTER 7. IMPORT/EXPORT OF DATA AND GRAPHICS

[Screenshot: the whole circular sequence AY738615]

Figure 7.10: The exported graphics file when selecting Export whole view. The whole sequence is shown, even though the view is zoomed in on a part of the sequence.

[Screenshot: Export Graphics dialog, step 2 "Save in file", showing a file browser]
211. act our Support function.

E-mail: support@clcbio.com

1.2 Download and installation

The CLC DNA Workbench is developed for Windows, Mac OS X and Linux. The software for either platform can be downloaded from http://www.clcbio.com/download.

1.2.1 Program download

The program is available for download on http://www.clcbio.com/download. Before you download the program you are asked to fill in the Download dialog.

In the dialog you must choose:

• Which operating system you use

• Whether you would like to receive information about future releases

Depending on your operating system and your Internet browser, you are taken through some download options.

When the download of the installer (an application which facilitates the installation of the program) is complete, follow the platform-specific instructions below to complete the installation procedure. Note: You must be connected to the Internet throughout the installation process.

1.2.2 Installation on Microsoft Windows

Starting the installation process is done in one of the following ways:

CHAPTER 1. INTRODUCTION TO CLC DNA WORKBENCH

If you have downloaded an installer:
Locate the downloaded installer and double-click the icon. The default location for downloaded files is your desktop.

If you are installing from a CD:
Insert the CD into your CD-ROM drive. Choose the "Install CLC DNA Workbench" from the menu displayed.

Installing the program is done in the following steps:
212. aded and opened).

• Open at NCBI: Opens the corresponding sequence(s) at GenBank at NCBI. Here is stored additional information regarding the selected sequence(s). The default Internet browser is used for this purpose.

• Open structure: If the hit sequence contains structure information, the sequence is opened in a text view or a 3D view (3D view in CLC Protein Workbench and CLC Main Workbench).

You can do a text-based search in the information in the BLAST table by using the filter at the upper right part of the view. In this way you can search for e.g. species or other information which is typically included in the "Description" field.

The table is integrated with the graphical view described in section 12.2.3 so that selecting a hit in the table will make a selection on the corresponding sequence in the graphical view.

12.3 Local BLAST databases

BLAST databases on your local system can be made available for searches via your CLC DNA Workbench (section 12.3.1). To make adding databases even easier, you can download pre-formatted BLAST databases from the NCBI from within your CLC DNA Workbench (section 12.3.2). You can also easily create your own local BLAST databases from sequences within your CLC DNA Workbench (section 12.3.3).

CHAPTER 12. BLAST SEARCH

12.3.1 Make pre-formatted BLAST databases available

To use databases that have been downloaded or created outside the Workbench, you can either

• Put the database files in one of
213. al parsing is also available.

The default layout of the NCBI BLAST result is a graphical representation of the hits found, a table of sequence identifiers of the hits together with scoring information, and alignments of the query sequence and the hits.

The graphical output (shown in figure 12.19) gives a quick overview of the query sequence and the resulting hit sequences. The hits are colored according to the obtained alignment scores.

The table view (shown in figure 12.20) provides more detailed information on each hit and furthermore acts as a hyperlink to the corresponding sequence in GenBank.

In the alignment view one can manually inspect the individual alignments generated by the BLAST algorithm. This is particularly useful for detailed inspection of the sequence hit found (sbjct) and the corresponding alignment. In the alignment view, all scores are described for each alignment.

[Screenshot: BLAST graphical overview. Color key for alignment scores: <40, 40-50, 50-80, 80-200, >=200. The query is drawn as a bar with the hits stacked below it]

Figure 12.19: BLAST graphical view. A si
214. alignments

CLC DNA Workbench can join several alignments into one. This feature can for example be used to construct "supergenes" for phylogenetic inference by joining alignments of several disjoint genes into one spliced alignment. Note that when alignments are joined, all their annotations are carried over to the new spliced alignment.

Alignments can be joined by:

CHAPTER 19. SEQUENCE ALIGNMENT

select alignments to join | Toolbox in the Menu Bar | Alignments and Trees | Join Alignments

or select alignments to join | right-click either selected alignment | Toolbox | Alignments and Trees | Join Alignments

This opens the dialog shown in figure 19.10.

[Screenshot: Join Alignments dialog, step 1, with "alignment 1" and "alignment 2" listed under Selected Elements]

Figure 19.10: Selecting two alignments to be joined.

If you have selected some alignments before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove alignments from the selected elements. Clicking Next opens the dialog shown in figure 19.11.

[Screenshot: Join Alignments dialog, next step]
215. an also be saved by dragging it into the Navigation Area. It is possible to select more sequences and drag all of them into the Navigation Area at the same time.

CHAPTER 11. ONLINE DATABASE SEARCH

Download GenBank search results using right-click menu

You may also select one or more sequences from the list and download using the right-click menu (see figure 11.2). Choosing Download and Save lets you select a folder where the sequences are saved when they are downloaded. Choosing Download and Open opens a new view for each of the selected sequences.

[Screenshot: right-click menu on a search result, with entries including Download and Open, Download and Save, and Open at NCBI]

Figure 11.2: By right-clicking a search result, it is possible to choose how to handle the relevant sequence.

Copy/paste from GenBank search results

When using copy/paste to bring the search results into the Navigation Area, the actual files are downloaded from GenBank.

To copy/paste files into the Navigation Area:

select one or more of the search results | Ctrl + C (⌘ + C on Mac) | select a folder in the Navigation Area | Ctrl + V

Note: Search results are downloaded before they are saved. Downloading and saving several files may take some time. However, since the process runs in the background (displayed in the Status bar), it is possible to continue other tasks in the program. Like the search process, the download process can be stopped. This
216. an select one or more entry clones (see how to create an entry clone in section 18.2.2). If you wish to perform separate LR reactions with multiple entry clones, you should run the Create Expression Clone in batch mode (see section 9.1).

When you have selected your entry clone(s), click Next.

This will display the dialog shown in figure 18.25.

[Screenshot: Create Expression Clones (LR) dialog, step 2 "Select Destination vector", with pDEST14 selected]

Figure 18.25: Selecting one or more destination vectors.

Clicking the Browse button opens a dialog where you can select a destination vector. You can download donor vectors from Invitrogen's web site (http://tools.invitrogen.com/downloads/Gateway%20vectors.ma4) and import them into the CLC DNA Workbench. Note that the Workbench looks for the specific sequences of the attR sites in the sequences that you select in this dialog (see how to change the definition of sites in appendix F). Note that the CLC DNA Workbench only checks that valid attR sites are found; it does not check that they correspond to the attL sites of the selected fragments at this step. If the right combination of attL and attR sites is not found, no entry clones will be produced.

When performing multi-site Gateway cloning, the CLC DNA Workbench will insert the fragments (contained in entry clones) by matching the sites that are compatible. If the sites have been defined correctly, an express
217. [Screenshot: main window with an empty View Area; the Toolbox lists entries such as Cloning and Restriction Sites, BLAST Search and Database Search, and the status bar reads "Idle" and "1 element(s) are selected"]

Figure 3.18: An empty Workspace.

Workspace in the Toolbar | Select the Workspace to activate

or Workspace in the Menu Bar | Select Workspace | choose which Workspace to activate | OK

The name of the selected Workspace is shown after "CLC DNA Workbench" at the top left corner of the main window (in figure 3.18 it says: (default)).

3.5.3 Delete Workspace

Deleting a Workspace can be done in the following way:

Workspace in the Menu Bar | Delete Workspace | choose which Workspace to delete | OK

Note: Be careful to select the right Workspace when deleting. The delete action cannot be undone. (However, no data is lost, because a workspace is only a representation of data.)

It is not possible to delete the default workspace.

3.6 List of shortcuts

The keyboard shortcuts in CLC DNA Workbench are listed below.

CHAPTER 3. USER INTERFACE

Action
Adjust selection
Change between tabs
Close
Close all views
Copy
Cut
Delete
Exit
Export
Export graphics
Find Next Conflict
Find Previous Conflict
Help
Import
Maximize/restore size of View
Move gaps in alignment
Navigate sequence views
New Folder
New Sequence
View
Paste
Print
Redo
Rename
Save
Search local data
Search within a sequence
Search NCBI
Search UniProt
Select All
218. and exported in the same way as bioinformatics files (see section 7.1.1). Bioinformatics files not recognized by CLC DNA Workbench are also treated as external files.

7.3 Export graphics to files

CLC DNA Workbench supports export of graphics into a number of formats. This way, the visible output of your work can easily be saved and used in presentations, reports etc. The Export Graphics function is found in the Toolbar.

CLC DNA Workbench uses a WYSIWYG principle for graphics export: What You See Is What You Get. This means that you should use the options in the Side Panel to change how your data, e.g. a sequence, looks in the program. When you export it, the graphics file will look exactly the same way.

It is not possible to export graphics of elements directly from the Navigation Area. They must first be opened in a view in order to be exported. To export graphics of the contents of a view:

select tab of View | Graphics

This will display the dialog shown in figure 7.7.

[Screenshot: Export Graphics dialog, step 1 "Output options". Export options: Export visible area, Export whole area]

Figure 7.7: Selecting to export whole view or to export only the visible area.

7.3.1 Which part of the view to export

In this dialog you can choose to:

• Export visible area, or

• Export whole view

These options are avail
219. and stability, and is of major importance to the study of proteins (Knudsen and Miyamoto, 2001). Knowledge of the underlying phylogeny is, however, paramount to comparative methods of inference as the phylogeny describes the underlying correlation from shared history that exists between data from different species.

In molecular epidemiology of infectious diseases, phylogenetic inference is also an important tool. The very fast substitution rate of microorganisms, especially the RNA viruses, means that these show substantial genetic divergence over the time-scale of months and years. Therefore, the phylogenetic relationship between the pathogens from individuals in an epidemic can be resolved and contribute valuable epidemiological information about transmission chains and epidemiologically significant events (Leitner and Albert, 1999), (Forsberg et al., 2001).

20.2.3 Reconstructing phylogenies from molecular data

Traditionally, phylogenies have been constructed from morphological data, but following the growth of genetic information it has become common practice to construct phylogenies based on molecular data, known as molecular phylogeny. The data is most commonly represented in the form of DNA or protein sequences, but can also be in the form of e.g. restriction fragment length polymorphism (RFLP).

Methods for constructing molecular phylogenies can be distance-based or character-based.

Distance-based methods

Two common algorithms, both based on
220. ar sequences

The simplest example of a dot plot is obtained by plotting two homologous sequences of interest. If very similar or identical sequences are plotted against each other, a diagonal line will occur.

The dot plot in figure 13.7 shows two related sequences of the Influenza A virus nucleoproteins infecting ducks and chickens. Accession numbers from the two sequences are DQ232610 and DQ023146. Both sequences can be retrieved directly from http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi.

Figure 13.7: Dot plot of DQ232610 vs. DQ023146 (Influenza A virus nucleoproteins) showing an overall similarity.

Repeated regions

Sequence repeats can also be identified using dot plots. A repeat region will typically show up as lines parallel to the diagonal line.

If the dot plot shows more than one diagonal in the same region of a sequence, the corresponding regions of the other sequence are repeated. In figure 13.9 you can see a sequence with repeats.

CHAPTER 13. GENERAL SEQUENCE ANALYSES

Direct repeats: ACDEFGHIACDEFGHIACDEFGHIACDEFGHI

Inverted repeats: ACDEFGHIIHGFEDCAACDEFGHIIHGFEDCA

Figure 13.8: Direct and inverted repeats shown on an amino acid sequence generated for demonstration purposes.

Figure 13.9: The dot plot of a sequence showing repeated elements. See also figure 13.8.

Frame shifts

Frame shifts in a nucleotide sequence can occur due to insertions, deletions or mutations. Such frame shif
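The way identical sequences produce a main diagonal and repeats produce parallel lines can be reproduced with a very small dot plot. This Python sketch marks a cell when the windows starting at the two positions match exactly; the function name and the exact-match rule with a default window of 1 are assumptions for illustration, not the Workbench's scoring scheme.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return the dot plot as a list of strings, one row per position in seq_a.

    A '*' at row i, column j means seq_a[i:i+window] equals
    seq_b[j:j+window]; '.' means no match.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        rows.append("".join(
            "*" if seq_a[i:i + window] == seq_b[j:j + window] else "."
            for j in range(len(seq_b) - window + 1)
        ))
    return rows


# The direct-repeat demonstration sequence from figure 13.8 (first two units),
# plotted against itself: the main diagonal plus parallel repeat diagonals.
for row in dot_plot("ACDEFGHIACDEFGHI", "ACDEFGHIACDEFGHI"):
    print(row)
```

Plotting the sequence against itself shows the main diagonal together with shorter parallel diagonals offset by the repeat length (8 residues here).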
221. arameters: Max percentage point difference in G/C content; Max difference in melting temperatures within a primer pair; Max hydrogen bonds between pairs; Max hydrogen bonds between pair ends; Maximum length of amplicon.

Figure 16.13: Calculation dialog shown when designing alignment based PCR primers.

16.9.3 Alignment based TaqMan probe design

CLC DNA Workbench allows the user to design solutions for TaqMan quantitative PCR which consist of four oligos: a general primer pair which will amplify all sequences in the alignment, a specific TaqMan probe which will match the group of included sequences but not match the excluded sequences, and a specific TaqMan probe which will match the group of excluded sequences but not match the included sequences. As above, the selection boxes are used to indicate the status of a sequence: if the box is checked, the sequence belongs to the included sequences; if not, it belongs to the excluded sequences. We use the terms included and excluded here to be consistent with the section above, although a probe solution is presented for both groups. In TaqMan mode, primers are not allowed degeneracy or mismatches to any template sequence in the alignment; variation is only allowed/required in the TaqMan probes.

Pushing the Calculate button will cause the dialog shown in figure 16.14 to appear.

The top part of this dialog is identical to the Standard PCR dialog for designing primer pairs described above.

The cen
222. ardless of which option you chose above, you will now see the dialog shown in figure 1.18.

Figure 1.18: Read the license agreement carefully.

Please read the License agreement carefully before clicking I accept these terms and Finish.

1.4.5 Configure license server connection

If your organization has installed a license server, you can use a floating license. The license server has a set of licenses that can be used on all computers on the network. If the server has e.g. 10 licenses, it means that at most 10 computers can use a license simultaneously. When you have selected this option and click Next, you will see the dialog shown in figure 1.19.

This dialog lets you specify how to connect to the license server:

• Connect
223. Figure 2.31: Press and hold the Ctrl key while you click first the HindIII site and next the XhoI site.

Figure 2.32: Showing the insertion point of the vector.

side and the fragment in the middle. The fragment can be reverse complemented by clicking Reverse complement fragment, but this is not necessary in this case. Click Finish and your new construct will be opened.

When saving your work, there are
224. arting with the term. E.g. searching for "brca" will find both brca1 and brca2.

• Search related words: If you don't know the exact spelling of a word, you can append a question mark to the search term. E.g. "brac1" will find sequences with a brca1 gene.

CHAPTER 4. SEARCHING YOUR DATA

• Include both terms (AND): If you write two search terms, you can define that your results have to match both search terms by combining them with AND. E.g. a search for "brca1 AND human" will find sequences where both terms are present.

• Include either term (OR): If you write two search terms, you can define that your results have to match either of the search terms by combining them with OR. E.g. a search for "brca1 OR brca2" will find sequences where either of the terms is present.

• Name search (name:): Search only the name of the element.

• Organism search (organism:): For sequences, you can specify the organism to search for. This will look in the "Latin name" field which is seen in the Sequence Info view (see section 10.4).

• Length search (length:[START TO END]): Search for sequences of a specific length. E.g. search for sequences between 1000 and 2000 residues: length:[1000 TO 2000].

If you do not use this special syntax, you will automatically search across name, description, organism, etc., and search terms will be combined as if you had put OR between them.

4.2.3 Quick search history

You can access the 10 most recent searches by clicking th
225. arting at the position that you want to be the new starting point | right click the selection | Move Starting Point to Selection Start

Note: This can only be done for sequences that have been marked as circular.

10.3 Working with annotations

Annotations provide information about specific regions of a sequence. A typical example is the annotation of a gene on a genomic DNA sequence.

Annotations derive from different sources:

• Sequences downloaded from databases like GenBank are annotated.

• In some of the data formats that can be imported into CLC DNA Workbench, sequences can have annotations (GenBank, EMBL and Swiss-Prot format).

• The results of a number of analyses in CLC DNA Workbench are annotations on the sequence (e.g. finding open reading frames and restriction map analysis).

• You can manually add annotations to a sequence (described in section 10.3.2).

Note: Annotations are included if you export the sequence in GenBank, Swiss-Prot, EMBL or CLC format. When exporting in other formats, annotations are not preserved in the exported file.

10.3.1 Viewing annotations

Annotations can be viewed in a number of different ways:

• As arrows or boxes in the sequence views:

  - Linear and circular view of sequences
  - Alignments

CHAPTER 10. VIEWING AND EDITING SEQUENCES

  - Graphical view of sequence lists
  - BLAST views (only the query sequence at the top can have annotations)
  - Cloning editor
226. arts before the first sequenced residue and continues up to and including residue 888.

  - 1..>888: The region starts at the first sequenced residue and continues beyond residue 888.

  - (102.110): Indicates that the exact location is unknown, but that it is one of the residues between residues 102 and 110, inclusive.

  - 123^124: Points to a site between residues 123 and 124.

  - join(12..78,134..202): Regions 12 to 78 and 134 to 202 should be joined to form one contiguous sequence.

  - complement(34..126): Start at the residue complementary to 126 and finish at the residue complementary to residue 34 (the region is on the strand complementary to the presented strand).

  - complement(join(2691..4571,4918..5163)): Joins regions 2691 to 4571 and 4918 to 5163, then complements the joined segments (the region is on the strand complementary to the presented strand).

  - join(complement(4918..5163),complement(2691..4571)): Complements regions 4918 to 5163 and 2691 to 4571, then joins the complemented segments (the region is on the strand complementary to the presented strand).

• Annotations: In this field, you can add more information about the annotation, like comments and links. Click the Add qualifier/key button to enter information. Select a qualifier which describes the kind of information you wish to add. If an appropriate qualifier is not present in the list, you can type your own qualifier. The pre-defined qualifiers are derived from
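The region operators described above can be read mechanically. The following is a minimal Python sketch, not the Workbench's own parser: `parse_location` and `split_top_level` are hypothetical names chosen for illustration, only plain ranges, join and complement are handled (the "one of the residues between" and "site between two residues" forms are left out), and the strand is reported as +1 or -1 by assumption.

```python
import re

def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts

def parse_location(loc):
    """Return (strand, [(start, end), ...]) with segments in reading order."""
    loc = loc.replace(" ", "")
    m = re.fullmatch(r"complement\((.*)\)", loc)
    if m:
        strand, segments = parse_location(m.group(1))
        # Complementing flips the strand and reverses the reading order.
        return -strand, segments[::-1]
    m = re.fullmatch(r"join\((.*)\)", loc)
    if m:
        parsed = [parse_location(p) for p in split_top_level(m.group(1))]
        strands = {strand for strand, _ in parsed}
        if len(strands) != 1:
            raise ValueError("mixed strands are not handled in this sketch")
        return strands.pop(), [seg for _, segs in parsed for seg in segs]
    # A plain range; "<" or ">" marks an end lying outside the sequenced region.
    m = re.fullmatch(r"[<>]?(\d+)\.\.[<>]?(\d+)", loc)
    if m:
        return 1, [(int(m.group(1)), int(m.group(2)))]
    raise ValueError(f"unsupported location: {loc}")

print(parse_location("join(12..78,134..202)"))
print(parse_location("complement(join(2691..4571,4918..5163))"))
```

Under these assumptions, the two complement/join spellings of the same region parse to the same strand and the same ordered segments, which matches the descriptions in the list.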
227. as shown in figure 17.21.

Note: This is only possible when you can see the residues on the reads. This means that you need to have zoomed in to 100% or more and chosen one of the Compactness levels "Not compact", "Low" or "Packed". Otherwise the handles for dragging are not available (this is done in order to make the visual overview simpler).

If reads have been reversed, this is indicated by red. Otherwise, the residues are colored green. The colors can be changed in the Side Panel as described in section 17.7.1.

If you find out that the reversed reads should have been the forward reads and vice versa, you can reverse complement the whole contig (imagine flipping the whole contig):

right click in the empty white area of the contig | Reverse Complement

17.7.1 View settings in the Side Panel

Apart from this, the view resembles that of alignments (see section 19.2) but has some extra preferences in the Side Panel:

• Read layout: A new preference group located at the top of the Side Panel:

  - Compactness: The compactness is an overall setting that lets you control the level of detail to be displayed on the sequencing reads. Please note that this setting affects many of the other settings in the Side Panel and the general behavior of the view as well. For example, if the compactness is set to Compact, you will not be able to see quality scores o
228. Figure 16.7: Calculation dialog for PCR primers when only a single primer region has been defined.

The top part of this dialog shows the parameter settings chosen in the Primer parameters preference group, which will be used by the design algorithm.

The lower part contains a menu where the user can choose to include mispriming as a criterion in the design process. If this option is selected, the algorithm will search for competing binding sites of the primer within the sequence.

The adjustable parameters for the search are:

CHAPTER 16. PRIMERS

• Exact match: Choose only to consider exact matches of the primer, i.e. all positions must base pair with the template for mispriming to occur.

• Minimum number of base pairs required for a match: How many nucleotides of the primer must base pair to the sequence in order to cause mispriming.

• Number of consecutive base pairs required in 3' end: How many consecutive 3' end base pairs in the primer MUST be present for mispriming to occur. This option is included since 3' terminal base pairs are known to be essential for priming to occur.

Note: Including a search for potential mispriming sites will prolong the search time substantially if long sequences are used as template and if the minimum number of base pairs required for a match is low. If the region to be amplified is part of a ve
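To make the three mispriming parameters concrete, here is a small Python sketch of how such a scan could work. It is an illustration under stated assumptions, not the Workbench's design algorithm: `find_binding_sites` is a hypothetical name, only one strand is scanned, gaps are not considered, and a primer position is counted as "base paired" when it is identical to the template strand shown (meaning it would pair with the complementary strand).

```python
def find_binding_sites(primer, template, min_pairs, min_3prime_run, exact=False):
    """Return start positions where the primer could bind under the given limits."""
    primer, template = primer.upper(), template.upper()
    n = len(primer)
    if exact:
        # Exact match: every position of the primer must pair.
        min_pairs = n
    sites = []
    for start in range(len(template) - n + 1):
        window = template[start:start + n]
        pairs = [p == t for p, t in zip(primer, window)]
        # Count consecutive paired positions from the 3' end (right end) of the primer.
        run = 0
        for paired in reversed(pairs):
            if not paired:
                break
            run += 1
        if sum(pairs) >= min_pairs and run >= min_3prime_run:
            sites.append(start)
    return sites

template = "ACGTACTTACGAACGG"  # made-up template for illustration
primer = "ACGTAC"
print(find_binding_sites(primer, template, min_pairs=5, min_3prime_run=2))
print(find_binding_sites(primer, template, min_pairs=5, min_3prime_run=3))
print(find_binding_sites(primer, template, 0, 0, exact=True))
```

Lowering the minimum number of base pairs admits more candidate sites, which is consistent with the note above that low thresholds on long templates make the search slower.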
229. ated as the fragment, first click one of the cut sites you wish to use. Then press and hold the Ctrl key (⌘ on Mac) while you click the second cut site. You can also right click the cut sites and use the Select This ... Site to select a site.

When this is done, the panel below will update to reflect the selections (see figure 18.4).

In this example you can see that there are now three options listed in the panel below the view.

CHAPTER 18. CLONING AND CUTTING

Figure 18.4: HindIII and XhoI cut sites selected to cut out fragment.

This is because there are now three options for selecting the fragment that should be used for cloning. The fragment selected per default is the one that is in bet
230. ation of local BLAST database
PubMed lookup
Web based lookup of sequence data
Search for structures (at NCBI)

APPENDIX A. COMPARISON OF WORKBENCHES AND THE VIEWER

Table columns: Viewer, Protein, DNA, RNA, Main, Genomics.

General sequence analyses
  Linear sequence view
  Circular sequence view
  Text based sequence view
  Editing sequences
  Adding and editing sequence annotations
  Advanced annotation table
  Join multiple sequences into one
  Sequence statistics
  Shuffle sequence
  Local complexity region analyses
  Advanced protein statistics
  Comprehensive protein characteristics report

Nucleotide analyses
  Basic gene finding
  Reverse complement without loss of annotation
  Restriction site analysis
  Advanced interactive restriction site analysis
  Translation of sequences from DNA to proteins
  Interactive translations of sequences and alignments
  G/C content analyses and graphs

Protein analyses
  3D molecule view
  Hydrophobicity analyses
  Antigenicity analysis
  Protein charge analysis
  Reverse translation from protein to DNA
  Proteolytic cleavage detection
  Prediction of signal peptides (SignalP)
  Transmembrane helix prediction (TMHMM)
  Secondary protein structure prediction
  PFAM domain search
231. ations: This will add an annotation to the sequence when a motif is found (an example is shown in figure 13.27).

• Create table: This will create an overview table of all the motifs found for all the input sequences.

Figure 13.27: Sequence view displaying the pattern found. The search string was "tataaa".

13.7.3 Java regular expressions

A regular expression is a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are usually used to give a concise description of a set, without having to list all elements. The simplest form of a regular expression is a literal string. The syntax used for the regular expressions is the Java regular expression syntax (see http://java.sun.com/docs/books/tutorial/essential/regex/index.html). Below are listed some of the most important syntax rules, which are also shown in the help pop-up when you press Shift + F1:

[A-Z] will match the characters A through Z (Range). You can also put single characters between the brackets: The expression [AGT] matches the characters A, G or T.

[A-D[M-P]] will match the characters A through D and M through P (Union). You can also put single characters between the brackets: The expression [AG[M-P]] matches the characters A, G and M through P.

[A-M&&[H-P]] will match the characters between A and M lying between H and P (Intersection). You
232. ats for exporting graphics. All data displayed in a graphical format can be exported using these formats. Data represented in lists and tables can only be exported in .pdf format (see section 7.3 for further details).

Format                       Suffix   Type
Portable Network Graphics    .png     bitmap
JPEG                         .jpg     bitmap
Tagged Image File            .tif     bitmap
PostScript                   .ps      vector graphics
Encapsulated PostScript      .eps     vector graphics
Portable Document Format     .pdf     vector graphics
Scalable Vector Graphics     .svg     vector graphics

Appendix H

IUPAC codes for amino acids

(Single letter codes based on International Union of Pure and Applied Chemistry)

The information is gathered from: http://www.ebi.ac.uk/2can/tutorials/aa.html

One-letter     Three-letter    Description
abbreviation   abbreviation
A              Ala             Alanine
R              Arg             Arginine
N              Asn             Asparagine
D              Asp             Aspartic acid
C              Cys             Cysteine
Q              Gln             Glutamine
E              Glu             Glutamic acid
G              Gly             Glycine
H              His             Histidine
J              Xle             Leucine or Isoleucine
L              Leu             Leucine
I              Ile             Isoleucine
K              Lys             Lysine
M              Met             Methionine
F              Phe             Phenylalanine
P              Pro             Proline
O              Pyl             Pyrrolysine
U              Sec             Selenocysteine
S              Ser             Serine
T              Thr             Threonine
W              Trp             Tryptophan
Y              Tyr             Tyrosine
V              Val             Valine
B              Asx             Aspartic acid or Asparagine
Z              Glx             Glutamic acid or Glutamine
X              Xaa             Any amino acid

Appendix I

IUPAC codes for nucleotides

(Single letter codes based o
233. automatically translate the nucleotide sequence selected as database.

As Target, select NC_000011 that you downloaded. If you are used to BLAST, you will know that you usually have to create a BLAST database before BLASTing, but the Workbench does this "on the fly" when you just select one or more sequences.

Click Next, leave the parameters at their default, click Next again, and then Finish.

Inspect BLAST result

When the BLAST result appears, make a split view so that both the table and graphical view are visible (see figure 2.46). This is done by pressing Ctrl (⌘ on Mac) while clicking the table view at the bottom of the view.

In the table, start out by showing two additional columns: "Positive" and "Query start". These should simply be checked in the Side Panel.

Now, sort the BLAST table view by clicking the column header "Positive". Then, press and hold the Ctrl button (⌘ on Mac) and click the header "Query start". Now you have sorted the table first on Positive hits and then on the start position of the query sequence. You can see that you actually have three regions with a 100% positive hit but at different locations on the chromosome sequence (see figure 2.46).

Why did we find, on the protein level, three identical regions between our query protein sequence and nucleotide database?

The beta globin gene is known to have three exons and this is exactly what we find in the BLAST search. Each translated exon will hit the corresp
234. b delimited text
Vector NTI archives
Vector NTI Database
Zip export
Zip import

Suffixes, in table order: .fsa/.fasta, .ab1, .abi, .clc, .cm5, .csv, .csv, .str/.strider, .bsml, .embl, .gcg, .gbk/.gb/.gp, .gck, .pro/.seq, .nxs/.nexus, .phd, .pir, any, .scf, .scf, .sdn, .swp, .txt, .ma4/.pa4/.oa4, .zip, .zip/.gzip/.tar

Descriptions, in table order: Simple format, name & description; Including chromatograms; Including chromatograms; Rich format including all information; Annotations in csv format; One sequence per line: name, description (optional), sequence; Only nucleotide sequence; Rich information incl. annotations; Rich information incl. annotations; Including chromatograms; Simple format, name & description; Only sequence (no name); Including chromatograms; Including chromatograms; Rich information (only proteins); Annotations in tab delimited text format; Archives in rich format; Special import full database; Selected files in CLC format; Contained files/folder structure

APPENDIX G. FORMATS FOR IMPORT AND EXPORT

G.1.2 Contig formats

File type    Suffix             Import   Export   Description
ACE          .ace               X        X        No chromatogram or quality score
CLC          .clc               X        X        Rich format including all information
Zip export   .zip                        X        Selected files in CLC format
Zip import   .zip/.gzip/.tar    X                 Contained files/folder structure

G.1.3 Alignment formats

File type    Suffix
235. be inserted at the 5' end of the primer as shown in figure 2.27.

Figure 2.27: Adding restriction sites to a primer.

Perform the same process for the ATP8a1 rev primer, this time using XhoI instead. Here you should also add a few bases at the 5' end, as was done in figure 18.14 when inserting the HindIII site.

Note: The ATP8a1 rev primer is designed to match the negative strand, so the restriction site should be added at the 5' end of this sequence as well (Insert Restriction Site before Selection).

Save the two primers and close the views, and you are ready for the next step.

2.6.3 Simulate PCR to create the fragment

Now, we want to extract the PCR product from the template ATP8a1 mRNA sequence using the two primers with restriction sites:

Toolbox | Primers and Probes | Find Binding Sites and Create Fragments

Select the ATP8a1 mRNA sequence and click Next. In this dialog, use the Browse button to select the two primer sequences. Click Next and adjust the output options as shown in figure 2.28.

Click Finish and you will now see the fragment table displaying the PCR product. In the Side Panel you can choose to show information about melting temperature for the primers. Right click the fragment and select Open Fragment as shown in figure 2.29.

This will create a new sequence representing the PCR product. Save the sequence in the Cloning folder an
236. bench allows you to search the NCBI GenBank database directly from the program, giving you the opportunity to open, view, analyze and save the search results without using any other applications. To conduct a search in NCBI GenBank from CLC DNA Workbench you must be connected to the Internet.

This tutorial shows how to find a complete human hemoglobin DNA sequence in a situation where you do not know the accession number of the sequence.

To start the search:

Search | Search for Sequences at NCBI

CHAPTER 2. TUTORIALS

This opens the search view. We are searching for a DNA sequence, hence:

Nucleotide

Now we are going to adjust parameters for the search. By clicking Add search parameters you activate an additional set of fields where you can enter search criteria. Each search criterion consists of a drop down menu and a text field. In the drop down menu you choose which part of the NCBI database to search, and in the text field you enter what to search for:

Click Add search parameters until three search criteria are available | choose Organism in the first drop down menu | write "human" in the adjoining text field | choose All Fields in the second drop down menu | write "hemoglobin" in the adjoining text field | choose All Fields in the third drop down menu | write "complete" in the adjoining text field
237. Figure 20.6: Algorithm choices for phylogenetic inference. The bottom shows a tree found by the neighbor joining algorithm, while the top shows a tree found by the UPGMA algorithm. The latter algorithm assumes that the evolution occurs at a constant rate in different lineages.

Neighbor Joining: The neighbor joining algorithm (Saitou and Nei, 1987), on the other hand, builds a tree where the evolutionary rates are free to differ in different lineages, i.e., the tree does not have a particular root. Some programs always draw trees with roots for practical reasons, but for neighbor joining trees, no particular biological hypothesis is postulated by the placement of the root. The method works very much like UPGMA. The main difference is that instead of using pairwise distance, this method subtracts the distance to all other nodes from the pairwise distance. This is done to take care of situations where the two closest nodes are not neighbors in the "real" tree. The neighbor joining algorithm is generally considered to be fairly good and is widely used. Algorithms that improve its cubic time performance exist. The improvement is only significant for quite large datasets.

Character based methods
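The neighbor joining selection step described above (the paragraph before the "Character based methods" heading) can be sketched in a few lines of Python. This is an illustration only: it uses the standard Saitou and Nei criterion Q(i,j) = (n-2)·d(i,j) - Σd(i,k) - Σd(j,k), the function names are hypothetical, and the five-taxon distance matrix is made-up example data rather than anything from the Workbench.

```python
def nj_q_matrix(dist):
    """Neighbor joining criterion: pairwise distance corrected by the distances to all other nodes."""
    n = len(dist)
    totals = [sum(row) for row in dist]
    return [
        [0 if i == j else (n - 2) * dist[i][j] - totals[i] - totals[j] for j in range(n)]
        for i in range(n)
    ]

def closest_pair(matrix):
    """Return the index pair (i, j), i < j, with the smallest matrix entry."""
    n = len(matrix)
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    return min(pairs, key=lambda p: matrix[p[0]][p[1]])

# Made-up symmetric distances between five taxa a..e (illustrative only).
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print(closest_pair(dist))               # pair with the smallest raw distance
print(closest_pair(nj_q_matrix(dist)))  # pair that neighbor joining would merge first
```

With this matrix the pair with the smallest raw distance (d and e) is not the pair that the corrected criterion selects (a and b), which is the situation the text refers to when it says the two closest nodes need not be neighbors.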
238. Figure 1.20: No more licenses available on the server.

In this case, please contact your organization's license server administrator. To purchase additional licenses, contact sales@clcbio.com.

You can also click the Limited Mode button (see section 1.4.6).

If your connection to the license server is lost, you will see a dialog as shown in figure 1.21.

Figure 1.21: Unable to contact license server.

In this case, you need to make sure that you have access to the license server, and that the server is running. However, there may be situations wher
239. c: Genomic sequences from NCBI Reference Sequence Project.

• est: Database of GenBank + EMBL + DDBJ sequences from EST division.

• est_human: Human subset of est.

APPENDIX D. BLAST DATABASES

• est_mouse: Mouse subset of est.

• est_others: Subset of est other than human or mouse.

• gss: Genome Survey Sequence, includes single pass genomic data, exon trapped sequences, and Alu PCR sequences.

• htgs: Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2. Finished, phase 3 HTG sequences are in nr.

• pat: Nucleotides from the Patent division of GenBank.

• pdb: Sequences derived from the 3-dimensional structure records from Protein Data Bank. They are NOT the coding sequences for the corresponding proteins found in the same PDB record.

• month: All new or revised GenBank + EMBL + DDBJ + PDB sequences released in the last 30 days.

• alu: Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. See "Alu alert" by Claverie and Makalowski, Nature 371: 752 (1994).

• dbsts: Database of Sequence Tag Site entries from the STS division of GenBank + EMBL + DDBJ.

• chromosome: Complete genomes and complete chromosomes from the NCBI Reference Sequence project. It overlaps with refseq_genomic.

• wgs: Assemblies of Whole Genome Shotgun sequences.

• env_nt: Sequences from environmental samples, such as uncultured bacterial samples isolated from soil or marine samples. The largest single source is Sagarss
240. can also put single characters between the brackets: The expression [A-M&&[HGTDA]] matches the characters in A through M that are H, G, D or A (T is not matched because it lies outside A through M).

[^A-M] will match any character except those between A and M (Excluding). You can also put single characters between the brackets: The expression [^AG] matches any character except A and G.

[A-Z&&[^M-P]] will match any character A through Z except those between M and P (Subtraction). You can also put single characters between the brackets: The expression [A-P&&[^CG]] matches any character between A and P except C and G.

The symbol . matches any character.

X{n} will match a repetition of an element indicated by following that element with a numerical value or a numerical range between the curly brackets. For example, ACG{2} matches the string ACGG and (ACG){2} matches ACGACG.

X{n,m} will match a certain number of repetitions of an element indicated by following that element with two numerical values between the curly brackets. The first number is a lower limit on the number of repetitions and the second number is an upper limit on the number of repetitions. For example, ACT{1,3} matches ACT, ACTT and ACTTT.

X{n,} represents a repetition of an element at least n times. For example, (AC){2,} matches all strings ACAC, ACACAC, ACACACAC, ...

The symbol ^ restricts the search to the beginning of your sequence. For example, if you search through a sequence with the regular expression ^AC
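The repetition and anchoring rules above can be tried out quickly. The sketch below uses Python's `re` module rather than Java, so it only exercises the parts of the syntax the two share (quantifiers in curly brackets, the start anchor, and negated character sets); the nested class intersection form is Java-specific and is not shown. `find_motifs` is a hypothetical helper name, and the sequences are made up for illustration.

```python
import re

def find_motifs(pattern, sequence):
    """Return (start, end, matched_text) for each non-overlapping match in the sequence."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, sequence)]

# The quantifier applies only to the element directly before it ...
print(bool(re.fullmatch(r"ACG{2}", "ACGG")))      # G repeated twice
# ... unless the element is a parenthesised group.
print(bool(re.fullmatch(r"(ACG){2}", "ACGACG")))  # ACG repeated twice

# Lower and upper limits on the number of repetitions.
print([s for s in ["AC", "ACT", "ACTT", "ACTTT", "ACTTTT"] if re.fullmatch(r"ACT{1,3}", s)])

# At least two repetitions of AC, located inside a longer sequence.
print(find_motifs(r"(AC){2,}", "TTACACACGG"))

# The start anchor only allows a match at the beginning of the sequence.
print(find_motifs(r"^AC", "ACGTAC"))
```

Reporting start and end positions alongside the matched text mirrors what a motif search needs in order to place an annotation on the sequence.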
241. ccept the License agreement and click Next.

CHAPTER 1. INTRODUCTION TO CLC DNA WORKBENCH

• Choose where you would like to install the application and click Next.

• Choose if CLC DNA Workbench should be used to open CLC files and click Next.

• Choose whether you would like to create a desktop icon for launching CLC DNA Workbench and click Next.

• Choose if you would like to associate .clc files to CLC DNA Workbench. If you check this option, double clicking a file with a "clc" extension will open the CLC DNA Workbench.

• Wait for the installation process to complete, choose whether you would like to launch CLC DNA Workbench right away, and click Finish.

When the installation is complete the program can be launched from your Applications folder, or from the desktop shortcut you chose to create. If you like, you can drag the application icon to the dock for easy access.

1.2.4 Installation on Linux with an installer

Navigate to the directory containing the installer and execute it. This can be done by running a command similar to:

    # sh CLCDNAWorkbench_6_JRE.sh

If you are installing from a CD, the installers are located in the "linux" directory.

Installing the program is done in the following steps:

• On the welcome screen, click Next.

• Read and accept the License agreement and click Next.

• Choose where you would like to install the application and click Next. For a system wide installation you can choose for example /opt or /usr/l
242. ce

Contents

3.1 Navigation Area
  3.1.1
  3.1.2 Create new folders
  3.1.3 Sorting folders
  3.1.4 Multiselecting elements
  3.1.5 Moving and copying elements
  3.1.6 Change element names
  3.1.7 Delete elements
  3.1.8 Show folder elements in a table
3.2 View Area
  3.2.1
  3.2.2 Show element in another view
  3.2.3 Close views
  3.2.4 Save changes in a view
  3.2.5
  3.2.6 Arrange views in View Area
  3.2.7 Side Panel
3.3 Zoom and selection in View Area
  3.3.1
  3.3.2
  3.3.3
  3.3.4
  3.3.5
  3.3.6
  3.3.7 Changing compactness
3.4 Toolbox and Status Bar
  3.4.1 Processes
  3.4.2
243. ce correction matrices (substitution matrices) take into account the likeliness of one amino acid changing to another.

• Window size. A residue-by-residue comparison (window size = 1) would undoubtedly result in a very noisy background due to a lot of similarities between the two sequences of interest. For DNA sequences the background noise will be even more dominant, as a match between only four nucleotides is very likely to happen. Moreover, a residue-by-residue comparison (window size = 1) can be very time consuming and computationally demanding. Increasing the window size will make the dot plot more "smooth".

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

CHAPTER 13  GENERAL SEQUENCE ANALYSES 203

Figure 13.4: Setting the dot plot parameters.

13.2.2 View dot plots

A view of a dot plot can be seen in figure 13.5. You can select Zoom in from the Toolbar and click the dot plot to zoom in to see the details of particular areas.

Figure 13.5: A view is opened showi
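The effect of the window size described above can be illustrated with a minimal Python sketch. This is not the Workbench's implementation: it scores by simple identity rather than with a score model such as BLOSUM62, and the function name `dot_plot` and its `threshold` parameter are assumptions made for this illustration only.

```python
def dot_plot(seq_a, seq_b, window=9, threshold=None):
    """Return the set of (i, j) positions where the windows starting at
    seq_a[i] and seq_b[j] reach the score threshold.

    Scoring is plain identity (1 per matching pair), an assumption of this
    sketch; a substitution matrix would replace the comparison below.
    """
    if threshold is None:
        threshold = window  # require every residue in the window to match
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            score = sum(1 for k in range(window) if seq_a[i + k] == seq_b[j + k])
            if score >= threshold:
                dots.add((i, j))
    return dots


a = "ACGTACGTTTGACC"
b = "TTACGTACGAAGAC"
noisy = dot_plot(a, b, window=1)   # every single matching residue is a dot
smooth = dot_plot(a, b, window=5)  # only runs of five matching residues
print(len(noisy), len(smooth))
```

With a window size of 1 every chance match between single residues produces a dot, whereas the larger window keeps only the positions that start a longer matching stretch.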
244. Figure 1.6: Read the license agreement carefully.

Please read the License agreement carefully before clicking I accept these terms and Finish.

1.4.2 Download a license

When you purchase a license, you will get a license ID from CLC bio. Using this option, you will get a license based on this ID. When you have clicked Next, you will see the dialog shown in figure 1.7. At the top, enter the ID (paste using Ctrl + V or ⌘ + V on Mac).

[Screenshot: License Wizard, Download a license page]
245. Figure 1.10: Importing the license downloaded from the web site.

Click the Choose License File button and browse to find the license file you saved before (e.g. on your Desktop). When you have selected the file, click Next.

Accepting the license agreement

Regardless of which option you chose above, you will now see the dialog shown in figure 1.11.

Please read the License agreement carefully before clicking I accept these terms and Finish.

[Screenshot: License Wizard, License Agreement page]
246. character and R matches A,G. For proteins, X matches any character and Z matches E,Q. Genome sequences often have large regions with unknown sequence. These regions are very often padded with N's. If this checkbox is ticked, hits found in N-regions are not displayed, and if a residue in a motif matches an N, it will be treated as a mismatch.

The list of motifs shown in figure 13.22 is a pre-defined list that is included with the CLC DNA Workbench. You can define your own set of motifs to use instead. In order to do this, you first need to create a Motif list. This will bring up the dialog shown in figure 13.25.

Figure 13.25: Managing the motifs to be shown.
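The matching rules for ambiguity codes described above can be sketched as follows. This is an illustration in Python, not the Workbench's search code: only the codes named in the text (R and Z, plus the wildcards N and X) are included, and the names `find_motif`, `residue_matches` and `exclude_unknown` are hypothetical.

```python
# Only the ambiguity codes named in the text; a complete table has more entries.
DNA_CODES = {"R": set("AG")}
PROTEIN_CODES = {"Z": set("EQ")}


def residue_matches(motif_char, seq_char, codes, wildcard, exclude_unknown):
    """Decide whether one motif position matches one sequence residue."""
    if exclude_unknown and seq_char == wildcard:
        return False  # checkbox ticked: unknown residues count as mismatches
    if motif_char == wildcard:
        return True   # N (DNA) or X (protein) matches any character
    return seq_char in codes.get(motif_char, {motif_char})


def find_motif(motif, sequence, codes=DNA_CODES, wildcard="N", exclude_unknown=False):
    """Return the 0-based start positions of all ungapped motif matches."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        if all(residue_matches(m, s, codes, wildcard, exclude_unknown)
               for m, s in zip(motif, window)):
            hits.append(start)
    return hits


print(find_motif("ANR", "TTACGAAGNNNN"))
print(find_motif("NN", "ACNN", exclude_unknown=True))
print(find_motif("ZX", "AEKQ", codes=PROTEIN_CODES, wildcard="X"))
```

Passing `exclude_unknown=True` mimics ticking the checkbox: any window that overlaps an N in the sequence is rejected, so hits inside N-padded regions disappear.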
247. choice of regions to clone.

However, the cloning editor has a special layout with three distinct areas (in addition to the Side Panel found in other sequence views as well):

• At the top, there is a panel to switch between the sequences selected as input for the cloning. You can also specify whether the sequence should be visualized as circular or as a fragment. At the right-hand side, there is a button to change the status of the sequence currently shown to vector.
• In the middle, the selected sequence is shown. This is the central area for defining how the cloning should be performed. This is explained in detail below.
• At the bottom, there is a panel where the selection of fragments and target vector is performed (see elaboration below).

CHAPTER 18  CLONING AND CUTTING 310

[Screenshot: the cloning editor showing the pcDNA4_TO vector (5,078 bp) as a circular vector, with restriction sites and the Side Panel]
248. chromatogram traces (A=green, C=blue, G=black, and T=red).

- Foreground color. Sets the color of the letter.
- Background color. Sets the background color of the residues.

Nucleotide info

These preferences only apply to nucleotide sequences.

• Translation. Displays a translation into protein just below the nucleotide sequence. Depending on the zoom level, the amino acids are displayed with three letters or one letter.
  - Frame. Determines where to start the translation.
    * ORF/CDS. If the sequence is annotated, the translation will follow the CDS or ORF annotations. If annotations overlap, only one translation will be shown. If only one annotation is visible, the Workbench will attempt to use this annotation to mark the start and stop for the translation. In cases where this is not possible, the first annotation will be used (i.e. the one closest to the 5' end of the sequence).

CHAPTER 10  VIEWING AND EDITING SEQUENCES 145

    * Selection. This option will only take effect when you make a selection on the sequence. The translation will start from the first nucleotide selected. Making a new selection will automatically display the corresponding translation. Read more about selecting in section 10.1.3.
    * +1 to -1. Select one of the six reading frames.
    * All forward/All reverse. Shows either all forward or all reverse reading frames.
    * All. Select all reading frames at once. The translations will be displayed on top of each other.
249. click the selection | Realign selection

This will open Step 2 in the "Create alignment" dialog, allowing you to set the parameters for the realignment (see section 19.1).

It is possible for an alignment to become shorter or longer as a result of the realignment of a region. This is because gaps may have to be inserted in, or deleted from, the sequences not selected for realignment. This will only occur for entire columns of gaps in these sequences, ensuring that their relative alignment is unchanged.

Realigning a selection is a very powerful tool for editing alignments in several situations:

• Removing changes. If you change the alignment in a specific region by hand, you may end up being unhappy with the result. In this case you may of course undo your edits, but another option is to select the region and realign it.
• Adjusting the number of gaps. If you have a region in an alignment which has too many gaps in your opinion, you can select the region and realign it. By choosing a relatively high gap cost you will be able to reduce the number of gaps.
• Combine with fixpoints. If you have an alignment where two residues are not aligned, but you know that they should have been, you can set an alignment fixpoint on each of the two residues, select the region and realign it using the fixpoints. Now, the two residues are aligned with each other and everything in the selected region around them is adjusted to accommodate this change.

19.4 Join
250. Figure 2.49: Settings for searching for primer binding sites.

Settings compared (columns include Low complexity filter and Expect value):

Standard BLAST: blastp, BLOSUM62
Remote homologues: blastp, Expect value 20000, PAM30

These settings are shown in figure 2.50.

Figure 2.50: Settings for searching for remote homologues.

2.9.4 Further reading

A valuable source of information about BLAST can be found at http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=ProgSelectionGuide.

Remember that BLAST is a heuristic method. This means that certain assumptions are made to allow searches to be done in a reasonable amount of time. Thus you cannot trust BLAST search results to be accurate. For very accurate results you should consider using other algorithms, such as Smith-Waterman. You can read "Bioinformatics explained: BLAST versus Smith-Waterman" here: http://www.clcbio.com/BE.

CHAPTER 2  TUTORIALS 69

2.10 Tutorial: Align protein sequences

This tutorial outlines some of the alignment functionality of the CLC DNA Workbench. In addition to creati
251. cription of the top database hit against each query sequence, and the number of hits found.

12.2.1 Graphical overview for each query sequence

Double-clicking on a given row of a tabular BLAST table opens a graphical overview of the BLAST results for a particular query sequence, as shown in figure 12.8. In cases where only one sequence was entered into a BLAST search, such a graphical overview is the default output.

Figure 12.8 shows an example of a BLAST result for an individual query sequence in the CLC DNA Workbench.

The overview BLAST table and the graphical BLAST results view are described in detail below.

12.2.2 Overview BLAST table

In the overview BLAST table for a multi-sequence BLAST search, as shown in figure 12.9, there is one row for each query sequence. Each row represents the BLAST result for this query sequence.

Double-clicking a row will open the BLAST result for this query sequence, allowing more detailed investigation of the result. You can also select one or more rows and click the Open BLAST Output button at the bottom of the view. Clicking the Open Query Sequence button will open a sequence list with the selected query sequences. This can be useful in work flows where BLAST is used as a filtering mechanism, where you can filter the table to include e.g. sequences that have a certain top hit and then extract those.

CHAPTER 12  BLAST SEARCH 181

[Screenshot: graphical BLAST result for the query sequence ATP8a1]
252. Figure 17.15: Setting parameters for trimming.

The following parameters can be adjusted in the dialog:

• Ignore existing trim information. If you have previously trimmed the sequences, you can check this to remove existing trimming annotation prior to analysis.
• Trim using quality scores. If the sequence files contain quality scores from a base-caller algorithm, this information can be used for trimming sequence ends. The program uses the modified Mott trimming algorithm for this purpose (Richard Mott, personal communication):

Quality scores in the Workbench are on a Phred scale (formats using other scales are converted during import). The first step in the trim process is to convert the quality score (Q) to an error probability: $p_{error} = 10^{Q/-10}$. (This now means that low values are high quality bases.)

Next, for every base a new value is calculated: $Limit - p_{error}$. This value will be negative for low quality bases, where the error probability is high
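The two steps described above (converting Phred scores to error probabilities, then computing Limit minus the error probability for every base) can be written out as a short Python sketch. It covers only these two steps and is not the Workbench's trimming code; the function names are chosen for this illustration, and the default limit of 0.02 is taken from the value visible in the trimming dialog.

```python
def error_probability(quality):
    """Convert a Phred quality score Q to an error probability 10^(Q / -10)."""
    return 10 ** (quality / -10)


def trim_values(qualities, limit=0.02):
    """Return Limit - p_error for every base.

    Negative values mark low quality bases (error probability above the limit).
    """
    return [limit - error_probability(q) for q in qualities]


qualities = [10, 20, 30, 40]
for q, value in zip(qualities, trim_values(qualities)):
    status = "low quality" if value < 0 else "acceptable"
    print(f"Q={q}  p_error={error_probability(q):.4f}  value={value:+.4f}  {status}")
```

With a limit of 0.02, a base with Q = 10 (error probability 0.1) gets a negative value, while a base with Q = 20 (error probability 0.01) gets a positive one.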
253. Figure 2.58: Selecting output for restriction map analysis.

Click Finish to start the restriction map analysis.

View restriction site

The restriction sites are shown in two views: one view is in a tabular format and the other view displays the sites as annotations on the sequence.

The result is shown in figure 2.59. The restriction map at the bottom can also be shown as a table of fragments produced by cutting the sequence with the enzymes:

Click the Fragments button at the bottom of the view

In a similar way the fragments can be shown on a virtual gel:

Click the Gel button at the bottom of the view

Figure 2.59: The result of the restriction map analysis is displayed in a table at the bottom and as annotations on the sequence in the view at the top.

Part II

Core Functionalities

Chapter 3

User interfa
254. Figure 18.44: Choosing to add restriction sites as annotations or creating a restriction map.

• Create restriction map. When a restriction map is created, it can be shown in three different ways:
  - As a table of restriction sites as shown in figure 18.46. If more than one sequence was selected, the table will include the restriction sites of all the sequences. This makes it easy to compare the result of the restriction map analysis for two sequences.
  - As a table of fragments which shows the sequence fragments that would be the result of cutting the sequence with the selected enzymes (see figure 18.47).
  - As a virtual gel simulation which shows the fragments as bands on a gel (see figure 18.49).

For more information about gel electrophoresis, see section 18.4.

The following sections will describe these output formats in more detail.

In order to complete the analysis click Finish (see section 9.2 for information about the Save and Open options).

Restriction sites as annotation on the sequence

If you chose to add the restriction sites as annotation to the sequence, the result will be similar to the sequence sho
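How a table of fragments follows from a list of cut positions can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Workbench's algorithm: a cut position is taken to mean the number of bases to the left of the cut, the function name `fragment_lengths` is hypothetical, and the sequence length and cut positions in the example are invented.

```python
def fragment_lengths(sequence_length, cut_positions, circular=False):
    """Return the lengths of the fragments produced by cutting a sequence.

    A cut position is the number of bases to the left of the cut
    (an assumption of this sketch).
    """
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [sequence_length]
    if circular:
        # Each fragment runs from one cut to the next; the last wraps around.
        lengths = [b - a for a, b in zip(cuts, cuts[1:])]
        lengths.append(sequence_length - cuts[-1] + cuts[0])
        return lengths
    bounds = [0] + cuts + [sequence_length]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical 2000 bp sequence cut at two positions.
print(fragment_lengths(2000, [119, 1208]))
print(fragment_lengths(2000, [119, 1208], circular=True))
```

A linear sequence with n cuts yields n + 1 fragments, whereas a circular sequence yields n fragments, which is why the two calls return lists of different length.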
255. d BLAST databases. These can either be created from within the Workbench using the Create BLAST Database tool (see section 12.3.3) or they can be pre-formatted BLAST databases.

The list of locations can be modified using the Add Location and Remove Location buttons. Once the Workbench has scanned the locations, it will keep a cache of the databases (in order to improve performance). If you have added new databases that are not listed, you can press Refresh Locations to clear the cache and search the database locations again.

By default a BLAST database location will be added under your home area in a folder called CLCdatabases. This folder is scanned recursively, through all subfolders, to look for valid databases. All other folder locations are scanned only at the top level.

Below the list of locations, all the BLAST databases are listed with the following information:

• Name. The name of the BLAST database.
• Description. Detailed description of the contents of the database.
• Date. The date the database was created.
• Sequences. The number of sequences in the database.
• Type. The type can be either nucleotide (DNA) or protein.
• Total size (1000 residues). The number of residues in the database, either bases or amino acids.
• Location. The location of the database.

Below the list of BLAST databases, there is a button to Remove Database. This option will delete the database files belonging to the database se
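The difference between the recursive scan of the default folder and the top-level scan of other locations can be sketched as follows. This is a hypothetical illustration in Python, not the Workbench's scanning code: it assumes that a database can be recognised by its index file (`.nin` for nucleotide, `.pin` for protein, as written by the NCBI formatting tools), and the function name `scan_location` is invented here.

```python
from pathlib import Path

# Assumption of this sketch: one index file per database identifies it.
INDEX_SUFFIXES = {".nin": "nucleotide", ".pin": "protein"}


def scan_location(folder, recursive=False):
    """List (name, type, folder) for every database index file found.

    recursive=True descends through all subfolders, as described for the
    default CLCdatabases folder; otherwise only the top level is scanned.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    pattern = "**/*" if recursive else "*"
    found = []
    for path in sorted(folder.glob(pattern)):
        kind = INDEX_SUFFIXES.get(path.suffix)
        if kind and path.is_file():
            found.append((path.stem, kind, path.parent))
    return found


if __name__ == "__main__":
    default_location = Path.home() / "CLCdatabases"
    for name, kind, location in scan_location(default_location, recursive=True):
        print(name, kind, location)
```

Caching the returned list and rebuilding it only on request would correspond to the behaviour of the Refresh Locations button.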
256. d Tongaonkar, 1990] Kolaskar, A. S. and Tongaonkar, P. C. (1990). A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett, 276(1-2):172-174.

[Kyte and Doolittle, 1982] Kyte, J. and Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J Mol Biol, 157(1):105-132.

[Larget and Simon, 1999] Larget, B. and Simon, D. (1999). Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol Biol Evol, 16:750-759.

[Leitner and Albert, 1999] Leitner, T. and Albert, J. (1999). The molecular clock of HIV-1 unveiled through analysis of a known transmission history. Proc Natl Acad Sci U S A, 96(19):10752-10757.

[Maizel and Lenk, 1981] Maizel, J. V. and Lenk, R. P. (1981). Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc Natl Acad Sci U S A, 78(12):7665-7669.

[McGinnis and Madden, 2004] McGinnis, S. and Madden, T. L. (2004). BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res, 32(Web Server issue):W20-W25.

[Meyer et al., 2007] Meyer, M., Stenzel, U., Myles, S., Prufer, K., and Hofreiter, M. (2007). Targeted high-throughput sequencing of tagged nucleic acid samples. Nucleic Acids Res, 35(15):e97.

[Michener and Sokal, 1957] Michener, C. and Sokal, R. (1957). A quantitative approach to a problem in classification. Evolution, 11:130-162.

[Purvis, 1995] Purvis, A. (1995). A composite e
257. d be accepted in the CLC interface. Some commonly used Entrez queries are pre-entered and can be chosen in the drop down menu.

• Choose filter
  - Low complexity. Mask off segments of the query sequence that have low compositional complexity. Filtering can eliminate statistically significant, but biologically uninteresting reports from the BLAST output (e.g. hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences.
  - Mask lower case. If you have a sequence with regions denoted in lower case, and other regions in upper case, then choosing this option would keep any of the regions in lower case from being considered in your BLAST search.
• Expect. The threshold for reporting matches against database sequences: the default value is 10, meaning that under the circumstances of this search, 10 matches are expected to be found merely by chance according to the stochastic model of Karlin and Altschul (1990). Details of how E-values are calculated can be found at the NCBI: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html. If the E-value ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Increasing the threshold results in more matches being reported, b
258. d close the views. You do not need to save the fragment table.

Figure 2.28: Creating the fragment table including fragments up to 4000 bp.

Figure 2.29: Opening the fragment as a sequence.

2.6.4 Specify restriction sites and perform cloning

The final step in this tutorial is to insert the fragment into the cloning vector:

Toolbox in the Menu Bar | Cloning and Restriction Sites | Cloning

Select the Fragment (ATP8a1 mRNA (ATP8a1 fwd - ATP8a1 rev)) sequence you just saved and click Next. In this dialog, use the Browse button to select the pcDNA4_TO cloning vector also located in the Cloning folder. Click Finish.

You will now see the cloning editor where you will see the pcDNA4_TO vector in a circular view. Press and hold the Ctrl (⌘ on Mac) key while you click first the HindIII site and next the XhoI site (see figure 2.30).

At the bottom of the view you can now see inf
259. d in the View Area by their tabs. The order of the views can be changed using drag and drop. E.g. drag the tab of one view onto the tab of another. The tab of the first view is now placed at the right side of the other tab.

If a tab is dragged into a view, an area of the view is made gray (see fig. 3.12), illustrating that the view will be placed in this part of the View Area.

The result of this action is illustrated in figure 3.13.

You can also split a View Area horizontally or vertically using the menus.

Splitting horizontally may be done this way:

right-click a tab of the view | View | Split Horizontally

This action opens the chosen view below the existing view (see figure 3.14). When the split is made vertically, the new view opens to the right of the existing view.

Splitting the View Area can be undone by dragging e.g. the tab of the bottom view to the tab of the top view. This is marked by a gray area on the top of the view.

CHAPTER 3  USER INTERFACE 89

Figure 3.12: When dragging a view, a gray area indicates where the view will be shown.

Figure 3
260. d since you opened it, you are asked if you want to save.

When saving a new view that has not been opened from the Navigation Area (e.g. when opening a sequence from a list of search hits), a save dialog appears (figure 3.11).

In the dialog you select the folder in which you want to save the element.

After naming the element, press OK.

Figure 3.11: Save dialog.

3.2.5 Undo/Redo

If you make a change in a view, e.g. remove an annotation in a sequence or modify a tree, you can undo the action. In general, Undo applies to all changes you can make when right-clicking in a view. Undo is done by:

Click undo in the Toolbar
or Edit | Undo
or Ctrl + Z

If you want to undo several actions, just repeat the steps above. To reverse the undo action:

Click the redo icon in the Toolbar
or Edit | Redo
or Ctrl + Y

Note: Actions in the Navigation Area, e.g. renaming and moving elements, cannot be undone. However, you can restore deleted elements (see section 3.1.7).

You can set the number of possible undo actions in the Preferences dialog (see section 5).

3.2.6 Arrange views in View Area

Views are arrange
261. d the origin of DNA and protein sequences. Selections or the entire text of the Sequence Text View can be copied and pasted into other programs.

Much of the information is also displayed in the Sequence info, where it is easier to get an overview (see section 10.4).

In the Side Panel, you find a search field for searching the text in the view.

10.6 Creating a new sequence

A sequence can either be imported, downloaded from an online database or created in the CLC DNA Workbench. This section explains how to create a new sequence:

New in the toolbar

Figure 10.14: Creating a sequence.

The Create Sequence dialog (figure 10.14) reflects the information needed in the GenBank format, but you are free to enter anything into the fields. The following description is a guideline for entering information about a sequence:

• Name. The name of the sequence. This is used for saving the sequence.
• Common name. A common name for the species.
• Latin name. Th
262. dd Default Rows. Note that this will not reset the table but only add all the default rows to the existing rows.

18.2.2 Create entry clones (BP)

The next step in the Gateway cloning work flow is to recombine the attB-flanked sequence of interest into a donor vector to create an entry clone, the so-called BP reaction:

Toolbox in the Menu Bar | Cloning and Restriction Sites | Gateway Cloning | Create Entry Clone

This will open a dialog where you can select one or more sequences that will be the sequence of interest to be recombined into your donor vector. Note that the sequences you select should be flanked with attB sites (see section 18.2.1). You can select more than one sequence as input, and the corresponding number of entry clones will be created.

When you have selected your sequence(s), click Next.

This will display the dialog shown in figure 18.23.

Figure 18.23: Selecting one or more donor vectors.

Clicking the Browse button opens a dialog where you can select a donor vector. You can download donor vectors from Invitrogen's web site (http://tools.invitrogen.com/downloads/Gateway%20vectors.ma4) and import them into the CLC DNA Workbench. Note that the Workbench looks for the specific sequences of the attP sites in the sequences that y
263. dd to define the first element. This will bring up the dialog shown in figure 17.7.

Figure 17.7: Defining an element of the barcode system.

At the top of the dialog, you can choose which kind of element you wish to define:

• Linker. This is a sequence which should just be ignored: it is neither the barcode nor the sequence of interest. Following the example in figure 17.6, it would be the four nucleotides of the SrfI site. For this element, you simply define its length, nothing else.
• Barcode. The barcode is the stretch of nucleotides used to group the sequences. For that, you need to define what the valid bases are. This is done when you click Next. In this dialog, you simply need to specify the length of the barcode.
• Sequence. This element defines the sequence of interest. You can define a length interval for how long you expect this sequence to be. The sequence part is the only part of the read that is retained in the output. Both barcodes and linkers are removed.

The concept when adding elements is that you add e.g. a linker, a barcode and a sequence in the desired sequential order to describe the structure of each sequencing read. You can of course edit and delete elements by selecting them and clicking the buttons below. For the example from
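The idea of describing each read as an ordered list of elements can be sketched in Python as follows. This is an illustration only, not the Workbench's implementation: the tuple format for elements, the function names `split_read` and `group_reads`, the linker bases and the example reads are all assumptions, and the sketch supports a single barcode and a single sequence element per read.

```python
def split_read(read, elements, valid_barcodes):
    """Split a read according to an ordered list of elements.

    elements: ("linker", length), ("barcode", length) or
    ("sequence", (min_length, max_length)) tuples in read order.
    Returns (barcode, sequence), or None if the read does not fit.
    """
    position = 0
    barcode = None
    insert = None
    for index, (kind, size) in enumerate(elements):
        if kind == "sequence":
            # The sequence takes everything not claimed by later elements.
            reserved = sum(length for _, length in elements[index + 1:])
            end = len(read) - reserved
            min_length, max_length = size
            if not min_length <= end - position <= max_length:
                return None
            insert = read[position:end]
            position = end
        else:
            part = read[position:position + size]
            if len(part) < size:
                return None
            if kind == "barcode":
                if part not in valid_barcodes:
                    return None
                barcode = part
            position += size  # linkers are skipped and discarded
    return barcode, insert


def group_reads(reads, elements, valid_barcodes):
    """Group the retained sequence parts by barcode; collect the rest."""
    groups = {code: [] for code in valid_barcodes}
    not_grouped = []
    for read in reads:
        parsed = split_read(read, elements, valid_barcodes)
        if parsed is None:
            not_grouped.append(read)
        else:
            groups.setdefault(parsed[0], []).append(parsed[1])
    return groups, not_grouped


elements = [("linker", 4), ("barcode", 3), ("sequence", (5, 250))]
barcodes = {"CCT", "CGT", "GGT", "AAT"}
reads = ["GGGCCGTACGTACGTAC", "GGGCTTTACGTACGTAC"]
groups, not_grouped = group_reads(reads, elements, barcodes)
print(groups["CGT"], not_grouped)
```

Only the sequence part ends up in the groups; the linker and barcode are stripped, and reads without a valid barcode are collected separately.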
264. Figure 17.12: A preview of the result.

With this data set we got the four groups as expected (shown in figure 17.13). The Not grouped list contains 445,560 reads that will have to be discarded since they do not have any of the barcodes.

Figure 17.13: The result is one sequence list per barcode and a list with the remainders.

17.3 Trim sequences

CLC DNA Workbench offers a number of ways to trim your sequence reads prior to assembly. Trimming can be done either as a separate task before assembling, or it can be performed as an integrated part of the assembly process (see section 17.4).

Trimming as a separate task can be done either manually or automatically.

In both instances, trimming of a sequence does not cause data to be deleted; instead both the manual and automatic trimming will put a "Trim" annotation on the trimmed parts as an indication to the assembly algorithm that this part of the data is to be ignored (see figure 17.14). This means that the effect of different trimming schemes can easily be explored without the loss of data. To remove existing trimming from a sequence, simply remove its trim annotation (see section 10.3.2).

Figure 17.14: Trimming creates ann
265. ded when you click Finish. Note that this list is dynamically updated when you change the number of cut sites. The enzymes shown in brackets are enzymes which are already present in the Side Panel.

If you have selected more than one region on the sequence (using Ctrl or ⌘), they will be treated as individual regions. This means that the criteria for cut sites apply to each region.

Show enzymes with compatible ends

Besides what is described above, there is a third way of adding enzymes to the Side Panel and thereby displaying them on the sequence. It is based on the overhang produced by cutting with an enzyme and will find enzymes producing a compatible overhang:

right-click the restriction site | Show Enzymes with Compatible Ends

This will display the dialog shown in figure 18.39.

[Figure 18.39: the dialog "Show Enzymes with Compatible Ends to TaqI"]
266. ded displaying this name.

Sequence: The actual sequence to be inserted. The sequence is always defined on the sense strand (although the reverse primer would be reverse complement).

Figure 18.22: Configuring the list of primer additions available when adding attB sites.

Annotation type: The annotation type used for the annotation that is added to the fragment.

Forward primer addition: Whether this addition should be visible in the list of additions for the forward primer.

Reverse primer addition: Whether this addition should be visible in the list of additions for the reverse primer.

You can either change the existing elements in the table by double-clicking any of the cells, or you can use the buttons below to Add Row or Delete Row. If you by accident have deleted or modified some of the default primer additions, you can press A
268. e

• Organism: Sequences which contain information about organism can be searched. In this way, you could search for e.g. Homo sapiens sequences.

• Database fields: If your data is stored in a CLC Bioinformatics Database, you will be able to search for custom defined information. Read more in the database user manual.

Only the first item in the list, Name, is available for all kinds of data. The rest is only relevant for sequences.

If you wish to perform a search for sequence similarity, use Local BLAST (see section 12.1.3) instead.

4.2 Quick search

At the bottom of the Navigation Area there is a text field as shown in figure 4.1.

Figure 4.1: Search simply by typing in the text field and press Enter.

To search, simply enter a text to search for and press Enter.

4.2.1 Quick search results

To show the results, the search pane is expanded as shown in figure 4.2.

Figure 4.2: Search results.

If there are many hits, only the first 50 hits are immediately shown. At the bottom of the pane you can click Next to see the next 50 hits (see figure 4.3).

If a search giv
269. e: The name of the motif. In the result of a motif search, this name will appear as the name of the annotation and in the result table.

• Motif: The actual motif. See section 13.7.2 for more information about the syntax of motifs.

• Description: You can enter a description of the motif. In the result of a motif search, the description will appear in the result table and be added as a note to the annotation on the sequence (visible in the Annotation table or by placing the mouse cursor on the annotation).

• Type: You can enter three different types of motifs: simple motifs, Java regular expressions or PROSITE regular expressions. Read more in section 13.7.2.

The motif list can contain a mix of different types of motifs. This is practical because some motifs can be described with the simple syntax, whereas others need the more advanced regular expression syntax.

Instead of manually adding motifs, you can Import From Fasta File. This will show a dialog where you can select a fasta file on your computer and use this to create motifs. This will automatically take the name, description and sequence information from the fasta file, and put it into the motif list. The motif type will be "simple".

Besides adding new motifs, you can also edit and delete existing motifs in the list. To edit a motif, either double-click the motif in the list, or select and click the Edit button at the bottom of the vie
270. e  eo  ni  nih  oer blast  ab

O'Reilly book on BLAST: http://www.oreilly.com/catalog/blast/

Explanation of scoring/substitution matrices and more: http://www.clcbio.com/be/

Creative Commons License

All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work.

SOME RIGHTS RESERVED

See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents.

Chapter 13

General sequence analyses

Contents
13.1 Shuffle sequence
13.2 Dot plots
13.2.1 Create dot plots
13.2.2 View dot plots
13.2.3 Bioinformatics explained: Dot plots
13.2.4 Bioinformatics explained: Scoring matrices
13.3 Local complexity plot
13.4 Sequence statistics
13.4.1 Bioinformatics explained: Protein statistics
13.5 Join sequences
271. e, we welcome all requests and feedback from users, and hope you will suggest new features or more general improvements to the program on support@clcbio.com.

1.5.2 Report program errors

CLC bio is doing everything possible to eliminate program errors. Nevertheless, some errors might have escaped our attention. If you discover an error in the program, you can use the Report a Program Error function in the Help menu of the program to report it. In the Report a Program Error dialog you are asked to write your e-mail address (optional). This is because we would like to be able to contact you for further information about the error or for helping you with the problem.

Note: No personal information is sent via the error report. Only the information which can be seen in the Program Error Submission Dialog is submitted.

You can also write an e-mail to support@clcbio.com. Remember to specify how the program error can be reproduced.

All errors will be treated seriously and with gratitude.

We appreciate your help.

Start in safe mode

If the program becomes unstable on start-up, you can start it in Safe mode. This is done by pressing and holding down the Shift button while the program starts.

When starting in safe mode, the user settings (e.g. the settings in the Side Panel) are deleted and cannot be restored. Your data stored in the Navigation Area is not deleted. When started in safe mode, some of the funct
272. e 16.3).

There are two different ways to display the information relating to a single primer: the detailed and the compact view. Both are shown below the primer regions selected on the sequence.

16.3.1 Compact information mode

This mode offers a condensed overview of all the primers that are available in the selected region. When a region is chosen, primer information will appear in lines beneath it (see figure 16.4).

Figure 16.4: Compact information mode.

The number of information lines reflects the chosen length interval for primers and probes. One line is shown for every possible primer length; if the length interval is widened, more lines will appear. At each potential primer starting position, a circle is shown which indicates whether the primer fulfills the requirements set in the primer parameters preference group. A green primer indicates a
273. e Ctrl button while making selections. Holding down the Shift button lets you extend or reduce an existing selection to the position you clicked.

To select a part of a sequence covered by an annotation:

right-click the annotation | Select annotation

or double-click the annotation

To select a fragment between two restriction sites that are shown on the sequence:

double-click the sequence between the two restriction sites

(Read more about restriction sites in section 10.1.2.)

Open a selection in a new view

A selection can be opened in a new view and saved as a new sequence:

right-click the selection | Open selection in New View

This opens the annotated part of the sequence in a new view. The new sequence can be saved by dragging the tab of the sequence view into the Navigation Area.

The process described above is also the way to manually translate coding parts of sequences (CDS) into protein. You simply translate the new sequence into protein. This is done by:

right-click the tab of the new sequence | Toolbox | Nucleotide Analyses | Translate to Protein

A selection can also be copied to the clipboard and pasted into another program:

make a selection | Ctrl + C (⌘ + C on Mac)

Note: The annotations covering the selection will not be copied.

A selection of a sequence can be edited as described in the following section.

10.1.4 Editing the sequence

When you make a selection, i
274. e Latin name for the species.

• Type: Select between DNA, RNA and protein.

• Circular: Specifies whether the sequence is circular. This will open the sequence in a circular view as default (applies only to nucleotide sequences).

• Description: A description of the sequence.

• Keywords: A set of keywords separated by semicolons.

• Comments: Your own comments to the sequence.

• Sequence: Depending on the type chosen, this field accepts nucleotides or amino acids. Spaces and numbers can be entered, but they are ignored when the sequence is created. This allows you to paste (Ctrl + V on Windows and ⌘ + V on Mac) in a sequence directly from a different source, even if the residue numbers are included. Characters that are not part of the IUPAC codes cannot be entered. At the top right corner of the field, the number of residues is counted. The counter does not count spaces or numbers.

Clicking Finish opens the sequence. It can be saved by clicking Save or by dragging the tab of the sequence view into the Navigation Area.

10.7 Sequence Lists

The Sequence List shows a number of sequences in a tabular format, or it can show the sequences together in a normal sequence view.

Having sequences in a sequence list can help organize sequence data. The sequence list may originate from an NCBI search (chapter 11.1). Moreover, if a multiple sequence fasta file is imported, it is possible to stor
275. • On the welcome screen, click Next.

• Read and accept the License agreement and click Next.

• Choose where you would like to install the application and click Next.

• Choose a name for the Start Menu folder used to launch CLC DNA Workbench and click Next.

• Choose if CLC DNA Workbench should be used to open CLC files and click Next.

• Choose where you would like to create shortcuts for launching CLC DNA Workbench and click Next.

• Choose if you would like to associate .clc files to CLC DNA Workbench. If you check this option, double-clicking a file with a "clc" extension will open the CLC DNA Workbench.

• Wait for the installation process to complete, choose whether you would like to launch CLC DNA Workbench right away, and click Finish.

When the installation is complete the program can be launched from the Start Menu or from one of the shortcuts you chose to create.

1.2.3 Installation on Mac OS X

Starting the installation process is done in one of the following ways:

If you have downloaded an installer:
Locate the downloaded installer and double-click the icon. The default location for downloaded files is your desktop.

If you are installing from a CD:
Insert the CD into your CD-ROM drive and open it by double-clicking on the CD icon on your desktop.

Launch the installer by double-clicking on the "CLC DNA Workbench" icon.

Installing the program is done in the following steps:

• On the welcome screen, click Next.

• Read and a
276. e a codon, but with respect to its frequency in the organism.

As an example we want to translate an alanine to the corresponding codon. Four different codons can be used for this reverse translation: GCU, GCC, GCA or GCG. By picking any one of them at random we will get an alanine.

The most frequent codon coding for an alanine in E. coli is GCG, encoding 33.7% of all alanines. Then comes GCC (25.5%), GCA (20.3%) and finally GCU (15.3%). The data are retrieved from the Codon usage database, see below. Always picking the most frequent codon does not necessarily give the best answer.

By selecting codons from a distribution of calculated codon frequencies, the DNA sequence obtained after the reverse translation holds the correct (or nearly correct) codon distribution. It should be kept in mind that the obtained DNA sequence is not necessarily identical to the original one encoding the protein in the first place, due to the degeneracy of the genetic code.

In order to obtain the best possible result of the reverse translation, one should use the codon frequency table from the correct organism or a closely related species. The codon usage of the mitochondrial chromosome is often different from that of the native chromosome(s), thus mitochondrial codon frequency tables should only be used when working specifically with mitochondria.

Other useful resources

The Genetic Code at NCBI:
http://www.ncbi.nlm.nih.gov/Taxonomy/
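The difference between always picking the most frequent codon and drawing codons from the frequency distribution can be sketched in a few lines of Python. This is only an illustration and not the Workbench's algorithm. The function names are hypothetical, and the table holds just the four E. coli alanine percentages quoted above, used as relative weights.

```python
import random

# Alanine codon usage in E. coli, in percent, as quoted in the text above.
ALA_CODONS_E_COLI = {"GCG": 33.7, "GCC": 25.5, "GCA": 20.3, "GCU": 15.3}


def most_frequent_codon(freqs):
    """Return the codon with the highest frequency."""
    return max(freqs, key=freqs.get)


def sample_codon(freqs, rng):
    """Draw one codon with probability proportional to its frequency."""
    codons = list(freqs)
    weights = [freqs[codon] for codon in codons]
    # random.choices treats the weights as relative, so they need not sum to 100.
    return rng.choices(codons, weights=weights, k=1)[0]


def reverse_translate_alanines(count, freqs, rng):
    """Reverse translate `count` alanines by frequency-based sampling."""
    return [sample_codon(freqs, rng) for _ in range(count)]
```

With `most_frequent_codon` every alanine becomes GCG, whereas `reverse_translate_alanines` yields a mixture of all four codons whose proportions approach the table as the number of alanines grows.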
277. e advantage of this tool.

• Many of the databases listed are very large. Please make sure you have room for them. If you are working on a shared system, we recommend you discuss your plans with your system administrator and fellow users.

• Some of the databases listed are dependent on others. This will be listed in the Dependencies column of the Download BLAST Databases window. This means that while the database you are interested in may seem very small, it may require that you also download a very big database on which it depends.

An example of the second item above is Swissprot. To download a database from the NCBI that would allow you to search just Swissprot entries, you need to download the whole nr database in addition to the entry for Swissprot.

12.3.3 Create local BLAST databases

In the CLC DNA Workbench you can create a local database that you can use for local BLAST searches. You can specify a location on your computer to save the BLAST database files to. The Workbench will list the BLAST databases found in these locations when you set up a local BLAST search (see section 12.1.3).

DNA, RNA, and protein sequences located in the Navigation Area can be used to create BLAST databases. Any given BLAST database can only include one molecule type. If you wish to use a pre-formatted BLAST database instead, see section 12.3.1.

To create a BLAST database, go to:

Toolbox | BLAST | Create BLAST
278. e and hold the mouse button. By moving the mouse you move the sequence in the View.

3.3.6 Selection

The Selection mode is used for selecting in a View (selecting a part of a sequence, selecting nodes in a tree etc.). It is also used for moving e.g. branches in a tree or sequences in an alignment.

When you make a selection on a sequence or in an alignment, the location is shown in the bottom right corner of the screen. E.g. "23^24" means that the selection is between two residues, "23" means that the residue at position 23 is selected, and finally "23..25" means that 23, 24 and 25 are selected. By holding Ctrl / ⌘ you can make multiple selections.

3.3.7 Changing compactness

There is a shortcut way of changing the compactness setting for read mappings:

Press and hold Alt key | Scroll using your mouse wheel or touchpad

3.4 Toolbox and Status Bar

The Toolbox is placed in the left side of the user interface of CLC DNA Workbench below the Navigation Area.

The Toolbox shows a Processes tab and a Toolbox tab.

3.4.1 Processes

By clicking the Processes tab, the Toolbox displays previous and running processes, e.g. an NCBI search or a calculation of an alignment. The running processes can be stopped, paused, and resumed by clicking the small icon next to the process (see figure 3.17).

Running and paused processes are not deleted.
279. e conflict or resolved.

  - Conflict: Initially, all the rows in the table have this status. This means that there is one or more differences between the sequences at this position.

  - Resolved: If you edit the sequences, e.g. if there was an error in one of the sequences, and they now all have the same residue at this position, the status is set to Resolved.

• Note: Can be used for your own comments on this conflict. Right-click in this cell of the table to add or edit the comments. The comments in the table are associated with the conflict annotation in the graphical view. Therefore, the comments you enter in the table will also be attached to the annotation on the consensus sequence (the comments can be displayed by placing the mouse cursor on the annotation for one second, see figure 17.24). The comments are saved when you Save.

By clicking a row in the table, the corresponding position is highlighted in the graphical view. Clicking the rows of the table is another way of navigating the contig, apart from using the Find Conflict button or using the Space bar. You can use the up and down arrow keys to navigate the rows of the table.

17.8 Reassemble contig

If you have edited a contig, changed trimmed regions, or added or removed reads, you may wish to reassemble the contig. This can be done in two ways:

Toolbox in the Menu Bar | Sequencing Data Analyses | Reassemble Contig | select the contig and click Next

or righ
280. e displayed in the annotation's box.

  - Over annotation: The labels are displayed above the annotations.

  - Before annotation: The labels are placed just to the left of the annotation.

  - Flag: The labels are displayed as flags at the beginning of the annotation.

  - Stacked: The labels are offset so that the text of all labels is visible. This means that there is varying distance between each sequence line to make room for the labels.

• Show arrows: Displays the end of the annotation as an arrow. This can be useful to see the orientation of the annotation (for DNA sequences). Annotations on the negative strand will have an arrow pointing to the left.

• Use gradients: Fills the boxes with gradient color.

In the Annotation Types group, you can choose which kinds of annotations should be displayed. This group lists all the types of annotations that are attached to the sequence(s) in the view. For sequences with many annotations, it can be easier to get an overview if you deselect the annotation types that are not relevant.

Unchecking the checkboxes in the Annotation Layout will not remove this type of annotations from the sequence; it will just hide them from the view.

Besides selecting which types of annotations should be displayed, the Annotation Types group is also used to change the color of the annotations on the sequence. Click the colored square next to the relevant annotation typ
281. 10.7.3 Extract sequences

CLC DNA Workbench offers five different ways of viewing and editing single sequences as described in the first five sections of this chapter. Furthermore, this chapter also explains how to create a new sequence and how to gather several sequences in a sequence list.

10.1 View sequence

When you double-click a sequence in the Navigation Area, the sequence will open automatically, and you will see the nucleotides or amino acids. The zoom options described in section 3.3 allow you to e.g. zoom out in order to see more of the sequence in one view. There are a number of options for viewing and editing the sequence which are all described in this section. All the options described in this section also apply to alignments (further described in section 19.2).

10.1.1 Sequence settings in Side Panel

Each view of a sequence has a Side Panel located at the right side of the view (see figure 10.1).

Figure 10.1: Overview of the Side Panel which is always shown to the right of a view. The groups shown are Sequence layout, Annotation layout, Annotation types, Restriction sites, Residue coloring, Nucleotide info, Find and Text Format.

When you make changes in the Side Panel the view of the sequence is instantly updated. To show or hide the Side Panel:

select the View | Ctrl + U o
282. 3.5 Workspace
3.5.1 Create Workspace
3.5.2 Select Workspace
3.5.3 Delete Workspace
3.6 List of shortcuts

This chapter provides an overview of the different areas in the user interface of CLC DNA Workbench. As can be seen from figure 3.1 this includes a Navigation Area, View Area, Menu Bar, Toolbar, Status Bar and Toolbox.

[Figure 3.1: the CLC DNA Workbench user interface, with the Menu Bar, Toolbar, Navigation Area, View Area and Toolbox labeled]
283. e icon next to the search field (see figure 4.5).

Figure 4.5: Recent searches.

Clicking one of the recent searches will conduct the search again.

4.3 Advanced search

As a supplement to the Quick search described in the previous section you can use the more advanced search:

Search | Local Search

or Ctrl + F (⌘ + F on Mac)

This will open the search view as shown in figure 4.6.

The first thing you can choose is which location should be searched. All the active locations are shown in this list. You can also choose to search all locations. Read more about locations in section 3.1.1.

Furthermore, you can specify what kind of elements should be searched:

• All sequences
• Nucleotide sequences
• Protein sequences
• All data

Figure 4.6: Advanced search.

When searching for sequences, you will also get alignments, sequence lists etc. as results, if they contain a sequence which matches the search criteria.

Below are the search criteria. First, select a relevant search filter in the Add filter: list. For sequences you can search for

• Name
• Length
• Organism

See section 4.2.2 for more information on individual search terms.

F
284. e list to see more hits.

2.4.2 Saving the sequence

The sequences which are found during the search can be displayed by double-clicking in the list of hits. However, this does not save the sequence. You can save one or more sequences by selecting them and:

click Download and Save

or drag the sequences into the Navigation Area

2.5 Tutorial: Assembly

In this tutorial, you will see how to assemble data from automated sequencers into a contig and how to find and inspect any conflicts that may exist between different reads.

This tutorial shows how to assemble sequencing data generated by conventional "Sanger" sequencing techniques. For high-throughput sequencing data, we refer to the CLC Genomics Workbench (see http://www.clcbio.com/genomics/).

The data used in this tutorial are the sequence reads in the "Sequencing reads" folder in the "Sequencing data" folder of the Example data in the Navigation Area. If you do not have the example data, please go to the Help menu to import it.

2.5.1 Trimming the sequences

The first thing to do when analyzing sequencing data is to trim the sequences. Trimming serves a dual purpose: it both takes care of parts of the reads with poor quality, and it removes potential vector contamination. Trimming the sequencing data gives a better result in the further analysis.

Toolbox in the Menu Bar | Sequencing Data Analyses | Trim Sequences

Select the 9 sequences and click Next.
286. e needs to be fulfilled (Match any).

For each filter criterion, you first have to select which column it should apply to. Next, you choose an operator. For numbers, you can choose between:

• = (equal to)
• < (smaller than)
• > (greater than)
• <> (not equal to)
• abs value < (absolute value smaller than. This is useful if it doesn't matter whether the number is negative or positive)
• abs value > (absolute value greater than. This is useful if it doesn't matter whether the number is negative or positive)

For text-based columns, you can choose between:

• contains (the text does not have to be in the beginning)
• doesn't contain
• = (the whole text in the table cell has to match, also lower/upper case)

Once you have chosen an operator, you can enter the text or numerical value to use.

If you wish to reset the filter, simply remove all the search criteria. Note that the last one will not disappear; it will be reset and allow you to start over.

Figure C.3 shows an example of an advanced filter which displays the open reading frames larger than 400 that are placed on the negative strand.
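The way such filter criteria combine can be sketched in Python. This is a hypothetical illustration rather than the Workbench's code. The row layout, the function name and the choice to make "contains" case-sensitive are assumptions. The operator names mirror those listed above, and `match_all` stands for the Match all / Match any setting.

```python
import operator

# Operators for numeric columns, named as in the filter dialog.
NUMERIC_OPS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<>": operator.ne,
    "abs value <": lambda cell, value: abs(cell) < value,
    "abs value >": lambda cell, value: abs(cell) > value,
}

# Operators for text columns. "=" requires an exact, case-sensitive match.
TEXT_OPS = {
    "contains": lambda cell, value: value in cell,
    "doesn't contain": lambda cell, value: value not in cell,
    "=": lambda cell, value: cell == value,
}


def filter_rows(rows, criteria, match_all=True):
    """Keep the rows that fulfil all (or any) of the given criteria.

    Each criterion is a (column, operator, value) tuple.
    """
    combine = all if match_all else any

    def passes(row):
        results = []
        for column, op, value in criteria:
            ops = TEXT_OPS if isinstance(value, str) else NUMERIC_OPS
            results.append(ops[op](row[column], value))
        return combine(results)

    return [row for row in rows if passes(row)]
```

The filter described for figure C.3 would correspond to the criteria `[("Length", ">", 400), ("Strand", "=", "negative")]` with `match_all=True`.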
287. e of the Side Panel of a sequence view.

Figure 5.8: The Side Panel of a sequence contains several groups: Sequence layout, Annotation types, Annotation layout, etc. Several of these groups are present in more views. E.g. Sequence layout is also in the Side Panel of alignment views.

By clicking the black triangles or the corresponding headings, the groups can be expanded or collapsed. An example is shown in figure 5.9 where the Sequence layout is expanded.

The content of the groups is described in the sections where the functionality is explained. E.g. Sequence Layout for sequences is described in chapter 10.1.1.

When you have adjusted a view of e.g. a sequence, your settings in the Side Panel can be saved. When you open other sequences, which you want to display in a similar way, the saved settings can be applied. The options for saving and applying are available in the top of the Side Panel (see figure 5.10).

To save and apply the saved settings, click the icon seen in figure 5.10. This opens a menu, where the following options are available:

• Save Settings. This brings up a dialog as shown in figure 5.11 where you can enter a name for your settings. Furthermore, by clicking the checkbox Always apply these settings, you can choose to use these settings every time you open a new view of this type
288. e reporter dye. It is recommended that the melting temperature of the TaqMan probe is about 10 degrees Celsius higher than that of the primer pair.

Primer design for TaqMan technology involves designing a primer pair and a TaqMan probe.

In TaqMan the user must thus define three regions: a Forward primer region, a Reverse primer region, and a TaqMan probe region. The easiest way to do this is to designate a TaqMan primer/probe region spanning the sequence region where TaqMan amplification is desired. This will automatically add all three regions to the sequence. If more control is desired about the placing of primers and probes, the Forward primer region, Reverse primer region and TaqMan probe region can all be defined manually. If areas are known where primers or probes must not bind (e.g. repeat-rich areas), one or more No primers here regions can be defined. The regions are defined by making a selection on the sequence and right-clicking the selection.

It is required that at least a part of the Forward primer region is located upstream of the TaqMan probe region, and that the TaqMan probe region is located upstream of a part of the Reverse primer region.

CHAPTER 16  PRIMERS 263

In TaqMan mode the Inner melting temperature menu in the primer parameters panel is activated, allowing the user to set a separate melting temperature interval for the TaqMan probe.

After exploring the available primers (see section 16.3) and setting the desired parameter
289. e the data in a sequence list. A Sequence List can also be generated using a dialog, which is described here:

select two or more sequences | right-click the elements | New | Sequence List

This action opens a Sequence List dialog.

Figure 10.15: A Sequence List dialog.

The dialog allows you to select more sequences to include in the list, or to remove already chosen sequences from the list.

Clicking Finish opens the sequence list. It can be saved by clicking Save or by dragging the tab of the view into the Navigation Area.

Opening a Sequence list is done by:

right-click the sequence list in the Navigation Area | Show | Graphical Sequence List OR Table

CHAPTER 10  VIEWING AND EDITING SEQUENCES 164

The two different views of the same sequence list are shown in split screen in figure 10.16.

[Figure 10.16: the graphical view and the table view of the same sequence list in split screen]
290. e to change the color.

This will display a dialog with three tabs: Swatches, HSB, and RGB. They represent three different ways of specifying colors. Apply your settings and click OK. When you click OK, the color settings cannot be reset. The Reset function only works for changes made before pressing OK.

Furthermore, the Annotation Types can be used to easily browse the annotations by clicking the small button next to the type. This will display a list of the annotations of that type (see figure 10.8).

Figure 10.8: Browsing the gene annotations on a sequence.

Clicking an annotation in the list will select this region on the sequence. In this way, you can quickly find a specific annotation on a long sequence.

View Annotations in a table

Annotations can also be viewed in a table:

select the sequence in the Navigation Area | Show | Annotation Table

or If the sequence is already open | Click Show Annotation Table at the lower left part of the view

This will open a view similar to the one in figure 10.9.

In the Side Panel you can show or hide individual annotation types in the table. E.g. if you
291. e way your sequences, alignments and other data are shown. You will also see how to save the changes that you made in the Side Panel.

Open the protein alignment located under Protein orthologs in the Example data. The initial view of the alignment has colored the residues according to the Rasmol color scheme, and the alignment is automatically wrapped to fit the width of the view (shown in figure 2.5).

Figure 2.5
292. e you wish to use another license, or see information about the license you currently use. In this case, open the license manager:

Help | License Manager

The license manager is shown in figure 1.22.

Besides letting you borrow licenses (see section 1.4.5), this dialog can be used to:

• See information about the license (e.g. what kind of license, when it expires)

CHAPTER 1  INTRODUCTION TO CLC DNA WORKBENCH 2

Figure 1.22: The license manager.

• Configure how to connect to a license server (Configure License Server, the button at the lower left corner). Clicking this button will display a dialog similar to figure 1.19.

• Upgrade from an evaluation license by clicking the Upgrade license button. This will display the dialog shown in figure 1 1

If you wish to switch away from using a floating license, click Configure License Server and choose not to connect to a
293. eason why the Enzyme lists folder is not listed as a batch unit is that it does not contain any sequences.

In this overview dialog, the Workbench has filtered the data so that only the types of data accepted by the tool are shown (DNA sequences in the example above).

9.1.2 Batch filtering and counting

At the bottom of the dialog shown in figure 9.3, the Workbench counts the number of files that will be run in total (90 in this case). This is counted across all the batch units.

In some situations it is useful to filter the input for the batching based on names. As an example, this could be to include only paired reads for a mapping, by only allowing names where "paired" is part of the name.

This is achieved using the Only use elements containing and Exclude elements containing text fields. Note that the count is dynamically updated to reflect the number of input files based on the filtering.

If a complete batch unit should be removed, you can select it, right-click and choose Remove Batch Unit. You can also remove items from the contents of each batch unit using right-click and Remove Element.

9.1.3 Setting parameters for batch runs

For some tools, the subsequent dialogs depend on the input data. In this case, one of the units is specified as parameter prototype and will be used to guide the choices in the dialogs. By default, this will be the first batch unit (marked in bold), but this can be changed by right-clicking another batch unit and
294. 1.5.2 Report program errors
1.5.3 CLC Sequence Viewer vs. Workbenches
1.6 When the program is installed: Getting started
1.6.1 [illegible]
1.6.2 Import of example data
1.7 [illegible]
1.7.1 Installing plug-ins
1.7.2 Uninstalling plug-ins
1.7.3 Updating plug-ins
1.7.4 [illegible]
1.8 Network configuration
1.9 The format of the user manual
1.9.1 [illegible]

Welcome to CLC DNA Workbench, a software package supporting your daily bioinformatics work.

We strongly encourage you to read this user manual in order to get the best possible basis for working with the software package.

This software is for research purposes only.

1.1 Contact information

The CLC DNA Workbench is developed by:

CLC bio A/S
Science Park Aarhus
Finlandsgade 10-12
8200 Aarhus N
Denmark

http://www.clcbio.com
VAT no.: DK 28 30 50 87

Telephone: +45 70 22 55 09
Fax: +45 70 22 55 19

E-mail: info@clcbio.com

If you have questions or comments regarding the program, you are welcome to cont
295. ee figure 19.6). Use this procedure to add fixpoints to the other sequence(s) that should be forced to align to each other.

Figure 19.6: Adding a fixpoint to a sequence in an existing alignment. At the top you can see a fixpoint that has already been added.

When you click "Create alignment" and go to Step 2, check Use fixpoints in order to force the alignment algorithm to align the fixpoints in the selected sequences to each other.

In figure 19.7 the result of an alignment using fixpoints is illustrated.

[Figure 19.7: the result of an alignment using fixpoints]

You can add multiple fixpoints, e.g. adding two fixpoints to the sequences that are aligned will force their first fixpoints to be aligned to each other, and their second fixpoints will also be

CHAPTER 19  SEQUENCE ALIGNMENT 352
296. eins, you can search with a Prosite regular expression and you should enter a protein pattern from the PROSITE database.

• Accuracy. If you search with a simple motif, you can adjust the accuracy of the motif to the match on the sequence. If you type in a simple motif and let the accuracy be 80 %, the motif search algorithm runs through the input sequence and finds all subsequences of the same length as the simple motif such that the fraction of identity between the subsequence and the simple motif is at least 80 %. A motif match is added to the sequence as an annotation with the exact fraction of identity between the subsequence and the simple motif. If you use a list of motifs, the accuracy applies only to the simple motifs in the list.

• Search for reverse motif. This enables searching on the negative strand on nucleotide sequences.

• Exclude unknown regions. Genome sequences often have large regions with unknown sequence. These regions are very often padded with N's. Ticking this checkbox will not display hits found in N-regions.

Motif search handles ambiguous characters in the way that two residues are different if they do not have any residues in common. For example: for nucleotides, N matches any character and R matches A and G. For proteins, X matches any character and Z matches E and Q.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. There are two types of results that can be produced:

• Add annot
297. Figure 13.3: Selecting sequences for the dot plot.

If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove elements from the selected elements. Click Next to adjust dot plot parameters. Clicking Next opens the dialog shown in figure 13.4.

Notice! Calculating dot plots takes up a considerable amount of memory in the computer. Therefore, you see a warning if the sum of the number of nucleotides/amino acids in the sequences is higher than 8000. If you insist on calculating a dot plot with more residues, the Workbench may shut down, allowing you to save your work first. However, this depends on your computer's memory configuration.

Adjust dot plot parameters

There are two parameters for calculating the dot plot:

• Distance correction (only valid for protein sequences). In order to treat evolutionary transitions of amino acids, a distance correction measure can be used when calculating the dot plot. These distan
298. elf annealing score is measured in number of hydrogen bonds between two copies of primer molecules, with A-T base pairs contributing 2 hydrogen bonds and G-C base pairs contributing 3 hydrogen bonds.

  - Self end annealing. Determines the maximum self end annealing value of all primers and probes. This determines the number of consecutive base pairs allowed between the 3' end of one primer and another copy of that primer. This score is calculated in number of hydrogen bonds (the example below has a score of 4, derived from 2 A-T base pairs each with 2 hydrogen bonds).

        AATTCCCTACAATCCCCAAA
        AAACCCCTAACATCCCTTAA

  - Secondary structure. Determines the maximum score of the optimal secondary DNA structure found for a primer or probe. Secondary structures are scored by the number of hydrogen bonds in the structure, and 2 extra hydrogen bonds are added for each stacking base pair in the structure.

• 3' end G/C restrictions. When this checkbox is selected it is possible to specify restrictions concerning the number of G and C molecules in the 3' end of primers and probes. A low G/C content of the primer/probe 3' end increases the specificity of the reaction. A high G/C content facilitates a tight binding of the oligo to the template but also increases the possibility of mispriming. Unfolding the preference groups yields the following options:

  - End length. The number of consecutive terminal nucleotides for which to consider the C/G cont
299. ely to predict genes which are not real. Setting a relatively high minimum length of the ORFs will reduce the number of false positive predictions, but at the same time short genes may be missed (see figure 14.9).

Figure 14.9: The first 12,000 positions of the E. coli sequence NC_000913 downloaded from GenBank. The blue (dark) annotations are the genes while the yellow (brighter) annotations are the ORFs with a length of at least 100 amino acids. On the positive strand around position 11,000, a gene starts before the ORF. This is due to the use of the standard genetic code rather than the bacterial code. This particular gene starts with CTG, which is a start codon in bacteria. Two short genes are entirely missing, while a handful of open reading frames do not correspond to any of the annotated genes.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

Finding open reading frames is often a good first step in annotating sequences such as cloning vectors or bacterial genomes. For eukaryotic genes, ORF determination may not always be very helpful since the intron/exon structure is not part of the algorithm.

Chapter 15

Protein analyses

Contents
15.1 Protein charge
15.1.1 Modifying the layout
300. ement | Delete

or select the element | press Delete key

This will cause the element to be moved to the Recycle Bin, where it is kept until the recycle bin is emptied. This means that you can recover deleted elements later on.

For deleting annotations instead of folders or elements, see section 10.3.4.

Restore Deleted Elements

The elements in the Recycle Bin can be restored by dragging the elements with the mouse into the folder where they used to be.

If you have deleted large amounts of data taking up very much disk space, you can free this disk space by emptying the Recycle Bin:

Edit in the Menu Bar | Empty Recycle Bin

Note! This cannot be undone, and you will therefore not be able to recover the data present in the recycle bin when it was emptied.

3.1.8 Show folder elements in a table

A location or a folder might contain large amounts of elements. It is possible to view their elements in the View Area:

select a folder or location | Show in the Toolbar | Contents

An example is shown in figure 3.6.

When the elements are shown in the view, they can be sorted by clicking the heading of each of the columns. You can further refine the sorting by pressing Ctrl (⌘ on Mac) while clicking the heading of another column.

CHAPTER 3  USER INTERFACE

[Figure 3.6: the elements of a folder shown in a table]
301. en, B., Møller, M. B., and Wibling, G. (2000). Statistical alignment: computational properties, homology testing and goodness-of-fit. J Mol Biol, 302(1):265-279.

[Henikoff and Henikoff, 1992] Henikoff, S. and Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A, 89(22):10915-10919.

[Hopp and Woods, 1983] Hopp, T. P. and Woods, K. R. (1983). A computer program for predicting protein antigenic determinants. Mol Immunol, 20(4):483-489.

[Ikai, 1980] Ikai, A. (1980). Thermostability and aliphatic index of globular proteins. J Biochem (Tokyo), 88(6):1895-1898.

[Janin, 1979] Janin, J. (1979). Surface and inside volumes in globular proteins. Nature, 277(5696):491-492.

BIBLIOGRAPHY 402

[Jukes and Cantor, 1969] Jukes, T. and Cantor, C. (1969). Mammalian Protein Metabolism, chapter Evolution of protein molecules, pages 21-32. New York: Academic Press.

[Karplus and Schulz, 1985] Karplus, P. A. and Schulz, G. E. (1985). Prediction of chain flexibility in proteins. Naturwissenschaften, 72:212-213.

[Kimura, 1980] Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol, 16(2):111-120.

[Knudsen and Miyamoto, 2001] Knudsen, B. and Miyamoto, M. M. (2001). A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc Natl Acad Sci U S A, 98(25):14512-14517.

[Kolaskar an
302. en this is run on a CLC Server (see http://clcbio.com/server), all the processes are placed in the queue, and the queue then takes care of distributing the jobs. This means that if the server set-up includes multiple nodes, the jobs can be run in parallel.

If you need to stop the whole batch run, you need to stop the "master" process.

9.1.5 Running de novo assembly and read mapping in batch

De novo assembly and read mapping are special in batch mode because they usually have the option of assigning individual mapping parameters to each input file. When running in batch mode this is not possible. Instead, you can change the default parameters used for long and short reads, respectively. You can also set the paired distance for paired data.

Note that this means that you cannot use a combination of paired-end and mate-pair data for batching.

Figure 9.4 shows the parameter dialog when running read mapping in batch.

Note that you can only specify one setting for all short reads, and one setting for all long reads. When the analysis is run, the reads are automatically categorized as either long or short, and the parameters specified in the dialog are applied. The same goes for all reads that are imported as paired, where the minimum and maximum distances are applied.

9.2 How to handle results of analyses

This section will explain how results generated from tools in the Toolbox are handled by CLC DNA Workbench. Note that this also applies to too
303. ence.

Click Zoom Out in the toolbar | click in the view until you reach a satisfying zoom level

or Press '-' on your keyboard

The last option for zooming out is only available if you have a mouse with a scroll wheel:

or Press and hold Ctrl (⌘ on Mac) | Move the scroll wheel on your mouse backwards

When you choose the Zoom Out mode, the mouse pointer changes to a magnifying glass to reflect the mouse mode.

Note! You might have to click in the view before you can use the keyboard or the scroll wheel to zoom.

If you want to get a quick overview of a sequence or a tree, use the Fit Width function instead of the Zoom Out function.

If you press Shift while clicking in a View, the zoom function is reversed. Hence, clicking on a sequence in this way while the Zoom Out mode toolbar item is selected zooms in instead of zooming out.

3.3.3 Fit Width

The Fit Width function adjusts the content of the View so that both ends of the sequence, alignment, or tree are visible in the View in question. (This function does not change the mode of the mouse pointer.)

3.3.4 Zoom to 100 %

The Zoom to 100 % function zooms the content of the View so that it is displayed with the highest degree of detail. (This function does not change the mode of the mouse pointer.)

3.3.5 Move

The Move mode allows you to drag the content of a View. E.g. if you are studying a sequence, you can click anywhere in the sequenc
304. ence logo graph.

  - Color. The sequence logo can be displayed in black or Rasmol colors. For protein alignments, a polarity color scheme is also available, where hydrophobic residues are shown in black, hydrophilic residues in green, acidic residues in red and basic residues in blue.

19.2.1 Bioinformatics explained: Sequence logo

In the search for homologous sequences, researchers are often interested in conserved sites/residues or positions in a sequence which tend to differ a lot. Most researchers use alignments (see Bioinformatics explained: multiple alignments) for visualization of homology on a given set of either DNA or protein sequences. In proteins, active sites in a given protein family are often highly conserved. Thus, in an alignment these positions (which are not necessarily located in proximity) are fully or nearly fully conserved. On the other hand, antigen binding sites in the Fab unit of immunoglobulins tend to differ quite a lot, whereas the rest of the protein remains relatively unchanged.

In DNA, promoter sites or other DNA binding sites are highly conserved (see figure 19.8). This is also the case for repressor sites as seen for the Cro repressor of bacteriophage λ.

When aligning such sequences, regardless of whether they are highly variable or highly conserved at specific sites, it is very difficult to generate a consensus sequence which covers the actual variability of a given position. In order to better understand the in
305. ension cost  349  fraction  354  378  insert  357  open cost  349  Gateway cloning  add attB sites  319  create entry clones  324  create expression clones  326  Gb Division  160   gbk  file format  395  GC content  252  GCG Alignment  file format  394  GCG Sequence  file format  393   gck  file format  395  GCK  Gene Construction Kit file format  393  Gel    INDEX    separate sequences without restriction en   zyme digestion  341  tabular view of fragments  339  Gel electrophoresis  340  380  marker  343  view  341  view preferences  341  when finding restriction sites  338  GenBank  view sequence in  161  file format  393  search  167  377  search sequence in  1 1  tutorial  43  Gene Construction Kit  file format  393  Gene expression analysis  377  Gene finding  234  General preferences  104  General Sequence Analyses  199  Genetic code  reverse translation  245  Getting started tutorial  37   gff  file format  395  Google sequence  1 1  Graph  export data points in csv format  128  Graph Side Panel  381  Graphics  data formats  395  export  124   gzip  file format  395  Gzip  file format  395    Half life  216  Handling of results  136  Header  116  Heat map  377  Help  29  Heterozygotes  discover via secondary peaks   305  Hide show Toolbox  94  High throughput sequencing  376  History  131  export  123  preserve when exporting  132  source elements  132  Homology  pairwise comparison of sequences  in alignments  363  Hydrophobicity  239  378  Bioinformatics explained  2
306. ent

  - Max no. of G/C. The maximum number of G and C nucleotides allowed within the specified length interval.

  - Min no. of G/C. The minimum number of G and C nucleotides required within the specified length interval.

• 5' end G/C restrictions. When this checkbox is selected it is possible to specify restrictions concerning the number of G and C molecules in the 5' end of primers and probes. A high G/C content facilitates a tight binding of the oligo to the template but also increases the possibility of mispriming. Unfolding the preference groups yields the same options as described above for the 3' end.

• Mode. Specifies the reaction type for which primers are designed:

  - Standard PCR. Used when the objective is to design primers, or primer pairs, for PCR amplification of a single DNA fragment.

  - Nested PCR. Used when the objective is to design two primer pairs for nested PCR amplification of a single DNA fragment.

  - Sequencing. Used when the objective is to design primers for DNA sequencing.

  - TaqMan. Used when the objective is to design a primer pair and a probe for TaqMan quantitative PCR.

  Each mode is described further below.

• Calculate. Pushing this button will activate the algorithm for designing primers.

16.3 Graphical display of primer information

The primer information settings are found in the Primer information preference group in the Side Panel to the right of the view (see figur
307. entical sequences. A long stretch of the query protein is matched to the database.

• 10e-50 < E-value < 10e-10: Closely related sequences, could be a domain match or similar.
• 10e-10 < E-value < 1: Could be a true homologue but it is a gray area.
• E-value > 1: Proteins are most likely not related.
• E-value > 10: Hits are most likely junk unless the query sequence is very short.

Gap costs

For blastp it is possible to specify gap cost for the chosen substitution matrix. There is only a limited number of options for these parameters. The open gap cost is the price of introducing gaps in the alignment, and extension gap cost is the price of every extension past the initial opening gap. Increasing the gap costs will result in alignments with fewer gaps.

CHAPTER 12  BLAST SEARCH 194

Filters

It is possible to set different filter options before running the BLAST search. Low-complexity regions have a very simple composition compared to the rest of the sequence and may result in problems during the BLAST search (Wootton and Federhen, 1993). A low-complexity region of a protein can for example look like this: "fftfflllsss", which in this case is a region as part of a signal peptide. In the output of the BLAST search, low-complexity regions will be marked in lowercase gray characters (default setting). The low-complexity region cannot be thought of as a significant match; thus, disabling the low-complexity filter is likely to generate
308. ents window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

Note! This is not the same as a reverse complement. If you wish to create the reverse complement, please refer to section 14.3.

14.5 Translation of DNA or RNA to protein

In CLC DNA Workbench you can translate a nucleotide sequence into a protein sequence using the Toolbox tools. Usually, you use the +1 reading frame, which means that the translation starts from the first nucleotide. Stop codons result in an asterisk being inserted in the protein sequence at the corresponding position. It is possible to translate in any combination of the six reading frames in one analysis. To translate:

select a nucleotide sequence | Toolbox in the Menu Bar | Nucleotide Analyses | Translate to Protein

or right-click a nucleotide sequence | Toolbox | Nucleotide Analyses | Translate to Protein

CHAPTER 14  NUCLEOTIDE ANALYSES 233

This opens the dialog displayed in figure 14.5.

[Figure 14.5: the Translate to Protein dialog, step 1 (Select nucleotide sequences)]
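The translation described above (reading frames, asterisks for stop codons) can be illustrated with a short Python sketch. It is not the Workbench's code: the function names `translate` and `six_frame_translation` are hypothetical, the standard genetic code is assumed, negative frames are assumed to read the reverse complement from its first, second or third nucleotide, and codons containing ambiguous characters are written as X, none of which the text specifies.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(sequence: str, frame: int = 1) -> str:
    """Translate DNA or RNA in one reading frame (+1, +2, +3, -1, -2, -3).

    Stop codons become '*'; incomplete trailing codons are dropped.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    dna = sequence.upper().replace("U", "T")  # accept RNA input
    if frame < 0:
        dna = reverse_complement(dna)
    start = abs(frame) - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    # Codons with ambiguous characters are reported as 'X' (assumption).
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(sequence: str) -> dict:
    """Translate in all six reading frames, keyed by frame number."""
    return {frame: translate(sequence, frame)
            for frame in (1, 2, 3, -1, -2, -3)}
```

Calling `translate("ATGGCCTAA")` reads the +1 frame from the first nucleotide, so the final TAA codon appears as an asterisk at the end of the protein string.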
309. eotides, respectively, will not be included.

- Maximal difference in melting temperature of primers in a pair: the number of degrees Celsius that primers in a pair are allowed to differ.
- Max hydrogen bonds between pairs: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in a primer pair.
- Maximum length of amplicon: determines the maximum length of the PCR fragment.

CHAPTER 16. PRIMERS 268

The output of the design process is a table of single primers or primer pairs as described for primer design based on single sequences. These primers are specific to the included sequences in the alignment according to the criteria defined for specificity. The only novelty in the table is that melting temperatures are displayed with a maximum, a minimum and an average value to reflect that degenerate primers or primers with mismatches may have heterogeneous behavior on the different templates in the group of included sequences.

[Screenshot of the Calculation parameters dialog, listing the chosen parameters and the exclusion parameters.]
310. epresented either by a simple sequence or a more advanced regular expression. These advanced search capabilities are available for use in both DNA and protein sequences.

There are two ways to access this functionality:

CHAPTER 13. GENERAL SEQUENCE ANALYSES 222

- When viewing sequences, it is possible to have motifs calculated and shown on the sequence in a similar way as restriction sites (see section 18.3.1). This approach is called Dynamic motifs and is an easy way to spot known sequence motifs when working with sequences for cloning etc.
- A more refined and systematic search for motifs can be performed through the Toolbox. This will generate a table and optionally add annotations to the sequences.

The two approaches are described below.

13.7.1 Dynamic motifs

In the Side Panel of sequence views, there is a group called Motifs (see figure 13.22).

Figure 13.22: Dynamic motifs in the Side Panel.

The Workbench will look for the listed motifs in the sequence that is open, and by clicking the check box next to the motif it will be shown in the view as illustrated in figure 13.23.
311. equence in the table sequence list is displayed with:

CHAPTER 10. VIEWING AND EDITING SEQUENCES 165

- Name
- Accession
- Description
- Modification date
- Length

The number of sequences in the list is reported as the number of Rows at the top of the table view. Learn more about tables in section C.

Adding and removing sequences from the list is easy: adding is done by dragging the sequence from another list or from the Navigation Area and dropping it in the table. To delete sequences, simply select them and press Delete.

You can also create a subset of the sequence list:

select the relevant sequences | right-click | Create New Sequence List

This will create a new sequence list which only includes the selected sequences.

10.7.3 Extract sequences

It is possible to extract individual sequences from a sequence list in two ways. If the sequence list is opened in the tabular view, it is possible to drag (with the mouse) one or more sequences into the Navigation Area. This allows you to extract specific sequences from the entire list. Another option is to extract all sequences found in the list. This can also be done for:

- Alignments
- Contigs and read mappings
- Read mapping tables
- BLAST result
- BLAST overview tables
- RNA-Seq samples
- and of course sequence lists

For mappings and BLAST results, the main sequences (i.e. reference/consensus and query sequence) will not be extracted.

To ext
312. equences by Name

This opens a dialog where you can add the sequences you wish to sort. You can also add sequence lists or the contents of an entire folder by right-clicking the folder and choosing Add folder contents.

When you click Next, you will be able to specify the details of how the grouping should be performed. First, you have to choose how each part of the name should be identified. There are three options:

- Simple. This will simply use a designated character to split up the name. You can choose a character from the list:
  - Underscore _
  - Dash -
  - Hash (number sign / pound sign) #
  - Pipe |
  - Tilde ~
  - Dot .
- Positions. You can define a part of the name by entering the start and end positions, e.g. from character number 6 to 14. For this to work, the names have to be of equal lengths.
- Java regular expression. This is an option for advanced users where you can use a special syntax to have total control over the splitting. See more below.

CHAPTER 17. SEQUENCING DATA ANALYSES AND ASSEMBLY 281

In the example above, it would be sufficient to use a simple split with the underscore _ character, since this is how the different parts of the name are divided.

When you have chosen a way to divide the name, the parts of the name will be listed in the table at the bottom of the dialog. There is a checkbox next to each part of the name. This checkbox is used to specify which of the name parts should be used for grouping.
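The three ways of identifying name parts described above can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the read names are invented, the helper names are hypothetical, and Python's `re` module stands in for the Java regular expression syntax that the Workbench actually uses (the two dialects are similar but not identical).

```python
import re
from collections import defaultdict


def split_simple(name, separator="_"):
    """Simple option: split the name on one designated character."""
    return name.split(separator)


def part_by_positions(name, start, end):
    """Positions option: 1-based, inclusive start and end positions."""
    return name[start - 1:end]


def split_regex(name, pattern):
    """Regular expression option: each capture group is one name part."""
    match = re.fullmatch(pattern, name)
    return list(match.groups()) if match else []


def group_by_part(names, part_index, separator="_"):
    """Group names by the part at part_index (0-based) after a simple split."""
    groups = defaultdict(list)
    for name in names:
        groups[split_simple(name, separator)[part_index]].append(name)
    return dict(groups)


# Invented names of the form <plate>_<gene>_<direction>
names = ["plateA_geneX_fwd", "plateA_geneX_rev", "plateA_geneY_fwd"]
groups = group_by_part(names, part_index=1)
```

Here `groups` holds one list per gene, which mirrors ticking the checkbox next to the second name part in the dialog.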
313. equirements

The system requirements of CLC DNA Workbench are these:

- Windows XP, Windows Vista, or Windows 7, Windows Server 2003 or Windows Server 2008
- Mac OS X 10.5 or newer. PowerPC G4, G5 or Intel CPU required.
- Linux: RedHat 5 or later. SUSE 10 or later.
- 32 or 64 bit
- 256 MB RAM required
- 512 MB RAM recommended
- 1024 x 768 display recommended

1.4 Licenses

When you have installed CLC DNA Workbench and start it for the first time, you will meet the license assistant, shown in figure 1.1.

The following options are available. They will be described in detail in the following sections.

- Request an evaluation license. The license is a fully functional, time-limited license (see below).
- Download a license. When you purchase a license, you will get a license ID from CLC bio. Using this option, you will get a license based on this ID.
- Import a license from a file. If CLC bio has provided a license file, or if you have downloaded a license from our web-based licensing system, you can import it using this option.
- Upgrade license. If you already have used a previous version of CLC DNA Workbench, and you are entitled to upgrading to the new CLC DNA Workbench 6.6, select this option to get a license upgrade.

CHAPTER 1. INTRODUCTION TO CLC DNA WORKBENCH 16

[Screenshot: License Wizard dialog, "You need a license".]
314. er you have changed the preference, you have to re-open your tables to see the effect.

5.2.2 Import and export Side Panel settings

If you have created a special set of settings in the Side Panel that you wish to share with other CLC users, you can export the settings in a file. The other user can then import the settings.

To export the Side Panel settings, first select the views that you wish to export settings for. Use Ctrl-click (⌘-click on Mac) or Shift-click to select multiple views. Next click the Export... button. Note that there is also another export button at the very bottom of the dialog, but this will export the other settings of the Preferences dialog (see section 5.5).

A dialog will be shown (see figure 5.6) that allows you to select which of the settings you wish to export.

When multiple views are selected for export, all the view settings for the views will be shown in the dialog. Click Export and you will now be able to define a save folder and name for the exported file. The settings are saved in a file with a .vsf extension (View Settings File).

CHAPTER 5. USER PREFERENCES AND SETTINGS 108

Figure 5.6: Exporting all settings for circular views.

To import a Side Panel settings file, make sure you are at the bottom of the View panel of the Preferences dialog, and click the
315. erent Side Panel settings that are saved for each view. See section 5.6 for more about how to create and save style sheets.

If there are other settings beside CLC Standard Settings, you can use this overview to choose which of the settings should be used per default when you open a view (see an example in figure 5.4).

In this example, the CLC Standard Settings is chosen as default.

CHAPTER 5. USER PREFERENCES AND SETTINGS 107

Figure 5.4: Selecting the default view setting.

5.2.1 Number formatting in tables

In the preferences, you can specify how the numbers should be formatted in tables (see figure 5.5).

Figure 5.5: Number formatting of tables. (The screenshot shows Number of fraction digits set to 2, with the examples 12.35, 1.43, 0.12, 0.01, 1.23E-3, 1.23E-4 and 1.23E-5.)

The examples below the text field are updated when you change the value so that you can see the effect. Aft
316. eria concern single primers, as primer pairs are not generated until the Calculate button is pressed. Parameters regarding primer and probe sets are described in detail for each reaction mode (see below).

- Length. Determines the length interval within which primers can be designed by setting a maximum and a minimum length. The upper and lower lengths allowed by the program are 50 and 10 nucleotides respectively.
- Melting temperature. Determines the temperature interval within which primers must lie. When the Nested PCR or TaqMan reaction type is chosen, the first pair of melting temperature interval settings relate to the outer primer pair, i.e. not the probe. Melting temperatures are calculated by a nearest-neighbor model which considers stacking interactions between neighboring bases in the primer-template complex. The model uses state-of-the-art thermodynamic parameters (SantaLucia, 1998) and considers the important contribution from the dangling ends that are present when a short primer anneals to a template sequence (Bommarito et al., 2000). A number of parameters concerning the reaction mixture, which influence melting temperatures, can be adjusted (see below). Melting temperatures are corrected for the presence of monovalent cations using the model of SantaLucia (1998), and temperatures are further corrected for the presence of magnesium, deoxynucleotide triphosphates (dNTP) and dimethyl sulfoxide (DMSO) using the model of von Ahsen et al
317. ers that are general for all primers in an alignment, simply add them all to the set of included sequences by checking all selection boxes. Specificity of priming is determined by criteria set by the user in the dialog box which is shown when the Calculate button is pressed (see below).

Different options can be chosen concerning the match of the primer to the template sequences in the included group:

CHAPTER 16. PRIMERS 267

- Perfect match. Specifies that the designed primers must have a perfect match to all relevant sequences in the alignment. When selected, primers will thus only be located in regions that are completely conserved within the sequences belonging to the included group.
- Allow degeneracy. Designs primers that may include ambiguity characters where heterogeneities occur in the included template sequences. The allowed fold of degeneracy is user defined and corresponds to the number of possible primer combinations formed by a degenerate primer. Thus, if a primer covers two 4-fold degenerate sites and one 2-fold degenerate site, the total fold of degeneracy is 4 x 4 x 2 = 32 and the primer will, when supplied from the manufacturer, consist of a mixture of 32 different oligonucleotides. When scoring the available primers, degenerate primers are given a score which decreases with the fold of degeneracy.
- Allow mismatches. Designs primers which are allowed a specified number of mismatches to the included template sequences. The melti
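The degeneracy arithmetic in the excerpt above (4 x 4 x 2 = 32) can be reproduced with a short Python sketch. It assumes the standard IUPAC nucleotide ambiguity codes; the function names and the example primer are hypothetical and are not taken from the manual.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def fold_degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer):
    """List every concrete oligonucleotide in the degenerate mixture."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(oligo) for oligo in product(*choices)]


# Invented primer with two 4-fold sites (N) and one 2-fold site (R).
primer = "ACNGTNAR"
fold = fold_degeneracy(primer)
```

With two N positions and one R position, `fold` is 4 x 4 x 2 = 32, and `expand(primer)` lists the 32 oligonucleotides that the mixture would contain.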
318. es (R, Y, etc.). The contig will display an ambiguity nucleotide reflecting the different nucleotides found in the reads. For an overview of ambiguity codes, see Appendix.

- Vote (A, C, G, T). The conflict will be solved by counting instances of each nucleotide and then letting the majority decide the nucleotide in the contig. In case of equality, ACGT are given priority over one another in the stated order.

Note that conflicts will always be highlighted no matter which of the options you choose. Furthermore, each conflict will be marked as annotation on the contig sequence and will be present if the contig sequence is extracted for further analysis. As a result, the details of any experimental heterogeneity can be maintained and used when the result of single-sequence analyses is interpreted.

When the parameters have been adjusted, click Next to see the dialog shown in figure 17.18.

Figure 17.18: Different options for the output of the assembly.

In this d
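The Vote rule described above (majority decides, ties resolved in the order A, C, G, T) is easy to express in code. The following Python sketch is only an illustration under stated assumptions: the function name `vote` is hypothetical, a conflict is represented as a string holding the bases that the reads show at one position, and characters other than A, C, G and T are ignored.

```python
from collections import Counter


def vote(column):
    """Resolve a conflict by majority; ties are broken in the order A, C, G, T."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    if not counts:
        return None  # no votable nucleotide at this position
    highest = max(counts.values())
    for base in "ACGT":  # stated priority order for ties
        if counts.get(base, 0) == highest:
            return base


# Three reads say A, one says G: the majority wins.
majority = vote("AAGA")
# Two reads each for G and T: G comes before T in the priority order.
tie = vote("GGTT")
```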
319. es alphabetically:

Right-click the name of a sequence | Sort Sequences Alphabetically

If you change the Sequence name (in the Sequence Layout view preferences), you will have to ask the program to sort the sequences again.

The sequences can also be sorted by similarity, grouping similar sequences together:

Right-click the name of a sequence | Sort Sequences by Similarity

19.3.6 Delete, rename and add sequences

Sequences can be removed from the alignment by right-clicking the label of a sequence:

right-click label | Delete Sequence

This can be undone by clicking Undo in the Toolbar.

CHAPTER 19. SEQUENCE ALIGNMENT 359

If you wish to delete several sequences, you can check all the sequences, right-click and choose Delete Marked Sequences. To show the checkboxes, you first have to click the Show Selection Boxes in the Side Panel.

A sequence can also be renamed:

right-click label | Rename Sequence

This will show a dialog, letting you rename the sequence. This will not affect the sequence that the alignment is based on.

Extra sequences can be added to the alignment by creating a new alignment where you select the current alignment and the extra sequences (see section 19.1).

The same procedure can be used for joining two alignments.

19.3.7 Realign selection

If you have created an alignment, it is possible to realign a part of it, leaving the rest of the alignment unchanged:

select a part of the alignment to realign | right-click
320. es are pooled before running the analysis. If you want individual outputs for each sequence, you would need to run the tool five times, or alternatively use the Batching mode.

Batching mode is activated by clicking the Batch checkbox in the dialog where the input data is selected. Batching simply means that each data set is run separately, just as if the tool had been run manually for each one. For some analyses, this simply means that each input sequence should be run separately, but in other cases it is desirable to pool sets of files together in one run. This selection of data for a batch run is defined as a batch unit.

When batching is selected, the data to be added is the folder containing the data you want to batch. The content of the folder is assigned into batch units based on this concept:

CHAPTER 9. BATCHING AND RESULT HANDLING 134
321. es available.

- To the right, there is a list of the enzymes that will be used.

Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button. If you e.g. wish to use EcoRV and BamHI, select these two enzymes and add them to the right side panel.

If you wish to use all the enzymes in the list:

Click in the panel to the left | press Ctrl + A (⌘ + A on Mac) | Add

The enzymes can be sorted by clicking the column headings, i.e. Name, Overhang, Methylation or Popularity. This is particularly useful if you wish to use enzymes which produce e.g. a 3' overhang. In this case, you can sort the list by clicking the Overhang column heading, and all the enzymes producing 3' overhangs will be listed together for easy selection.

When looking for a specific enzyme, it is easier to use the Filter. If you wish to find e.g. HindIII sites, simply type HindIII into the filter, and the list of enzymes will shrink automatically to only include the HindIII enzyme. This can also be used to only show enzymes producing e.g. a 3' overhang as shown in figure 18.51.

Footnote: The CLC DNA Workbench comes with a standard set of enzymes based on http://www.rebase.neb.com. You can customize the enzyme database for your installation, see section E.

CHAPTER 18. CLONING AND CUTTING 333
322. es no hits, you will be asked if you wish to search for matches that start with your search term. If you accept this, an asterisk (*) will be appended to the search term.

Pressing the Alt key while you click a search result will highlight the search hit in its folder in the Navigation Area.

CHAPTER 4. SEARCHING YOUR DATA 100

Figure 4.3: Page two of the search results.

In the preferences (see 5), you can specify the number of hits to be shown.

4.2.2 Special search expressions

When you write a search term in the search field, you can get help to write a more advanced search expression by pressing Shift + F1. This will reveal a list of guides as shown in figure 4.4.

Figure 4.4: Guides to help create advanced search expressions. (The list shows: Wildcard search, Search related words, Include both terms (AND), Include either term (OR), Any field search (contents), Name search (name), Length search (length, START TO END), Organism search (organism).)

You can select any of the guides (using mouse or keyboard arrows) and start typing. If you e.g. wish to search for sequences named BRCA1, select "Name search (name:)" and type "BRCA1". Your search expression will now look like this: "name:BRCA1".

The guides available are these:

- Wildcard search (*). Appending an asterisk * to the search term will find matches st
323. Figure 13.1: Choosing sequence for shuffling.

If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.

Click Next to determine how the shuffling should be performed.

In this step, shown in figure 13.2, the following parameters can be set for nucleotides:

Figure 13.2: Parameters for shuffling.

- Mononucleotide shuffling. Shuffle method generating a sequence of the exact same mononucleotide frequency.
- Dinucleotide shuffling. Shuffle method generating a sequence of the exact same dinucleotide frequency.
- Mononucleotide sampling from zero order Markov chain. Resampling meth
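To make the difference between shuffling and sampling concrete, here is a minimal Python sketch of the two mononucleotide methods named above. It is not the Workbench's algorithm. The function names are hypothetical, and the reading of "sampling from zero order Markov chain" used here (each position drawn independently, with probabilities taken from the input's base frequencies) is an assumption, since the excerpt is cut off before that method is described. Dinucleotide shuffling needs a more involved algorithm and is not covered.

```python
import random
from collections import Counter


def mononucleotide_shuffle(seq, rng=None):
    """Permute the letters, so the mononucleotide counts stay exactly the same."""
    rng = rng or random.Random()
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def sample_zero_order(seq, length=None, rng=None):
    """Draw each position independently from the input's base frequencies.

    The expected composition matches the input, but the exact counts may differ.
    """
    rng = rng or random.Random()
    counts = Counter(seq)
    letters = sorted(counts)
    weights = [counts[letter] for letter in letters]
    return "".join(rng.choices(letters, weights=weights, k=length or len(seq)))


original = "ACGTACGTAAAA"
shuffled = mononucleotide_shuffle(original, random.Random(1))
sampled = sample_zero_order(original, rng=random.Random(1))
```

Passing a seeded `random.Random` makes a run repeatable, which is convenient when several shuffled sequences are generated for a comparison.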
324. esigning primers.

2   1 Specifying a region for the forward primer

First zoom out to get an overview of the sequence by clicking Fit Width. You can now see the blue gene annotation labeled Atp8a1, and just before that there is the green CMV promoter. This may be hidden behind restriction site annotations. Remember that you can always choose not to show these by altering the settings in the right-hand pane.

CHAPTER 2. TUTORIALS 58

In this tutorial, we want the forward primer to be in a region between positions 600 and 900, just before the gene (you may have to zoom in to make the selection). Select this region, right-click and choose "Forward primer region here" (see figure 2.34).

Figure 2.34: Right-clicking a selection and choosing "Forward primer region here".

This will add an annotation to this region, and five rows of red and green dots are seen below as shown in figure 2.35.
325. et the minimum and maximum sizes of the fragments to be shown. The table is described in detail below.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

An example of a binding site annotation is shown in figure 16.19.

Figure 16.19: Annotation showing a primer match.

The annotation has the following information:

- Sequence of the primer. Positions with mismatches will be in lower case (see the fourth position in figure 16.19, where the primer has an a and the template sequence has a T).
- Number of mismatches.
- Number of other hits on the same sequence. This number can be useful to check specificity of the primer.
- Binding region. This region ends with the 3' exact match and is simply the primer length upstream. This means that if you have 5' extensions to the primer, part of the binding region covers sequence that will actually not be annealed to the primer.

CHAPTER 16. PRIMERS 274
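The lower-case convention for mismatches described above can be illustrated with a few lines of Python. This is a hedged sketch, not the Workbench's code: the function name `mark_mismatches` is hypothetical, both strings are assumed to be in the same orientation and of equal length, and the short sequences in the example are invented.

```python
def mark_mismatches(primer, template_site):
    """Return the primer with mismatching positions in lower case, plus the count."""
    if len(primer) != len(template_site):
        raise ValueError("primer and template site must have the same length")
    marked = "".join(
        p.upper() if p.upper() == t.upper() else p.lower()
        for p, t in zip(primer, template_site)
    )
    mismatches = sum(1 for base in marked if base.islower())
    return marked, mismatches


# The primer has an A at the fourth position where the template has a T.
marked, mismatches = mark_mismatches("GCTAGACC", "GCTTGACC")
```

In this example the fourth base is reported in lower case and the mismatch count is 1, the same two pieces of information that the annotation shows.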
326. ew by zooming in or out. Click Zoom in or Zoom out in the Toolbar and click the view.

Finally, you can modify the format of the text heading each lane in the Text format preferences in the Side Panel.

18.5 Restriction enzyme lists

CLC DNA Workbench includes all the restriction enzymes available in the REBASE database. However, when performing restriction site analyses, it is often an advantage to use a customized list of enzymes. In this case, the user can create special lists containing e.g. all enzymes available in the laboratory freezer, all enzymes used to create a given restriction map or all enzymes that are available from the preferred vendor.

In the example data (see section 1.6.2) under Nucleotide->Restriction analysis, there are two enzyme lists: one with the 50 most popular enzymes, and another with all enzymes that are included in the CLC DNA Workbench.

This section describes how you can create an enzyme list, and how you can modify it.

18.5.1 Create enzyme list

CLC DNA Workbench uses enzymes from the REBASE restriction enzyme database at http://rebase.neb.com.

Footnote: You can customize the enzyme database for your installation, see section E.

CHAPTER 18. CLONING AND CUTTING 344

To create an enzyme list of a subset of these enzymes:

File | New | Enzyme list

This opens the dialog shown in figure 18.50.
327. f annotations

Sometimes you end up with annotations which do not have a meaningful name. In that case, there is an advanced batch rename functionality:

Open the Annotation Table | select the annotations that you want to rename | right-click the selection | Advanced Rename

This will bring up the dialog shown in figure 10.11.

Figure 10.11: The Advanced Rename dialog.

In this dialog, you have two options:

- Use this qualifier. Use one of the qualifiers as name. A list of all qualifiers of all the selected annotations is shown. Note that if one of the annotations does not have the qualifier you have chosen, it will not be renamed. If an annotation has multiple qualifiers of the same type, the first is used for naming.
- Use annotation type as name. The annotation's type will be used as name (e.g. if you have an annotation of type "Promoter", it will get "Promoter" as its name by using this option).

A similar functionality for batch re-typing annotations is available in the right-click menu as well, in case your annotations are not typed correctly:

Open the Annotation Table | select the annotations that you want to retype | right-click the selection | Advanced Retype

This will bring up the dialog shown in figure 10.12.

In this dialog, you have two options:

- Use this qualifier. Use one of the qualifiers as type. A lis
328. f broken pairs, 298
Data
  storage location, 8
Data formats
  bioinformatic, 392
  graphics, 395
Data preferences, 108
Data sharing, 8
Data structure, 78
Database
  GenBank, 167
  local, 78
  NCBI, 186
  nucleotide, 386
  peptide, 386
  Shared BLAST database, 186
Db source, 160
db_xref references, 1 2
de-multiplexing, 279
Delete
  element, 83
  residues and gaps in alignment, 357
  workspace, 95
Description, 160
  batch edit, 84
DGE, 377
Digital gene expression, 3
DIP detection, 3 6
Dipeptide distribution, 218
Discovery studio
  file format, 393
Distance
  pairwise comparison of sequences in alignments, 363
DNA translation, 232
DNAstrider
  file format, 393
Dot plots, 3 9
  Bioinformatics explained, 204
  create, 201
  print, 203
Double cutters, 329
Double stranded DNA, 142
Download and open
  search results, GenBank, 1 0
Download and save
  search results, GenBank, 1 0
Download of CLC DNA Workbench, 12
Drag and drop
  Navigation Area, 80
  search results, GenBank, 169
DS Gene
  file format, 393
E-PCR, 271
Edit
  alignments, 35 7, 3 8
  annotations, 156, 158, 377
  enzymes, 330
  sequence, 149
  sequences, 37 7
  single bases, 149
Element
  delete, 83
  rename, 83
.embl, file format, 395
Embl, file format, 393
Encapsulated PostScript, export, 126
End gap cost, 349
End gap costs
  cheap end caps, 349
  free end gaps, 349
Entry clone, creating, 324
Enzyme list, 343
  create, 343
  edit, 345
  view, 345
.eps format, export, 126
Error reports, 28
E
329. f the plot can be set by clicking the color box. For Colors, the color box is replaced by a gradient color box as described under Foreground color.

Protein info

These preferences only apply to proteins. The first nine items are different hydrophobicity scales and are described in section 15.2.2.

- Kyte-Doolittle. The Kyte-Doolittle scale is widely used for detecting hydrophobic regions in proteins. Regions with a positive value are hydrophobic. This scale can be used for identifying both surface-exposed regions as well as transmembrane regions, depending on the window size used. Short window sizes of 5-7 generally work well for predicting putative surface-exposed regions. Large window sizes of 19-21 are well suited for finding transmembrane domains if the values calculated are above 1.6 (Kyte and Doolittle, 1982). These values should be used as a rule of thumb, and deviations from the rule may occur.
- Cornette. Cornette et al. computed an optimal hydrophobicity scale based on 28 published scales (Cornette et al., 1987). This optimized scale is also suitable for prediction of alpha-helices in proteins.
- Engelman. The Engelman hydrophobicity scale, also known as the GES scale, is another scale which can be used for prediction of protein hydrophobicity (Engelman et al., 1986). Like the Kyte-Doolittle scale, this scale is useful for predicting transmembrane regions in proteins.
- Eisenberg. The Eisenberg scale is a normalized consensus hydrophobic
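The window-based use of the Kyte-Doolittle scale described above can be sketched as a sliding-window average in Python. This is an illustration under stated assumptions, not the Workbench's plotting code: the function names are hypothetical, the values are the published Kyte-Doolittle hydropathy values, each window is scored by its plain mean, and the 1.6 cut-off is applied exactly as the rule of thumb states.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of every full window along the protein."""
    seq = protein.upper()
    if window > len(seq):
        return []
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def has_candidate_transmembrane_region(protein, window=19, threshold=1.6):
    """Rule of thumb from the text: a large window averaging above 1.6."""
    return any(value > threshold for value in hydropathy_profile(protein, window))


# Invented sequence: a leucine-rich stretch flanked by charged residues.
example = "KKDE" + "L" * 21 + "DEKK"
candidate = has_candidate_transmembrane_region(example)
```

A window of 5 to 7 residues would be passed instead when looking for surface-exposed regions, as the excerpt suggests.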
330. Figure 7.1: The import dialog.

Next, select one or more files or folders to import and click Select. This allows you to select a place for saving the result files.

If you import one or more folders, the contents of the folder are automatically imported and placed in that folder in the Navigation Area. If the folder contains subfolders, the whole folder structure is imported.

In the import dialog (figure 7.1), there are three import options:

Automatic import. This will import the file and CLC DNA Workbench will try to determine the format of the file. The format is determined based on the file extension (e.g. SwissProt files have .swp at the end of the file name) in combination with a detection of elements in the file that are specific to the individual file formats. If the file type is not recognized, it will be imported as an external file. In most cases, automatic import will yield a successful result, but if the import goes wrong, the next option can be helpful:

CHAPTER 7. IMPORT/EXPORT OF DATA AND GRAPHICS 119

Force import as type. This option should be used if CLC DNA Workbench cannot successfully determine the file format. By forcing the import as a specific type, the automatic determination of the file format is bypassed, and the file is imported as the type specified.

Force import as external file. This
331. f you select Print whole view, you will get a result that looks like figure 6.4. This means that you also print the part of the sequence which is not visible when you have zoomed in.

CHAPTER 6  PRINTING 115

Figure 6.4: A print of the sequence selecting Print whole view. The whole sequence is shown, even though the view is zoomed in on a part of the sequence.

6.2 Page setup

No matter whether you have chosen to print the visible area or the whole view, you can adjust the page setup of the print. An example of this can be seen in figure 6.5.

Figure 6.5: Page Setup.

In this dialog you can both adjust the setup of the pages and specify a header and a footer by clicking the tab at the top of the dialog.

You can modify the layout of the page using the following options:

• Orientation.
  - Portrait. Will print with the paper oriented vertically.
  - Landscape. Will print with the paper oriented horizontally.
• Paper size. Adjust the size to match the paper in your printer.
• Fit to pages. Can be used to control how the graphics should be split across pages (see figure 6.6 for an example).
  - Horizontal pages. If you set the value to e.g. 2, the printed content will be broken up horizontally and split across 2 pages. This is useful for sequences that are not wrapped.
  - Vertical
332. Figure 19.16: The tabular format of a multiple alignment of 24 Hemoglobin protein sequences. Sequence names appear at the beginning of each row, and the residue position is indicated by the numbers at the top of the alignment columns. The level of sequence conservation is shown on a color scale, with blue residues being the least conserved and red residues being the most conserved.

It is therefore commonplace to either ignore this complication and assume sequences to be unrelated, or to use heuristic corrections for shared ancestry.

The
333. figure 17.6, the dialog should include a linker for the SrfI site, a barcode, a sequence, a barcode (now reversed), and finally a linker again, as shown in figure 17.8.

If you have paired data, the dialog shown in figure 17.8 will be displayed twice, once for each part of the pair.

Clicking Next will display a dialog as shown in figure 17.9.

The barcodes can be entered manually by clicking the Add button. You can edit the barcodes and the names by clicking the cells in the table. The name is used for naming the results.

In addition to adding barcodes manually, you can also Import barcode definitions from an Excel or CSV file. The input format consists of two columns: the first contains the barcode sequence, the second contains the name of the barcode. An acceptable CSV file would contain columns of information that look like:

"AAAAAA","Sample1"
"GGGGGG","Sample2"
"CCCCCC","Sample3"

The Preview column will show a preview of the results by running through the first 10,000 reads.

At the top, you can choose to search on both strands for the barcodes (this is needed for some 454 protocols where the MID is located at either end of the read).
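To make the two-column barcode format and the idea of grouping reads by sequence tag more concrete, here is a minimal sketch. It is not how the Workbench processes tagged sequences: the function names are hypothetical, the CSV is assumed to be comma-separated with optional quotes, and matching is simplified to an exact barcode match directly after a linker of fixed length at the start of the read (no reverse barcode, no second linker, no mismatches).

```python
import csv
import io


def load_barcodes(csv_text):
    """Parse two-column barcode definitions: barcode sequence, then name."""
    barcodes = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        barcodes[row[0].strip().upper()] = row[1].strip()
    return barcodes


def group_reads(reads, barcodes, linker_length=0):
    """Group reads by the barcode found right after the leading linker.

    Reads without an exact barcode match are collected under "unassigned".
    The linker and barcode are removed from assigned reads.
    """
    groups = {}
    for read in reads:
        upper = read.upper()
        for barcode, name in barcodes.items():
            if upper.startswith(barcode, linker_length):
                remainder = read[linker_length + len(barcode):]
                groups.setdefault(name, []).append(remainder)
                break
        else:
            groups.setdefault("unassigned", []).append(read)
    return groups


definitions = '"AAAAAA","Sample1"\n"GGGGGG","Sample2"\n"CCCCCC","Sample3"\n'
example_groups = group_reads(
    ["TTTTAAAAAAACGT", "TTTTGGGGGGTTAA", "TTTTCACACAGG"],
    load_barcodes(definitions),
    linker_length=4,
)
```

In this example the first two reads are assigned to Sample1 and Sample2 with linker and barcode removed, and the third read, which carries no known barcode, is left unassigned.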
334. formation content or significance of certain positions, a sequence logo can be used. The sequence logo displays the information content of all positions in an alignment as residues or nucleotides stacked on top of each other (see figure 19.8). The sequence logo provides a far more detailed view of the entire alignment than a simple consensus sequence. Sequence logos can help identify protein binding sites on DNA sequences and conserved residues in aligned domains of protein sequences, and they have a wide range of other applications.

Each position of the alignment, and consequently of the sequence logo, shows the sequence information as a computed score based on Shannon entropy (Schneider and Stephens, 1990). The height of the individual letters represents the sequence information content in that particular position of the alignment.

A sequence logo is a much better visualization tool than a simple consensus sequence. An example of this is an alignment where, in one position, a particular residue is found in 70% of the sequences. If a consensus sequence is used, it typically only displays the single residue with 70% coverage. In figure 19.8, an ungapped alignment of 11 E. coli start codons including flanking regions is shown. In this example, a consensus sequence would only display ATG as the start codon in position 1, but when looking at the sequence logo it is seen that a GTG is also allowed as a start codon.

CHAPTER 19  SEQUENCE ALIGNMENT 35
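The entropy-based score behind a sequence logo can be sketched for a single alignment column as follows. This is an illustration only, not the Workbench's code: the function name is hypothetical, gaps are simply ignored, and the small-sample correction described by Schneider and Stephens (1990) is left out, so the information content is taken as log2(alphabet size) minus the Shannon entropy of the observed residue frequencies.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Return (information in bits, letter heights) for one alignment column.

    Information = log2(alphabet_size) - Shannon entropy of the column.
    Each letter's height is its frequency times the column information.
    Gap characters ("-") are ignored; no small-sample correction is applied.
    """
    residues = [residue for residue in column.upper() if residue != "-"]
    if not residues:
        return 0.0, {}
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )
    information = math.log2(alphabet_size) - entropy
    heights = {
        residue: (count / total) * information
        for residue, count in counts.items()
    }
    return information, heights
```

A fully conserved nucleotide column carries the maximum of 2 bits, a column split evenly between two nucleotides carries 1 bit with each letter drawn at half that height, and a column with all four nucleotides in equal proportion carries no information. For proteins, `alphabet_size=20` gives a maximum of about 4.32 bits.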
335. gle.
  - Lower comparison. Selects the comparison to show in the lower triangle. Choose the same comparison as in the upper triangle to show all the results of an asymmetric comparison.
  - Lower comparison gradient. Selects the color gradient to use for the lower triangle.
  - Diagonal from upper. Use this setting to show the diagonal results from the upper comparison.
  - Diagonal from lower. Use this setting to show the diagonal results from the lower comparison.
  - No Diagonal. Leaves the diagonal table entries blank.
• Layout.
  - Lock headers. Locks the sequence labels and table headers when scrolling the table.
  - Sequence label. Changes the sequence labels.
• Text format.
  - Text size. Changes the size of the table and the text within it.
  - Font. Changes the font in the table.
  - Bold. Toggles the use of boldface in the table.

19.6 Bioinformatics explained: Multiple alignments

Multiple alignments are at the core of bioinformatical analysis. Often the first step in a chain of bioinformatical analyses is to construct a multiple alignment of a number of homologous DNA or protein sequences. However, despite their frequent use, the development of multiple alignment algorithms remains one of the algorithmically most challenging areas in bioinformatical research.

Constructing a multiple alignment corresponds to developing a hypothesis of how a number of sequences have evolved through
336. gment with additional sequences by extending the primers 5' of the template-specific part of the primer (i.e. between the template-specific part and the attB sites). See an example of this in figure 18.21, where a Shine-Dalgarno site has been added between the attB site and the gene of interest.

At the top of the dialog (see figure 18.16), you can specify primer additions such as a Shine-Dalgarno site, start codon etc. Click in the text field and press Shift + F1 to show some of the most common additions (see figure 18.17).

Use the up and down arrow keys to select a tag and press Enter. This will insert the selected sequence as shown in figure 18.18.

At the bottom of the dialog, you can see a preview of what the final PCR product will look like. In the middle is the sequence of interest (i.e. the sequence you selected as input). At the beginning is the attB1 site, and at the end is the attB2 site. The primer additions that you have inserted are shown in colors (like the green Shine-Dalgarno site in figure 18.18).

This default list of primer additions can be modified, see section 18.2.1.

CHAPTER 18  CLONING AND CUTTING 321

Figure 18.16: Primer addit
337. ground distribution of amino acids from a range of organisms.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. This will open a view showing the patterns found as annotations on the original sequence (see figure 13.21). If you have selected several sequences, a corresponding number of views will be opened.

Figure 13.21: Sequence view displaying two discovered patterns.

13.6.2 Pattern search output

If the analysis is performed on several sequences at a time, the method will search for patterns in the sequences and open a new view for each of the sequences in which a pattern was discovered. Each novel pattern will be represented as an annotation of the type Region. More information on each found pattern is available through the tool tip, including detailed information on the position of the pattern and quality scores.

It is also possible to get a tabular view of all found patterns in one combined table. Each found pattern will then be represented with various information on obtained scores, quality of the pattern and position in the sequence.

The emission values of the HMM model actually used are presented in a table view. This model can be saved and used to search for a similar pattern in new or unknown sequences.

13.7 Motif Search

CLC DNA Workbench offers advanced and versatile options to search for known motifs r
338. h of the four nucleotides the trace data can be selected and unselected.

• Scale traces. A slider which allows the user to scale the height of the trace area. Scaling the traces individually is described in section 17.1.1.

Figure 17.3: A sequence with trace data. The preferences for viewing the trace are shown in the Side Panel.

17.2 Multiplexing

When you do batch sequencing of different samples, you can use multiplexing techniques to run different samples in the same run. There is often a data analysis challenge to separate the sequencing reads, so that the reads from one sample are mapped together. The CLC DNA Workbench supports automatic grouping of samples for two multiplexing techniques:

• By name. This supports grouping of reads based on their name.
• By sequence tag. This supports grouping of reads based on information within the sequence (tagged sequences).

The details 
339. he Frame box). The result is shown in figure 2.20.

You can see that the variation is on the third base of the codon coding for threonine, so this is a synonymous substitution. That is why the T is colored yellow. If it were a non-synonymous substitution, it would be colored in red.

CHAPTER 2  TUTORIALS 50

Figure 2.20: Showing the translation along the contig.

2.5.8 Getting an overview of the conflicts

Browsing the conflicts by clicking the Find Conflict button is useful in many cases, but you might also want to get an overview of all the conflicts in the entire contig. This is easily achieved by showing the contig in a table view:

Press and hold the Ctrl button (⌘ on Mac) | Click Show Table at the bottom of the view

This will open a table showing the confl
340. he information.

Note: CLC files can be exported from and imported into all the different CLC Workbenches.

Backup

If you wish to secure your data from computer breakdowns, it is advisable to perform regular backups of your data. Backing up data in the CLC DNA Workbench can be done in two ways:

• Making a backup of each of the folders represented by the locations in the Navigation Area.
• Selecting all locations in the Navigation Area and exporting them in .zip format. The resulting file will contain all the data stored in the Navigation Area and can be imported into CLC DNA Workbench if you wish to restore from the backup at some point.

No matter which method is used for backup, you may have to re-define the locations in the Navigation Area if you restore your data after a computer breakdown.

7.2 External files

In order to help you organize your research projects, CLC DNA Workbench lets you import all kinds of files. E.g. if you have Word, Excel or pdf files related to your project, you can import them into the Navigation Area of CLC DNA Workbench. Importing an external file creates a copy of the file which is stored at the location you have chosen for import. The file can now be opened by double-clicking the file in the Navigation Area. The file is opened using the default application for this file type (e.g. Microsoft Word for .doc files and Adobe Reader for .pdf).

External files are imported 
341. he Help menu of the program. This installs the data automatically. You can also go to http://www.clcbio.com/download and download the example data from there.

If you download the file from the website, you need to import it into the program. See chapter 7.1 for more about importing data.

1.7 Plug-ins

When you install CLC DNA Workbench, it has a standard set of features. However, you can upgrade and customize the program using a variety of plug-ins.

As the range of plug-ins is continuously updated and expanded, they will not be listed here. Instead we refer to http://www.clcbio.com/plug-ins for a full list of plug-ins with descriptions of their functionalities.

1.7.1 Installing plug-ins

Plug-ins are installed using the plug-in manager:

Help in the Menu Bar | Plug-ins and Resources... or Plug-ins in the Toolbar

The plug-in manager has four tabs at the top:

• Manage Plug-ins. This is an overview of plug-ins that are installed.
• Download Plug-ins. This is an overview of available plug-ins on CLC bio's server.
• Manage Resources. This is an overview of resources that are installed.
• Download Resources. This is an overview of available resources on CLC bio's server.

Note: In order to install plug-ins on Windows Vista, the Workbench must be run in administrator mode: Right-click the program shortcut and choose "Run as Administrator". Then follow the procedure described below.

CHAPTER 1  INTRODUCTION TO CLC DNA WORKBENCH 31
342. he sequences.

• Right-click the sequence name (to the left) to manipulate the whole sequence.
• Right-click a selection to manipulate the selection.

The two menus are described in the following.

Manipulate the whole sequence

Right-clicking the sequence name at the left side of the view reveals several options on sorting, opening and editing the sequences in the view (see figure 18.9).

Figure 18.9: Right click on the sequence in the cloning view.

• Open sequence in circular view
  Opens the sequence in a new circular view. If the sequence is not circular, you will be asked if you wish to make it circular or not. (This will not forge ends with matching overhangs together; use "Make Sequence Circular" instead.)
• Duplicate sequence
  Adds a duplicate of the selected sequence. The new sequence will be added to the list of sequences shown on the screen.
• Insert sequence after this sequence
  Insert another sequence after this sequence. The sequence to be inserted can be selected from
343. he work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work.

SOME RIGHTS RESERVED

See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents.

Chapter 20

Phylogenetic trees

Contents
20.1 Inferring phylogenetic trees
  20.1.1 Phylogenetic tree parameters
  20.1.2 Tree View Preferences
20.2 Bioinformatics explained: phylogenetics
  20.2.1 The phylogenetic tree
  20.2.2 Modern usage of phylogenies
  20.2.3 Reconstructing phylogenies from molecular data
  20.2.4 Interpreting phylogenies

CLC DNA Workbench offers different ways of inferring phylogenetic trees. The first part of this chapter will briefly explain the different ways of inferring trees in CLC DNA Workbench. The second part, "Bioinformatics explained", will give a more general introduction to the concept of phylogeny and the associated bioinformatics methods.

20.1 Inferring phylogenetic trees

For a given set of aligned sequences (see chapter 19) it is possible to infer their evolutionary relationships. In CLC DNA Workbench this may be done either by using a distance-based method (see "Bioinformatics explained" in section 20.2) or by us
344. here is no directionality indicated when setting parameters for melting temperature differences between inner and outer primer pair, i.e. it is not specified whether the inner pair should have a lower or higher Tm. Instead, this is determined by the allowed temperature intervals for inner and outer primers that are set in the primer parameters preference group in the side panel. If a higher Tm of inner primers is desired, choose a Tm interval for inner primers which has higher values than the interval for outer primers.

Two radio buttons allowing the user to choose between a fast and an accurate algorithm for primer prediction.

CHAPTER 16  PRIMERS 262

16.6.1 Nested PCR output table

In nested PCR there are four primers in a solution: forward outer primer (FO), forward inner primer (FI), reverse inner primer (RI) and reverse outer primer (RO).

The output table can show primer pair combination parameters for all four combinations of primers and single primer parameters for all four primers in a solution (see the section on Standard PCR for an explanation of the available primer pair and single primer information).

The fragment length in this mode refers to the length of the PCR fragment generated by the inner primer pair, and this is also the PCR fragment which can be exported.

16.7 TaqMan

CLC DNA Workbench allows the user to design primers and probes for TaqMan PCR applications.

TaqMan probes are oligonucleotides that contain a fluorescent repor
345. hey will be cut off.

Figure 17.19: Setting assembly parameters when assembling to an existing contig.

17.7 View and edit contigs

The result of the assembly process is one or more contigs where the sequence reads have been aligned (see figure 17.20).

Figure 17.20: The view of a contig. Notice that you can zoom to a very detailed level in contigs.

You can see that the color of the residues and trace at the end of one of the reads has been faded. This indicates that this region has not contributed to the contig. This may be due to trimming before or during the assembly, or due to misalignment to the other reads.

You can easily adjust the trimmed area to include more of the read in the contig: simply drag the edge of the faded area 
346. horizontal axis (x axis). Enter a value in Min and Max, and press Enter. This will update the view. If you wait a few seconds without pressing Enter, the view will also be updated.

• Vertical axis range. Sets the range of the vertical axis (y axis). Enter a value in Min and Max, and press Enter. This will update the view. If you wait a few seconds without pressing Enter, the view will also be updated.
• X axis at zero. This will draw the x axis at y = 0. Note that the axis range will not be changed.
• Y axis at zero. This will draw the y axis at x = 0. Note that the axis range will not be changed.
• Show as histogram. For some data series it is possible to see the graph as a histogram rather than a line plot.

APPENDIX B  GRAPH PREFERENCES 382

The Lines and plots group below contains the following settings:

• Dot type
  - None
  - Cross
  - Plus
  - Square
  - Diamond
  - Circle
  - Triangle
  - Reverse triangle
  - Dot
• Dot color. Allows you to choose between many different colors. Click the color box to select a color.
• Line width
  - Thin
  - Medium
  - Wide
• Line type
  - None
  - Line
  - Long dash
  - Short dash
• Line color. Allows you to choose between many different colors. Click the color box to select a color.

For graphs with multiple data series, you can select which curve the dot and line preferences should apply to. This setting is at the top of the Side Panel group.

Note that the graph title and the axes
347. how to draw the line no matter what the zoom factor is, thereby always giving a correct image. This format is good for e.g. graphs and reports, but less usable for e.g. dot plots. If the image is to be resized or edited, vector graphics are by far the best format to store graphics. If you open a vector graphics file in an application such as Adobe Illustrator, you will be able to manipulate the image in great detail.

Graphics files can also be imported into the Navigation Area. However, no kinds of graphics files can be displayed in CLC DNA Workbench. See section 7.2 for more about importing external files into CLC DNA Workbench.

7.3.3 Graphics export parameters

When you have specified the name and location to save the graphics file, you can either click Next or Finish. Clicking Next allows you to set further parameters for the graphics export, whereas clicking Finish will export using the parameters that you set the last time you made a graphics export in that file format (if it is the first time, it will use default parameters).

Parameters for bitmap formats

For bitmap files, clicking Next will display the dialog shown in figure 7.12.
348. ialog, you can specify more options:

• Minimum aligned read length. The minimum number of nucleotides in a read which must be successfully aligned to the contig. If this criterion is not met by a read, the read is excluded from the assembly.
• Alignment stringency. Specifies the stringency of the scoring function used by the alignment step in the contig assembly algorithm. A higher stringency level will tend to produce contigs with fewer ambiguities but will also tend to omit more sequencing reads and to generate more and shorter contigs. Three stringency levels can be set:
  - Low.
  - Medium.
  - High.
• Use existing trim information. When using a reference sequence, trimming is generally not necessary, but if you wish to use trimming you can check this box. It requires that the sequence reads have been trimmed beforehand (see section 17.3 for more information about trimming).
• Show tabular view of contigs. A contig can be shown both in a graphical as well as a tabular view. If you select this option, a tabular view of the contig will also be opened. Even if you do not select this option, you can show the tabular view of the contig later on by clicking Show and selecting Table. For more information about the tabular view of contigs, see section 1

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. This will start the asse
349. ical level, the CLC DNA Workbench uses the NCBI's blast+ software (see ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/). Thus, using a particular data set to search the same database with the same search parameters would give the same results, whether run locally or at the NCBI.

There are a number of options for what you can search against:

• You can create a database based on data already imported into your Workbench (see section 12.3.3).
• You can add pre-formatted databases (see section 12.3.1).
• You can use sequence data from the Navigation Area directly, without creating a database first.

To conduct a BLAST search:

Toolbox | BLAST | Local BLAST

This opens the dialog seen in figure 12.5.

Select one or more sequences of the same type (DNA or protein) and click Next.

This opens the dialog seen in figure 12.6.

At the top, you can choose between different BLAST programs. See section 12.1.1 for information about these methods.

You then specify the target database to use.

CHAPTER 12  BLAST SEARCH 179
350. icts. You can right-click the Note field and enter your own comment. In this dialog, enter a new text in the Name and click OK.

When you edit a comment, this is reflected in the conflict annotation on the consensus sequence. This means that when you use this sequence later on, you will easily be able to see the comments you have entered. The comment could be e.g. your interpretation of the conflict.

2.5.9 Documenting your changes

Whenever you make a change like deleting a "T", it will be noted in the contig's history. To open the history, click the History icon at the bottom of the view.

In the history, you can see the details of each change (see figure 2.21).

Figure 2.21: The history of the contig showing that a "T" has been deleted and that the aligned region has been moved.

2.5.10 Using the result for further analyses

When you have finished editing the contig, it can be saved, and you can also extract and save the consensus sequence:

Right-click the name "Consensus" | Open Copy of Sequence | Save

This will make it possible to use this sequence for further analyses in the CLC DNA Workbench
351. idues and gaps
  19.3.2 Insert gaps
  19.3.3 Delete residues and gaps
  19.3.4 Copy annotations to other sequences
  19.3.5 Move sequences up and down
  19.3.6 Delete, rename and add sequences
  19.3.7 Realign selection
19.4 Join alignments
  19.4.1 How alignments are joined
19.5 Pairwise comparison
  19.5.1 Pairwise comparison on alignment selection
  19.5.2 Pairwise comparison parameters
  19.5.3 The pairwise comparison table
19.6 Bioinformatics explained: Multiple alignments
  19.6.1 Use of multiple alignments
  19.6.2 Constructing multiple alignments

CLC DNA Workbench can align nucleotides and proteins using a progressive alignment algorithm (see section 19.6 or read the White paper on alignments in the Science section of http://www.clcbio.com).

This chapter describes how to use the program to align sequences. The chapter also describes alignment algorithms in more general terms.

19.1 Create an alignment

Alignments can be created from sequences, sequence lis
352. ight of the graph.
    * Type. The type of the graph.
      Line plot. Displays the graph as a line plot.
      Bar plot. Displays the graph as a bar plot.
      Colors. Displays the graph as a color bar using a gradient like the foreground and background colors.
    * Color box. Specifies the color of the graph for line and bar plots, and specifies a gradient for colors.
• Color different residues. Indicates differences in aligned residues.
  - Foreground color. Colors the letter.
  - Background color. Sets a background color of the residues.
• Sequence logo. A sequence logo displays the frequencies of residues at each position in an alignment. This is presented as the relative heights of letters, along with the degree of sequence conservation as the total height of a stack of letters, measured in bits of information. The vertical scale is in bits, with a maximum of 2 bits for nucleotides and approximately 4.32 bits for amino acid residues. See section 19.2.1 for more details.
  - Foreground color. Colors the residues using a gradient according to the information content of the alignment column. Low values indicate columns with high variability, whereas high values indicate columns with similar residues.
  - Background color. Sets a background color of the residues using a gradient in the same way as described above.
  - Logo. Displays the sequence logo at the bottom of the alignment.
    * Height. Specifies the height of the sequ
353. ill have entries named "Element deleted". An easy way to export an element with all its source elements is to use the Export Dependent Elements function described in section 7.1.3.

The history view can be printed. To do so, click the Print icon. The history can also be exported as a pdf file:

Select the element in the Navigation Area | Export | in "File of type" choose "History PDF" | Save

Chapter 9

Batching and result handling

Contents
9.1 Batch processing
9.1.2 Batch filtering and counting
9.1.3 Setting parameters for batch runs
9.1.4 Running the analysis and organizing the results
9.1.5 Running de novo assembly and read mapping in batch
9.2 How to handle results of analyses

9.1 Batch processing

Most of the analyses in the Toolbox are able to perform the same analysis on several elements in one batch. This means that analyzing large amounts of data is very easily accomplished. As an example, if you use the Find Binding Sites and Create Fragments tool and supply five sequences as shown in figure 9.1, the result table will present an overview of the results for all five sequences.

This is because the input Sequenc
354. imal alignment of the forward and the reverse primer in a primer pair.

- Pair end annealing: the maximum score of consecutive end base pairings found between the ends of the two primers in the primer pair, in units of hydrogen bonds.
- Fragment length: the length (number of nucleotides) of the PCR fragment generated by the primer pair.

16.6 Nested PCR

Nested PCR is a modification of Standard PCR, aimed at reducing product contamination due to the amplification of unintended primer binding sites (mispriming). If the intended fragment cannot be amplified without interference from competing binding sites, the idea is to seek out a larger outer fragment which can be unambiguously amplified and which contains the smaller intended fragment. Having amplified the outer fragment to large numbers, the PCR amplification of the inner fragment can proceed and will yield amplification of this with minimal contamination.

Primer design for nested PCR thus involves designing two primer pairs: one for the outer fragment and one for the inner fragment.

In Nested PCR mode the user must thus define four regions: a Forward primer region (the outer forward primer), a Reverse primer region (the outer reverse primer), a Forward inner primer region, and a Reverse inner primer region. These are defined by making a selection on the sequence and right-clicking the selection. If areas are known where primers must not bind (e.g. repeat rich areas), one or more No primers here
355. Figure 16.22: Right-clicking a fragment allows you to annotate the region on the input sequence or open the fragment as a new sequence.

This will put a PCR fragment annotation on the input sequence covering the region specified in the table. As you can see from figure 16.22, you can also choose to Open Fragment. This will create a new sequence representing the PCR product that would be the result of using these two primers. Note that if you have extensions on the primers, they will be used to construct the new sequence. If you are doing restriction cloning using primers with restriction site extensions, you can use this functionality to retrieve the PCR fragment for use in the cloning editor (see section 18.1).

16.12 Order primers

To facilitate the ordering of primers and probes, CLC DNA Workbench offers an easy way of displaying, and saving, a textual representation of one or more primers:

select primers in Navigation Area | Toolbox in the Menu Bar | Primers and Probes | Order Primers

This opens a dialog where you can choose additional primers. Clicking OK opens a textual representation of the primers (see figure 16.23). The first line states the number of primers being ordered, and after this follow the names and nucleotide sequences of the primers in 5'-3' orientation. From the editor, the primer information can be copied and pasted to web forms or e-mails.
356. Figure 17.23: Selecting the reads to include.

- Paired end status
    - Include paired reads from broken pairs: When a pair is broken (either because only one read in the pair matches, or because the distance or relative orientation is wrong), the reads are placed and colored as single reads, but you can still extract them by checking this box.
    - Include single reads: This will include reads that are marked as single reads (as opposed to paired reads). Note that paired reads that have been broken during assembly are not included in this category. Single reads that come from trimming paired sequence lists are included in this category.
- Match specificity
    - Include specific matches: Reads that only are mapped to one position.
    - Include non-specific matches: Reads that have multiple equally good alignments to the reference. These reads are colored yellow per default.
- Alignment quality
    - Include perfectly aligned reads: Reads where the full read is perfectly aligned to the reference sequence (or consensus sequence for de novo assemblies). Note that at the end of the contig, reads may extend beyond the contig; this is not visible unless you make a selection on the read and observe the posi
357. in this printable version of the user manual. Instead, the tables are included in the Help menu in the Menu Bar (in the appendix).

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The newly created protein is shown, but is not saved automatically.

To save a protein sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog.

14.5.1 Translate part of a nucleotide sequence

If you want to make separate translations of all the coding regions of a nucleotide sequence, you can check the option "Translate CDS and ORF" in the translation dialog (see figure 14.6).

If you want to translate a specific coding region which is annotated on the sequence, use the following procedure:

Open the nucleotide sequence | right-click the ORF or CDS annotation | Translate CDS/ORF | choose a translation table | OK

If the annotation contains information about the translation, this information will be used, and you do not have to specify a translation table.

The CDS and ORF annotations are colored yellow as default.

14.6 Find open reading frames

The CLC DNA Workbench Find Open Reading Frames function can be used to find all open reading frames (ORF) in a sequence, or, by choosing particular start codons to use, it can be used as a rudimentary gene finder. ORFs identified will be shown as annotations on the sequence. You have
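To make the idea of scanning for open reading frames with chosen start codons concrete, here is a minimal sketch. It is not the Workbench's algorithm: it assumes the standard stop codons TAA, TAG and TGA, scans only the three forward-strand frames, and the function name `find_orfs` and its `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def find_orfs(seq, start_codons=("ATG",), min_codons=2):
    """Return sorted (start, end) pairs, 0-based and end-exclusive.

    An ORF here runs from the first start codon seen in a frame to the
    next in-frame stop codon (stop included). Forward strand only.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# ATG at position 2, in-frame TAG at position 8
print(find_orfs("CCATGAAATAGGG"))
```

Passing a different tuple as `start_codons` mirrors the manual's point that the choice of start codons changes which regions are reported.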
358. Figure 18.40: Choosing sequence ATP8a1 mRNA for restriction map analysis.

If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.

Selecting, sorting and filtering enzymes

Clicking Next lets you define which enzymes to use as basis for finding restriction sites on the sequence. At the top, you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 18.5 for more about creating and modifying enzyme lists.

Below there are two panels:

- To the left, you see all the enzymes that are in the list selected above. If you have not chosen to use an existing enzyme list, this panel shows all the enzymes available.
- To the right, there is a list of the enzymes that will be used.

The CLC DNA Workbench comes with a standard set of enzymes based on http://www.rebase.neb.com. You can customize the enzyme database for your installation, see section E

CHAPTER 18. CLONING AND CUTTING

Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button. If you e.g. wish
359. Figure 2.2: The HUMDINUC file is imported and opened.

2.2 Tutorial: View sequence

This brief tutorial will take you through some different ways to display a sequence in the program. The tutorial introduces zooming on a sequence, dragging tabs, and opening selection in new view.

We will be working with the sequence called pcDNA3-atp8a1 located in the "Cloning" folder in the Example data. Double-click the sequence in the Navigation Area to open it. The sequence is displayed with annotations above it. (See figure 2.3.)

Figure 2.3: Sequence pcDNA3-atp8a1 opened in a view.

CHAPTER 2. TUTORIALS

As default, CLC DNA Workbench displays a sequence with annotations (colored arrows on the sequence like the green promoter region annotation in figure 2.3) and zoomed to see the residues.

In this tutorial we want to have an overview of the whole sequence
360. ing restriction sites to use for cloning, and inserting the fragment into the vector.

2.6.1 Locating the data to use

Open the Example data folder in the Navigation Area. Open the Cloning folder, and inside this folder, open the Primer folder.

If you do not have the example data, please go to the Help menu to import it.

The data to use in these folders is shown in figure 2.23.

Figure 2.23: The data to use in this tutorial.

Double-click the ATP8a1 mRNA sequence and zoom to Fit Width, and you will see the yellow annotation which is the coding part of the gene. This is the part that we want to insert into the pcDNA4 TO vector. The primers have already been designed using the primer design tool in CLC DNA Workbench (to learn more about this, please refer to the Primer design tutorial).

2.6.2 Add restriction sites to primers

First, we add restriction sites to the primers. In order to see which restriction enzymes can be used, we create a split view of the vector and the fragment to insert. In this way we can easily make a visual check to find enzymes from the multiple cloning site in the vector that do not cut in the gene of interest. To create the split view:

double-click the pcDN
361. ing the data to use
2.6.2 Add restriction sites to primers
2.6.3 Simulate PCR to create the fragment
2.6.4 Specify restriction sites and perform cloning
2.7 Tutorial: Primer design
2.7.1 Specifying a region for the forward primer
2.7.2 Examining the primer suggestions
2.7.3 Calculating a primer pair
2.8 Tutorial: BLAST search
2.8.1 Performing the BLAST search
2.8.2 Inspecting the results
2.8.3 Using the BLAST table view
2.9 Tutorial: Tips for specialized BLAST searches
2.9.1 Locate a protein sequence on the chromosome
2.9.2 BLAST for primer binding sites
2.9.3 Finding remote protein homologues
2.9.4 Further reading
2.10 Tutorial: Align protein sequences
2.10.1 The alignment dialog
2.11 Tutorial: Create and modify a phylogenetic tree
2.12 Tutorial: Find restriction sites
2.12.1 The Side Panel wa
362. ing the statistically founded maximum likelihood (ML) approach (Felsenstein, 1981). Both approaches generate a phylogenetic tree. The tools are found in:

Toolbox | Alignments and trees

To generate a distance-based phylogenetic tree choose:

Create Tree

and to generate a maximum likelihood based phylogenetic tree choose:

Maximum Likelihood Phylogeny

In both cases the dialog displayed in figure 20.1 will be opened.

CHAPTER 20. PHYLOGENETIC TREES

Figure 20.1: Creating a Tree.

If an alignment was selected before choosing the Toolbox action, this alignment is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove elements from the Navigation Area. Click Next to adjust parameters.

20.1.1 Phylogenetic tree parameters

Distance based methods

Figure 20.2: Adjusting pa
363. Figure 2.8: Saving the settings of the Side Panel.

This will open the dialog shown in figure 2.9.

Figure 2.9: Dialog for saving the settings of the Side Panel.

In this way you can save the current state of the settings in the Side Panel so that you can apply them to alignments later on. If you check Always apply these settings, these settings will be applied every time you open a view of the alignment.

Type "My settings" in the dialog and click Save.

2.3.2 Applying saved settings

When you click the Save/Restore Settings button again and select Apply Saved Settings, you will see "My settings" in the menu together with some pre-defined settings that the CLC DNA Workbench has created for you (see figure 2.10).

Figure 2.10: Menu for applying saved settings.

Whenever you open an alignment, you will be able to apply these settings. Each kind of view has its own list of settings that can be applied.

At the bottom of the list you will see the "CLC Standard Settings", which are the default settings for the view.

2.4 Tutorial: GenBank search and download

The CLC DNA Work
364. ino acids are the basic components of proteins. The amino acid distribution in a protein is simply the percentage of the different amino acids represented in a particular protein of interest. Amino acid composition is generally conserved through family classes in different organisms, which can be useful when studying a particular protein or enzyme across species borders. Another interesting observation is that amino acid composition varies slightly between proteins from different subcellular localizations. This fact has been used in several computational methods for prediction of subcellular localization.

Annotation table

This table provides an overview of all the different annotations associated with the sequence and their incidence.

Dipeptide distribution

This measure is simply a count, or frequency, of all the observed adjacent pairs of amino acids (dipeptides) found in the protein. It is only possible to report neighboring amino acids. Knowledge of dipeptide composition has previously been used for prediction of subcellular localization.

Creative Commons License

All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as
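The two statistics described above, amino acid distribution as percentages and dipeptide distribution as counts of adjacent pairs, can be sketched in a few lines. The function names are hypothetical and the code only illustrates the definitions given in the text; it makes no claim about how the Workbench computes its report.

```python
from collections import Counter


def amino_acid_distribution(protein):
    """Percentage of each amino acid in the protein sequence."""
    protein = protein.upper()
    counts = Counter(protein)
    return {aa: 100.0 * n / len(protein) for aa, n in counts.items()}


def dipeptide_counts(protein):
    """Count every pair of neighboring residues (overlapping windows)."""
    protein = protein.upper()
    return Counter(protein[i:i + 2] for i in range(len(protein) - 1))


print(amino_acid_distribution("AAGG"))
print(dipeptide_counts("MAAM"))
```

Only neighboring residues are paired, which matches the remark that the measure reports adjacent amino acids only.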
365. ion by studying its annotation or by aligning it to the query sequence.

2.8.3 Using the BLAST table view

As an alternative to the graphic BLAST view, you can click the Table View at the bottom. This will display a tabular view of the BLAST hits as shown in figure 2.45.

Figure 2.45: Output of a BLAST search shown in a table.

This view provides more statistics about the hits, and you can use the filter to search for e.g. a specific type of protein etc. If you wish to d
366. ion clone containing all the fragments will be created. You can find an explanation of the multi-site gateway system at http://tools.invitrogen.com/downloads/gateway-multisite-seminar.html

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

The output is a number of expression clones depending on how many entry clones and destination vectors you selected. The attL and attR sites have been used for the recombination, and the expression clone is now equipped with attB sites as shown in figure 18.26.

Figure 18.26: The resulting expression clone opened in a circular view.

You can choose to create a sequence list with the by-products as well.

18.3 Restriction site analysis

There are two ways of finding and showing restriction sites:

- In many cases, the dynamic restriction sites found in the Side Panel of Sequence views will be useful, since it is a quick and easy way of showing restriction sites.
- In the Toolbox you will find the other way of doing restriction site analyses. This way provides more control of the analysis and gives you more output options, e.g. a table of restriction sites, and you can perform the same restriction map analysis on several sequences in one step.

This chapter first describes the dynamic restriction sites, fol
367. ion of DNA or RNA to protein
14.6 Find open reading frames

15 Protein analyses
15.1 Protein charge
15.2 Hydrophobicity
15.3 Reverse translation from protein into DNA

16 Primers
16.1 Primer design - an introduction
16.2 Setting parameters for primers and probes
16.3 Graphical display of primer information
16.4 Output from primer design
16.5 Standard PCR
16.6 Nested PCR
16.8 Sequencing primers
16.9 Alignment-based primer and probe design
16.10 Analyze primer properties
16.11 Find binding sites and create fragments
16.12 Order primers

17 Sequencing data analyses and Assembly
17.1 Importing and viewing trace data
17.3 Trim sequences
17.4 Assemble sequences
17.5 Assemble to reference sequence
17.6 Add sequences to an existing contig
17.7 View and edit contigs
17.8 Reassemble contig
368. ionalities are missing, and you will have to restart the CLC DNA Workbench again (without pressing Shift).

1.5.3 CLC Sequence Viewer vs. Workbenches

The advanced analyses of the commercial workbenches, CLC Protein Workbench, CLC RNA Workbench and CLC DNA Workbench, are not present in CLC Sequence Viewer. Likewise, some advanced analyses are available in CLC DNA Workbench but not in CLC RNA Workbench or CLC Protein Workbench, and vice versa. All types of basic and advanced analyses are available in CLC Main Workbench.

However, the output of the commercial workbenches can be viewed in all other workbenches. This allows you to share the result of your advanced analyses from e.g. CLC Main Workbench with people working with e.g. CLC Sequence Viewer. They will be able to view the results of your analyses, but not redo the analyses.

The CLC Workbenches and the CLC Sequence Viewer are developed for Windows, Mac and Linux platforms. Data can be exported/imported between the different platforms in the same easy way as when exporting/importing between two computers with e.g. Windows.

1.6 When the program is installed: Getting started

CLC DNA Workbench includes an extensive Help function, which can be found in the Help menu of the program's Menu bar. The Help can also be shown by pressing F1. The help topics are sorted in a table of contents and the topics can be searched.

We also recommend our Online presentations where a product specialist from CLC bi
369. ions 5' of the template specific part of the primer.

Forward insets listed in the figure: Shine-Dalgarno, Kozak + ATG, Kozak (upstream of ATG), Start codon, His tag, Peroxisomal targeting sequence, Lumio tag, TEV cleavage site, EK cleavage site.

Figure 18.17: Pressing Shift + F1 shows some of the common additions. This default list can be modified, see section 18.2.1.

You can also manually type a sequence with the keyboard or paste in a sequence from the clipboard by pressing Ctrl + V (⌘ + V on Mac).

Clicking Next allows you to specify the length of the template specific part of the primers as shown in figure 18.19.

The CLC DNA Workbench is not doing any kind of primer design when adding the attB sites. As a user, you simply specify the length of the template specific part of the primer, and together with the attB sites and optional primer additions, this will be the primer. The primer region will be annotated in the resulting attB-flanked sequence, and you can also get a list of primers as you can see when clicking Next (see figure 18.20).

Add attB Sites: 1. Select nucleotide sequences 2. Specify auxiliary insets. Forward insets: AGGAGGT Press
370. is inserted, it will be marked with a selection.

Figure 18.13: One sequence is now inserted into the cloning vector. The sequence inserted is automatically selected.

18.1.4 Insert restriction site

If you make a selection on the sequence and right-click, you find this option for inserting the recognition sequence of a restriction enzyme before or after the region you selected. This will display a dialog as shown in figure 18.14.

At the top, you can select an existing enzyme list or you can use the full list of enzymes (default). Select an enzyme, and you will see its recognition sequence in the text field below the list (AAGCTT). If you wish to insert additional residues such as tags etc., these can be typed into the text fields adjacent to the recognition sequence.

Clicking OK will insert the sequence before or after the selection. If the enzyme selected was not already present in the list in the Side Panel, it will now be added and selected. Furthermore, a restriction site annotation is added.

18.2 Gateway cloning

CLC DNA Workbench offers tools to perform in silico Gateway cloning, including Multi-site Gateway cloning.

The three tools for doing Gateway cloning in the CLC DNA Workbench mimic the procedure followed in the lab:

Gateway is a registered trademark of I
371. is option is useful when comparing sequence reads to a closely related reference sequence, e.g. when sequencing for SNP characterization.
    - Only include part of the reference sequence in the contig: If the aligned sequence reads only cover a small part of the reference sequence, it may not be desirable to include the whole reference sequence in the contig data object. When selected, this option lets you specify how many residues from the reference sequence should be kept on each side of the region spanned by sequencing reads by entering the number in the Extra residues field.
- Do not include reference sequence in contig(s): This will produce a contig data object without the reference sequence. The contig is created in the same way as when you make an ordinary assembly (see section 17.4), but the reference sequence is omitted in the resulting contig. In the assembly process the reference sequence is only used as a scaffold for alignment. This option is useful when performing assembly with a reference sequence that is not closely related to the sequencing reads.
    - Conflicts resolved with: If there is a conflict, i.e. a position where there is disagreement about the residue (A, C, T or G), you can specify how the contig sequence should reflect this conflict.
        * Unknown nucleotide (N): The contig will be assigned an "N" character in all positions with conflicts.
        * Ambiguity nucleotid
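The conflict options above can be pictured with a small consensus sketch over reads that are already aligned and of equal length. The "N" mode follows the description in the text. The "ambiguity" mode is an assumption, since the corresponding option is cut off in this excerpt: it is taken here to mean the standard IUPAC ambiguity code covering the observed bases. The names `resolve_column` and `consensus` are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases covered.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def resolve_column(bases, mode="N"):
    """Consensus symbol for one column of aligned reads."""
    observed = frozenset(b for b in bases.upper() if b in "ACGT")
    if not observed:
        return "N"
    if len(observed) == 1:
        return next(iter(observed))  # no conflict
    # Conflict: either a plain N or (assumed) an IUPAC ambiguity code.
    return "N" if mode == "N" else IUPAC[observed]


def consensus(reads, mode="N"):
    """Column-wise consensus of pre-aligned, equal-length reads."""
    return "".join(resolve_column("".join(col), mode) for col in zip(*reads))


print(consensus(["ACGT", "ACAT"]))
print(consensus(["ACGT", "ACAT"], mode="ambiguity"))
```

The first call marks the conflicting third position with N; the second replaces it with R, the IUPAC code for A or G.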
372. is term limits the query to proteins of human origin.

Dialog settings shown: Limit by entrez query; Choose filter (Low complexity, Human repeats, Mask for lookup, Mask lower case); Expect: 10; Word size: 3; Matrix: BLOSUM62; Gap cost: Existence 11, Extension 1.

Figure 2.43: The BLAST search is limited to homo sapiens [ORGN]. The remaining parameters are left as default.

Choose to Open your results.

Click Finish to accept the parameter settings and begin the BLAST search.

The computer now contacts NCBI and places your query in the BLAST search queue. After a short while the result should be received and opened in a new view.

2.8.2 Inspecting the results

The output is shown in figure 2.44 and consists of a list of potential homologs that are sorted by their BLAST match score and shown in descending order below the query sequence.
373. ision: Abbreviation of GenBank divisions. See section 3.3 in the GenBank release notes for a full list of GenBank divisions.
- Length: The length of the sequence.
- Modification date: Modification date from the database. This means that this date does not reflect your own changes to the sequence. See the history (section 8) for information about the latest changes to the sequence after it was downloaded from the database.
- Organism: Scientific name of the organism (first line) and taxonomic classification levels (second and subsequent lines).

The information available depends on the origin of the sequence. Sequences downloaded from databases like NCBI and UniProt (see section 12) have this information. On the other hand, some sequence formats like FASTA format do not contain this information.

Some of the information can be edited by clicking the blue Edit text. This means that you can add your own information to sequences that do not derive from databases.

Note that for other kinds of data, the Element info will only have Name and Description.

10.5 View as text

A sequence can be viewed as text without any layout and text formatting. This displays all the information about the sequence in the GenBank file format. To view a sequence as text:

CHAPTER 10. VIEWING AND EDITING SEQUENCES

select a sequence in the Navigation Area | Show in the Toolbar | As text

This way it is possible to see background information about e.g. the authors an
374. ist, alignment or contig), you have two additional options:

right-click an annotation | Delete | Delete All Annotations from All Sequences

right-click an annotation | Delete | Delete Annotations of Type "type" from All Sequences

10.4 Element information

The normal view of a sequence (by double-clicking) shows the annotations as boxes along the sequence, but often there is more information available about sequences. This information is available through the Element info view.

To view the sequence information:

select a sequence in the Navigation Area | Show in the Toolbar | Element info

This will display a view similar to fig. 10.13.

All the lines in the view are headings, and the corresponding text can be shown by clicking the text.

Figure 10.13: The initial display of sequence info for the HUMHBB DNA sequence from the Example data.

- Name: The name of the sequence, which is also shown in sequence views and in the Navigation Area.
- Description: A description of the sequence.
- Comments: The author's comments about the sequence.
- Keywords: Keywords describing the sequence.
- Db source: Accession numbers in other databases concerning the same sequence.
- Gb Div
375. ites | Restriction Site Analysis

Click Next to set parameters for the restriction map analysis.

In this step, first select Use existing enzyme list and click the Browse for enzyme list button. Select the "Popular enzymes" in the Cloning folder under Enzyme lists.

Then write 3' into the filter below to the left. Select all the enzymes and click the Add button.

The result should be like in figure 2.57.

Figure 2.57: Selecting enzymes.

Click Next. In this step you specify that you want to show enzymes that cut the sequence only once. This means that you should de-select the Two restriction sites checkbox.

Click Next and select that you want to Add restriction sites as annotations on sequence and Create restriction map. (See figure 2.58).

EB Restri
376. ity scale which shares many features with the other hydrophobicity scales [Eisenberg et al., 1984].

• Rose: The hydrophobicity scale by Rose et al. is correlated to the average area of buried amino acids in globular proteins [Rose et al., 1985]. The resulting scale does not show the helices of a protein, but rather the surface accessibility.

• Janin: This scale also provides information about the accessible and buried amino acid residues of globular proteins [Janin, 1979].

• Hopp-Woods: Hopp and Woods developed their hydrophobicity scale for identification of potentially antigenic sites in proteins. This scale is basically a hydrophilic index where apolar residues have been assigned negative values. Antigenic sites are likely to be predicted when using a window size of 7 [Hopp and Woods, 1983].

• Welling: [Welling et al., 1985] Welling et al. used information on the relative occurrence of amino acids in antigenic regions to make a scale which is useful for prediction of antigenic regions. This method is better than the Hopp-Woods scale of hydrophobicity, which is also used to identify antigenic regions.

• Kolaskar-Tongaonkar: A semi-empirical method for prediction of antigenic regions has been developed [Kolaskar and Tongaonkar, 1990]. This method also includes information on surface accessibility and flexibility, and at the time of publication the method was able to predict antigenic deter
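The scales listed in this snippet are typically applied as a sliding-window average along the protein sequence. The following is a minimal Python sketch of that idea, not the Workbench's own implementation. It assumes the published Kyte-Doolittle values (Kyte and Doolittle, 1982); the function name `hydropathy_profile` and its parameters are hypothetical and chosen only for illustration.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7, scale=KYTE_DOOLITTLE):
    """Return the mean scale value of every full window along the protein.

    Hypothetical helper: one value per window start position; an empty
    list is returned when the window does not fit into the sequence.
    """
    values = [scale[residue] for residue in protein.upper()]
    if window < 1 or window > len(values):
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Example with a short, made-up peptide and a window of 3 residues.
profile = hydropathy_profile("AILKD", window=3)
```

Any other scale can be used by passing a different dictionary as `scale`; the window size of 7 mentioned for Hopp-Woods is the default here only as an assumption.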
377. j

- Primer designer (both for single sequences and alignments)
- Contig mapping view

• In the table of annotations

• In the text view of sequences

In the following sections, these view options will be described in more detail.

In all the views except the text view, annotations can be added, modified and deleted. This is described in the following sections.

View Annotations in sequence views

Figure 10.6 shows an annotation displayed on a sequence.

Figure 10.6: An annotation showing a coding region on a genomic DNA sequence.

The various sequence views listed in section 10.3.1 have different default settings for showing annotations. However, they all have two groups in the Side Panel in common:

• Annotation Layout

• Annotation Types

The two groups are shown in figure 10.7.

In the Annotation layout group, you can specify how the annotations should be displayed (notice that there are some minor differences between the different sequence views):

• Show annotations: Determines whether the annotations are shown.

• Position:

- On sequence: The annotations are placed on the sequence. The residues are visible through the annotations (if you have zoomed in to 100%).

- Next to 
378. kbench

Figure 1.14: An old license is detected.

When you click Next, the Workbench checks on CLC bio's web server to see if you are entitled to upgrade your license.

Note: If you should be entitled to get an upgrade, and you do not get one automatically in this process, please contact support@clcbio.com.

In this dialog, there are two options:

• Direct download: The workbench will attempt to contact the online CLC Licenses Service and download the license directly. This method requires internet access from the workbench.

• Go to license download web page: The workbench will open a Web Browser with the License Download web page when you click Next. From there you will be able to download your license as a file and import it. This option allows you to get a license even though the Workbench does not have direct access to the CLC Licenses Service.

CHAPTER 1. INTRODUCTION TO CLC DNA WORKBENCH 23

If you select the first option, and it turns out that you do not have internet access from the Workbench (because of a firewall, proxy server etc.), you will be able to click Previous and use the other option instead.

Direct download

Selecting the first option takes you to the dialog shown in figure 1.15.

License Wizard: CLC DNA Workbench. Requesting a license with id CLC LICENSE SRENMNSTED 0D43CA9. Requesting and downloading a license by establishing a direct connection to the CLC bio License Web Service. Your 
379. king Next will show the dialog in figure 18.38.

Figure 18.38: Deciding number of cut sites inside and outside the selection. (The dialog shows the selected region, an Inside selection panel and an Outside selection panel with checkboxes for no, one or two cut sites, and a preview of the enzymes that will be added to the Side Panel.)

CHAPTER 18. CLONING AND CUTTING 334

At the top of the dialog, you see the selected region, and below are two panels:

• Inside selection: Specify how many times you wish the enzyme to cut inside the selection. In the example described above, "One cut site (1)" should be selected to only show enzymes cutting once in the selection.

• Outside selection: Specify how many times you wish the enzyme to cut outside the selection (i.e. the rest of the sequence). In the example above, "No cut sites (0)" should be selected.

These panels offer a lot of flexibility for combining number of cut sites inside and outside the selection, respectively. To give a hint of how many enzymes will be added based on the combination of cut sites, the preview panel at the bottom lists the enzymes which will be ad
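The filtering described in this snippet (count the cut sites of each enzyme inside and outside a selection, then keep the enzymes whose counts match the chosen checkboxes) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the Workbench's algorithm: it counts occurrences of the recognition sequence on the given strand only, treats a site as "inside" when the whole recognition sequence lies within the selection, and uses hypothetical function names and a made-up test sequence.

```python
def site_positions(sequence, recognition_site):
    """Return the 0-based start of every occurrence of the site (overlaps included)."""
    sequence, site = sequence.upper(), recognition_site.upper()
    positions, start = [], sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def count_inside_outside(sequence, recognition_site, sel_start, sel_end):
    """Count sites fully inside the half-open selection [sel_start, sel_end) and elsewhere."""
    inside = outside = 0
    for pos in site_positions(sequence, recognition_site):
        if sel_start <= pos and pos + len(recognition_site) <= sel_end:
            inside += 1
        else:
            outside += 1
    return inside, outside


def enzymes_matching(sequence, enzymes, sel_start, sel_end,
                     allowed_inside, allowed_outside):
    """Keep enzymes whose inside AND outside counts are both among the allowed values."""
    selected = []
    for name, site in enzymes.items():
        inside, outside = count_inside_outside(sequence, site, sel_start, sel_end)
        if inside in allowed_inside and outside in allowed_outside:
            selected.append(name)
    return selected


# Made-up sequence: one NotI site in the selection, two EcoRI sites outside it.
sequence = "AAAGCGGCCGCTTTGAATTCAAAGAATTC"
enzymes = {"NotI": "GCGGCCGC", "EcoRI": "GAATTC"}
# "One cut site (1)" inside AND "No cut sites (0)" outside:
result = enzymes_matching(sequence, enzymes, 0, 12, {1}, {0})
```

Passing sets for `allowed_inside` and `allowed_outside` mirrors the dialog, where several checkboxes can be ticked in each panel.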
380. l. This feature is useful when e.g. designing an experiment which will allow the differentiation of a successful and an unsuccessful cloning experiment on the basis of a restriction map.

There are two main ways to simulate gel separation of nucleotide sequences:

• One or more sequences can be digested with restriction enzymes and the resulting fragments can be separated on a gel.

• A number of existing sequences can be separated on a gel.

There are several ways to apply these functionalities as described below.

18.4.1 Separate fragments of sequences on gel

This section explains how to simulate a gel electrophoresis of one or more sequences which are digested with restriction enzymes. There are two ways to do this:

• When performing the Restriction Site Analysis from the Toolbox, you can choose to create a restriction map which can be shown as a gel. This is explained in section 18.3.2.

• From all the graphical views of sequences, you can right-click the name of the sequence and choose: Digest Sequence with Selected Enzymes and Run on Gel. The views where this option is available are listed below:

- Circular view (see section 10.2)
- Ordinary sequence view (see section 10.1)
- Graphical view of sequence lists (see section 10.7)
- Cloning editor (see section 18.1)
- Primer designer (see section 16.3)

Furthermore, you can also right-click an empty part of the view of the graphical view of
381. l alignment is taken into account and not the full length query sequence.

• Identity: Shows the number of identical residues in the query and hit sequence.

• %Identity: Shows the percentage of identical residues in the query and hit sequence.

CHAPTER 12. BLAST SEARCH 185

• Positive: Shows the number of similar but not necessarily identical residues in the query and hit sequence.

• %Positive: Shows the percentage of similar but not necessarily identical residues in the query and hit sequence.

• Gaps: Shows the number of gaps in the query and hit sequence.

• %Gaps: Shows the percentage of gaps in the query and hit sequence.

• Query Frame/Strand: Shows the frame or strand of the query sequence.

• Hit Frame/Strand: Shows the frame or strand of the hit sequence.

In the BLAST table view you can handle the hit sequences. Select one or more sequences from the table, and apply one of the following functions:

• Download and Open: Downloads the full sequence from NCBI and opens it. If multiple sequences are selected, they will all open (if the same sequence is listed several times, only one copy of the sequence is downloaded and opened).

• Download and Save: Downloads the full sequence from NCBI and saves it. When you click the button, there will be a save dialog letting you specify a folder to save the sequences. If multiple sequences are selected, they will all open (if the same sequence is listed several times, only one copy of the sequence is downlo
382. l will not be saved.

CHAPTER 2. TUTORIALS 42

Figure 2.6: The Annotation Layout and the Annotation Types in the Side Panel.

Figure 2.7: The alignment when all the above settings have been changed.

This means that you would have to perform the changes again next time you open the alignment. To save the changes to the Side Panel, click the Save/Restore Settings button at the top of the Side Panel and click Save Settings (see figure 2.8).

ave Settings         Delete Sett
383. layer at a time: the content of subfolders is not visible in this view. Also note that only sequences have the full span of information like organism etc.

Batch edit folder elements

You can select a number of elements in the table, right-click and choose Edit to batch edit the elements. In this way, you can change e.g. the description or common name of several elements in one go.

In figure 3.7 you can see an example where the common name of five sequences is changed in one go. In this example, a dialog with a text field will be shown, letting you enter a new common name for these five sequences. Note! This information is saved directly and you cannot undo.

3.2 View Area

The View Area is the right-hand part of the screen, displaying your current work. The View Area may consist of one or more Views, represented by tabs at the top of the View Area.

This is illustrated in figure 3.8.

The tab concept is central to working with CLC DNA Workbench, because several operations can be performed by dragging the tab of a view, and extended right-click menus can be activated from the tabs.

CHAPTER 3. USER INTERFACE

[Figure residue: a folder table with columns including Type, Name, Modified and Description, and a right-click menu with Delete and Edit entries.]
384. lected.

12.4.1 Migrating from a previous version of the Workbench

In versions released before 2011, the BLAST database management was very different from this. In order to migrate from the older versions, please add the folders of the old BLAST databases as locations in the BLAST database manager (see section 12.4). The old representations of the BLAST databases in the Navigation Area can be deleted.

If you have saved the BLAST databases in the default folder, they will automatically appear because the default database location used in CLC DNA Workbench 6.6 is the same as the default folder specified for saving BLAST databases in the old version.

12.5 Bioinformatics explained: BLAST

BLAST (Basic Local Alignment Search Tool) has become the de facto standard in search and alignment tools [Altschul et al., 1990]. The BLAST algorithm is still actively being developed, and the paper describing it is one of the most cited ever written in this field of biology. Many researchers use BLAST as an initial screening of their sequence data from the laboratory and to get an idea of what they are working on. BLAST is far from being basic as the name indicates; it is a highly advanced algorithm which has become very popular due to availability, speed, and accuracy. In short, a BLAST search identifies homologous sequences by searching one or more databases, usually hosted by NCBI (http://www.ncbi.nlm.nih.gov/), on the query sequence of interest [McGinnis and Madden, 2004].

BLAST is
385. lection to All Reads.

The opposite is also possible: make a selection on one of the reads, right-click, and Transfer Selection to Contig Sequence.

17.7.5 Output from the contig

Due to the integrated nature of CLC DNA Workbench it is easy to use the consensus sequences as input for additional analyses. There are three options when you are viewing a mapping:

right-click the name of the consensus sequence (to the left) | Open Copy of Sequence | Save the new sequence

right-click the name of the consensus sequence (to the left) | Open Copy of Sequence Including Gaps | Save the new sequence

right-click the name of the consensus sequence (to the left) | Open This Sequence

Open Copy of Sequence creates a copy of the sequence, omitting all gap regions, which can be saved and used independently.

Open Copy of Sequence Including Gaps replaces all gaps with Ns. Any regions that appear to be deletions will be removed if this option is chosen. For example:

reference  CCCGGAAAGGTTT
consensus  CCC--AAA--TTT
match1     CCC--AAA
match2               TTT

Here, if you chose to open a copy of the consensus with gaps, you would get this output:

CCCAAANNTTT

Open This Sequence will not create a new sequence but simply let you see the sequence in a sequence view. This means that the sequence still "belongs" to the contig and will be saved together with the contig. It also means that if you add annotations to the
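The difference between the two copy options can be made concrete with a small Python sketch. It is an interpretation of the example in this snippet rather than the Workbench's code, and it assumes that gap columns are written as "-" and that each read is given with its start column in the contig. A consensus gap that is spanned by at least one read is treated as a deletion and dropped, while a gap with no read coverage becomes N. The function names are hypothetical.

```python
def copy_without_gaps(consensus):
    """'Open Copy of Sequence': omit every gap column."""
    return consensus.replace("-", "")


def copy_including_gaps(consensus, reads):
    """'Open Copy of Sequence Including Gaps' (assumed behavior).

    reads is a list of (start_column, aligned_read) pairs, where the
    aligned read uses '-' for deleted positions.
    """
    output = []
    for column, base in enumerate(consensus):
        if base != "-":
            output.append(base)
            continue
        # A gap column spanned by a read is a deletion: drop it.
        spanned = any(
            start <= column < start + len(read) for start, read in reads
        )
        if not spanned:
            # No read covers this column: unknown sequence, write N.
            output.append("N")
    return "".join(output)


# The example from the text: match1 spans the first gap, nothing spans the second.
consensus = "CCC--AAA--TTT"
reads = [(0, "CCC--AAA"), (10, "TTT")]
with_gaps = copy_including_gaps(consensus, reads)   # "CCCAAANNTTT"
without_gaps = copy_without_gaps(consensus)         # "CCCAAATTT"
```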
386. ligonucleotide melting temperatures under PCR conditions: nearest-neighbor corrections for Mg(2+), deoxynucleotide triphosphate, and dimethyl sulfoxide concentrations with comparison to alternative empirical formulas. Clin Chem, 47(11):1956-1961.

[Welling et al., 1985] Welling, G. W., Weijer, W. J., van der Zee, R., and Welling-Wester, S. (1985). Prediction of sequential antigenic regions in proteins. FEBS Lett, 188(2):215-218.

[Wootton and Federhen, 1993] Wootton, J. C. and Federhen, S. (1993). Statistics of local complexity in amino acid sequences and sequence databases. Computers in Chemistry, 17:149-163.

[Yang, 1994a] Yang, Z. (1994a). Estimating the pattern of nucleotide substitution. Journal of Molecular Evolution, 39(1):105-111.

[Yang, 1994b] Yang, Z. (1994b). Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods. Journal of Molecular Evolution, 39(3):306-314.

[Yang and Rannala, 1997] Yang, Z. and Rannala, B. (1997). Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method. Mol Biol Evol, 14(7):717-24.

Part V

Index

contig, extract from selection, 301
454 sequencing data, 3 6
AB1, file format, 393
Abbreviations
  amino acids, 396
ABI, file format, 393
About CLC Workbenches, 27
Accession number, display, 82
.ace, file format, 395
ACE, file format, 394
Add
  annotations, 156, 3
  sequences to alignment, 359
  sequences t
387. lity and at the time of publication the method was able to predict antigenic determinants with an accuracy of 75%.

Surface Probability: Display of surface probability based on the algorithm by [Emini et al., 1985]. This algorithm has been used to identify antigenic determinants on the surface of proteins.

Chain Flexibility: Display of backbone chain flexibility based on the algorithm by [Karplus and Schulz, 1985]. It is known that chain flexibility is an indication of a putative antigenic determinant.

Many more scales have been published throughout the last three decades. Even though more advanced methods have been developed for prediction of membrane spanning regions, the simple and very fast calculations are still highly used.

Other useful resources

AAindex: Amino acid index database
http://www.genome.ad.jp/dbget/aaindex.html

Creative Commons License

All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this

CHAPTER 15. PROTEIN ANALYSES 243

aa  aa        Kyte-Doolittle  Hopp-Woods  Cornette  Eisenberg  Rose  Janin  Engelman (GES)
A   Alanine   1.80            -0.50       0.20      0.62       0.74  0.30   1.60
C   Cysteine  2.50            -1.00       4.10      0.29       0.91  0.90   2.00
D 
388. lowed by "the toolbox way". This section also includes an explanation of how to simulate a gel with the selected enzymes. The final section in this chapter focuses on enzyme lists, which represent an easy way of managing restriction enzymes.

18.3.1 Dynamic restriction sites

If you open a sequence, a sequence list etc., you will find the Restriction Sites group in the Side Panel.

As shown in figure 18.27 you can display restriction sites as colored triangles and lines on the sequence. The Restriction sites group in the side panel shows a list of enzymes, represented by different colors corresponding to the colors of the triangles on the sequence. By selecting or deselecting the enzymes in the list, you can specify which enzymes' restriction sites should be displayed.

Figure 18.27: Showing restriction sites of ten restriction enzymes. (The Restriction sites group in the Side Panel groups the enzymes into Non-cutters, Single cutters, Double cutters and Multiple cutters.)

The color of the restriction enzyme can be changed by clicking the colored box next to the enzyme's name. The name of the enzyme can also be shown next to the restriction site by selecting Show name flags above the list of restriction enzymes.

There is also an option 
389. ls, Molecular Biology Resources, Promega Corporation, EURx Ltd

Figure 18.52: Showing additional information about an enzyme like recognition sequence or a list of commercial vendors.

18.5.2 View and modify enzyme list

An enzyme list is shown in figure 18.53.

The list can be sorted by clicking the columns

[Figure residue: a table of restriction enzymes with the columns Name, Recognition sequence, Overhang, Suppliers, Methylation sensitivity and Star activity.]
390. ls not running in batch mode (see above). All the analyses in the Toolbox are performed in a step-by-step procedure. First, you select elements for analyses, and then there are a number of steps where you can specify parameters (some of the analyses have no parameters, e.g. when translating DNA to RNA). The final step concerns the handling of the results of the analysis, and it is almost identical for all the analyses so we explain it in this section in general.

CHAPTER 9. BATCHING AND RESULT HANDLING 137

Figure 9.4: Read mapping parameters in batch.

Figure 9.5: The last step of the analyses exemplified by Translate DNA to RNA. (The Result handling step of the Convert DNA to RNA dialog offers the options Open and Save.)

In this step, shown in figure 9.5, you have
391. lysis shown as annotations.

• Overhangs: If there is an overhang, this is displayed with an abbreviated version of the fragment and its overhangs. The two rows of dots represent the two strands of the fragment, and the overhang is visualized on each side of the dots with the residue(s) that make up the overhang. If there are only the two rows of dots, it means that there is no overhang.

• Left end: The enzyme that cuts the fragment to the left (5' end).

• Right end: The enzyme that cuts the fragment to the right (3' end).

• Conflicting enzymes: If more than one enzyme cuts at the same position, or if an enzyme's recognition site is cut by another enzyme, a fragment is displayed for each possible combination of cuts. At the same time, this column will display the enzymes that are in conflict. If there are conflicting enzymes, they will be colored red to alert the user. If the same experiment were performed in the lab, conflicting enzymes could lead to wrong results. For this reason, this functionality is useful to simulate digestions with complex combinations of restriction enzymes.

If views of both the fragment table and the sequence are open, clicking in the fragment table will select the corresponding region on the sequence.

Gel

The restriction map can also be shown as a gel. This is described in section 18.4.1.

18.4 Gel electrophoresis

CLC DNA Workbench enables the user to simulate the separation of nucleotide sequences on a ge
392. m self annealing, Maximum self end annealing, Maximum secondary structure, 3' end must meet G/C requirements, 5' end must meet G/C requirements. Primer combination parameters: Max percentage point difference in G/C content, Max difference in melting temperatures within a primer pair, Max hydrogen bonds between pairs, Max hydrogen bonds between pair ends, Minimum difference in melting temperature Inner/Outer, Fast, Accurate. Mispriming parameters: Use mispriming as exclusion criteria, Exact match, Minimum number of base pairs required for a match, Number of consecutive base pairs required in 3' end.

Figure 16.9: Calculation dialog.

• Maximal difference in melting temperature of primers in a pair - the number of degrees Celsius that primers in a pair are allowed to differ. This criterion is applied to both primer pairs independently.

• Maximum pair annealing score - the maximum number of hydrogen bonds allowed between the forward and the reverse primer in a primer pair. This criterion is applied to all possible combinations of primers.

• Minimum difference in the melting temperature of primers in the inner and outer primer pair - all comparisons between the melting temperature of primers from the two pairs must be at least this different, otherwise the primer set is excluded. This option is applied to ensure that the inner and outer PCR reactions can be initiated at different annealing temperatures. Please note that to ensure flexibility t
393. mat  395  Circular view of sequence  150  377   clc  file format  123  395  CLC Standard Settings  111  CLC Workbenches  2   CLC  file format  393 395  associating with CLC DNA Workbench  13  Clone Manager  file format  393  Cloning  307  377  380  insert fragment  317  Close view  86  Clustal  file format  394  Coding sequence  translate to protein  149  Codon  frequency tables  reverse translation  245  usage  246    col  file format  395  Color residues  354  Comments  160  Common name  batch edit  84  Compare workbenches  3 6  Compatible ends  334  Complexity plot  211  Configure network  33  Conflicting enzymes  339  Conflicts  overview in assembly  303  Consensus sequence  353  3 8    INDEX    open  353  Consensus sequence  extract  300  Conservation  353  graphs  3 8  Contact information  12  Contig  3 76  ambiguities  303  BLAST  301  create  291  reverse complement  297  view and edit  296  Copy  130  annotations in alignments  358  elements in Navigation Area  80  into sequence  149  search results  GenBank  1 0  sequence  162  163  sequence selection  231  text selection  162   cpf  file format  109  chp  file format  395  Create  alignment  348  dot plots  201  enzyme list  343  local BLAST database  18   new folder  80  workspace  94  Create index file  BLAST database  187  CSV  export graph data points  128  formatting of decimal numbers  122   csv  file format  395  CSV  file format  393  395   ct  file format  395  Custom annotation types  157    Dark  color o
394. mbly process. See section 17.7 on how to use the resulting contigs.

17.6 Add sequences to an existing contig

This section describes how to assemble sequences to an existing contig. This feature can be used for example to provide a steady work flow when a number of exons from the same gene are sequenced one at a time and assembled to a reference sequence.

Note that the new sequences will be added to the existing contig, which will not be extended. If the new sequences extend beyond the existing contig, they will be cut off.

To start the assembly:

select one contig and a number of sequences | Toolbox in the Menu Bar | Sequencing Data Analyses | Add Sequences to Contig

or right-click in the empty white area of the contig | Add Sequences to Contig

This opens a dialog where you can alter your choice of sequences which you want to assemble. You can also add sequence lists.

When the elements are selected, click Next, and you will see the dialog shown in figure 17.19.

The options in this dialog are similar to the options that are available when assembling to a reference sequence (see section 17.5).

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. This will start the assembly process. See section 17.7 on how to use the resulting contig.

Note that the new sequences will be added to the existing contig, which will not be extended. If the new sequences extend beyond the existing contig, t
395. me.

The following parameters can be added to the search:

• All fields: Text, searches in all parameters in the NCBI database at the same time.

• Organism: Text.

• Description: Text.

• Modified Since: Between 30 days and 10 years.

• Gene Location: Genomic DNA/RNA, Mitochondrion, or Chloroplast.

• Molecule: Genomic DNA/RNA, mRNA or rRNA.

• Sequence Length: Number for maximum or minimum length of the sequence.

• Gene Name: Text.

The search parameters are the most recently used. The All fields option allows searches in all parameters in the NCBI database at the same time. All fields also provides an opportunity to restrict a search to parameters which are not listed in the dialog. E.g. writing gene[Feature key] AND mouse in All fields generates hits in the GenBank database which contain one or more genes and where "mouse" appears somewhere in the GenBank file. You can also write e.g. CD9 NOT homo sapiens in All fields.

CHAPTER 11. ONLINE DATABASE SEARCH 169

Note: The "Feature Key" option is only available in GenBank when searching for nucleotide sequences. For more information about how to use this syntax, see http://www.ncbi.nlm.nih.gov/books/NBK3837/

When you are satisfied with the parameters you have entered, click Start search.

Note: When conducting a search, no files are downloaded. Instead, the program produces a list of links to the files in the NCBI database. This ensures a much faster search.

11.1.2 Handling of GenBank 
396. me [Dialog residue: Specify settings; 2. Set algorithm parameters; the options Simple, Positions and Java regular expression (Press Shift + F1 for options); and a Preview table showing Sequence name, Resulting group, Number of sequences, Number of groups and Use for grouping.]

Figure 17.5: Dividing the sequence into three groups based on the number in the middle of the name.

bottom of the dialog. In this example we actually did not need the first and last set of brackets, so the expression could also have been  x    x  , in which case only one group would be listed in the table at the bottom of the dialog.

17.2.2 Process tagged sequences

Multiplexing as described in section 17.2.1 is of course only possible if proper sequence names could be assigned from the sequencing process. With many of the new high-throughput technologies, this is not possible.

However, there is a need for being able to input several different samples to the same sequencing run, so multiplexing is still relevant; it just has to be based on another way of identifying the sequences. A method has been proposed to tag the sequences with a unique identifier during the preparation of the sample for sequencing [Meyer et al., 2007].

With this technique, each sequence will have a sample-specific tag: a special sequence of nucleotides before and after the seque
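Grouping sequences by a number in the middle of their names, as in the name-based multiplexing dialog, amounts to capturing part of each name with a regular expression and using the captured text as the group key. The Python sketch below illustrates the principle only. The Workbench uses Java regular expressions, the expression from the dialog is not legible in this snippet, and both the sequence names and the pattern used here are made up for illustration.

```python
import re
from collections import defaultdict


def group_by_name(names, pattern):
    """Group names by the first capture group of the pattern.

    Names that do not match the pattern are collected under the key None.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in names:
        match = regex.search(name)
        key = match.group(1) if match else None
        groups[key].append(name)
    return dict(groups)


# Hypothetical read names with a sample number between two underscores.
names = ["geneA_1_F", "geneA_1_R", "geneA_2_F", "geneA_3_F", "control"]
groups = group_by_name(names, r"_(\d+)_")
```

Only the bracketed part of the expression decides the group, which is why a single set of brackets is enough when one group key is wanted.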
397. mer  e Self annealing   the maximum self annealing score of the primer in units of hydrogen bonds    e Self annealing alignment   a visualization of the highest maximum scoring self annealing  alignment    e Self end annealing   the maximum score of consecutive end base pairings allowed between  the ends of two copies of the same molecule in units of hydrogen bonds    e GC content   the fraction of G and C nucleotides in the primer  e Melting temperature of the primer template complex    e Secondary structure score   the score of the optimal secondary DNA structure found for  the primer  Secondary structures are scored by adding the number of hydrogen bonds in  the structure  and 2 extra hydrogen bonds are added for each stacking base pair in the  structure    e Secondary structure   a visualization of the optimal DNA structure found for the primer    If both a forward and a reverse region are selected a table of primer pairs is shown  where  the above columns  excluding the score  are represented twice  once for the forward primer   designated by the letter F  and once for the reverse primer  designated by the letter R      Before these  and following the score of the primer pair  are the following columns pertaining to  primer pair information available     CHAPTER 16  PRIMERS 260    e Pair annealing   the number of hydrogen bonds found in the optimal alignment of the forward  and the reverse primer in a primer pair    e Pair annealing alignment   a visualization of the opt
398. mes

[Figure 14.8: Create Reading Frame dialog.]

  - AUG: Most commonly used start codon.
  - Any: Find all open reading frames.
  - All start codons in genetic code.
  - Other: Here you can specify a number of start codons separated by commas.

- Both strands: Finds reading frames on both strands.
- Open-ended sequence: Allows the ORF to start or end outside the sequence. If the sequence studied is a part of a larger sequence, it may be advantageous to allow the ORF to start or end outside the sequence.
- Genetic code translation table.
- Include stop codon in result: The ORFs will be shown as annotations which can include the stop codon if this option is checked. The translation tables are occasionally updated from NCBI. The tables are not available in this printable version of the user manual. Instead, the tables are included in the Help menu in the Menu Bar (in the appendix).
- Minimum length: Specifies the minimum length for the ORFs to be found. The length is specified as number of codons.

CHAPTER 14. NUCLEOTIDE ANALYSES

Using open reading frames for gene finding is a fairly simple approach which is lik
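The options listed for the Create Reading Frame dialog (start codons, both strands, minimum length in codons, include stop codon) can be illustrated with a small sketch. This is not the Workbench's implementation: the function names `find_orfs` and `reverse_complement` are made up for illustration, the sequence is treated as DNA (so AUG is written ATG), only the standard genetic code's stop codons are used, open-ended ORFs are not handled, and the minimum length is assumed to exclude the stop codon.

```python
# Illustrative ORF finder; assumptions are noted in the comments.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only (assumption)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, start_codons=("ATG",), min_codons=100, include_stop=True):
    """Yield (strand, start, end, orf) for each ORF found.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned (the minus strand is the reverse complement).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in start_codons:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3 if include_stop else i
                    # Length in codons, not counting the stop codon (assumption)
                    if (i - start) // 3 >= min_codons:
                        yield strand, start, end, s[start:end]
                    start = None


orfs = list(find_orfs("CCATGAAATTTTAGCC", min_codons=3))
```

Raising `min_codons` is the simplest way to suppress the many short, spurious ORFs that any sequence contains by chance.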
399. minants with an accuracy of 5

- Surface Probability: Display of surface probability based on the algorithm by [Emini et al., 1985]. This algorithm has been used to identify antigenic determinants on the surface of proteins.
- Chain Flexibility: Display of backbone chain flexibility based on the algorithm by [Karplus and Schulz, 1985]. It is known that chain flexibility is an indication of a putative antigenic determinant.

Find

The Find function can also be invoked by pressing Ctrl + Shift + F (⌘ + Shift + F on Mac).

The Find function can be used for searching the sequence. Clicking the find button will search for the first occurrence of the search term. Clicking the find button again will find the next occurrence, and so on. If the search string is found, the corresponding part of the sequence will be selected.

- Search term: Enter the text to search for. The search function does not discriminate between lower and upper case characters.
- Sequence search: Search the nucleotides or amino acids. For amino acids, the single-letter abbreviations should be used for searching. The sequence search also has a set of advanced search parameters:
  - Include negative strand: This will search on the negative strand as well.
  - Treat ambiguous characters as wildcards in search term: If you search for e.g. ATN, you will find both ATG and ATC. If you wish to find literally exact matches for ATN (i.e. only find ATN, not ATG), this option should not be
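The wildcard option described for the sequence search can be sketched as a translation of the search term into a regular expression. This is only an illustration under stated assumptions: the function name `search_positions` and the `IUPAC_DNA` table are hypothetical, only DNA codes are covered, and ambiguity codes are expanded in the search term only, not in the sequence being searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet only)
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def search_positions(sequence, term, wildcards=True):
    """Return 0-based start positions of every (overlapping) match."""
    # The search is case-insensitive, as in the Find function
    sequence, term = sequence.upper(), term.upper()
    if wildcards:
        # Each ambiguity code becomes a character class, e.g. N -> [ACGT]
        pattern = "".join(
            "[" + IUPAC_DNA.get(ch, re.escape(ch)) + "]" for ch in term
        )
    else:
        pattern = re.escape(term)  # literal match only
    # A lookahead lets overlapping matches be reported
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence)]


hits_wild = search_positions("ccATGttATCaaATN", "ATN")
hits_exact = search_positions("ccATGttATCaaATN", "ATN", wildcards=False)
```

With wildcards enabled the term ATN matches ATG and ATC; with wildcards disabled only the literal ATN is found.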
400. mines how conserved the sequences must be in order to agree on a consensus. Here you can also choose IUPAC, which will display the ambiguity code when there are differences between the sequences. E.g. an alignment with an A and a G at the same position will display an R in the consensus line if the IUPAC option is selected. (The IUPAC codes can be found in sections I and H.)

  - No gaps: Checking this option will not show gaps in the consensus.
  - Ambiguous symbol: Select how ambiguities should be displayed in the consensus line (as N or another symbol). This option has no effect if IUPAC is selected in the Limit list above.

  The Consensus Sequence can be opened in a new view, simply by right-clicking the Consensus Sequence and clicking Open Consensus in New View.

- Conservation: Displays the level of conservation at each position in the alignment. The height of the bar, or the gradient of the color, reflects how conserved that particular position is in the alignment. If one position is 100% conserved, the bar will be shown in full height, and it is colored in the color specified at the right side of the gradient slider.

CHAPTER 19. SEQUENCE ALIGNMENT

  - Foreground color: Colors the letters using a gradient, where the right side color is used for highly conserved positions and the left side color is used for positions that are less conserved.
  - Background color: Sets a background color
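The IUPAC consensus option mentioned above (an A and a G in the same column give an R) can be sketched as follows. The function name `iupac_consensus` is hypothetical, only DNA codes are handled, and gap handling is an assumption: gaps are ignored unless the whole column consists of gaps. The Workbench's own rules may differ.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for
_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of observed bases -> single ambiguity code
BASES_TO_CODE = {frozenset(bases): code for code, bases in _CODES.items()}


def iupac_consensus(aligned_rows):
    """Return the IUPAC consensus of equally long, aligned DNA sequences."""
    consensus = []
    for column in zip(*aligned_rows):
        bases = frozenset(ch.upper() for ch in column if ch != "-")
        if not bases:
            consensus.append("-")  # column contains only gaps
        else:
            # Unknown symbols fall back to N (assumption)
            consensus.append(BASES_TO_CODE.get(bases, "N"))
    return "".join(consensus)


consensus = iupac_consensus(["ACGT", "GCGA", "ACTA"])
```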
401. more hits to sequences which are not truly related.

Word size

Changing the word size has a great impact on the seeded sequence space, as described above. But one can change the word size to find sequence matches which would otherwise not be found using the default parameters. For instance, the word size can be decreased when searching for primers or short nucleotides. For blastn a suitable setting would be to decrease the default word size of 11 to 7, increase the E-value significantly (1000) and turn off the complexity filtering.

For blastp a similar approach can be used: decrease the word size to 2, increase the E-value and use a more stringent substitution matrix, e.g. a PAM30 matrix.

Fortunately, the optimal search options for finding short, nearly exact matches can already be found on the BLAST web pages http://www.ncbi.nlm.nih.gov/BLAST/.

Substitution matrix

For protein BLAST searches, a default substitution matrix is provided. If you are looking at distantly related proteins, you should either choose a high-numbered PAM matrix or a low-numbered BLOSUM matrix. See Bioinformatics Explained on scoring matrices on http://www.clcbio.com/be. The default scoring matrix for blastp is BLOSUM62.

12.5.6 Explanation of the BLAST output

The BLAST output comes in different flavors. On the NCBI web page the default output is html, and the following description will use the html output as example. Ordinary text and xml output for easy computation
402. mple graphical overview of the hits found aligned to the  query sequence  The alignments are color coded ranging from black to red as indicated in the color  label at the top     Sequences producing significant alignments   k headers to sort columns                                                  NM 174886 1 Homo sapiens TGFB induced factor  TALE family homeobox   TGIF   339 563 85  1e 90 100  U E GM    NM 173210 1 Homo sa piens TGFB induced factor  TALE family homeobox   TGIF   339 563 85  1e 90 100   MEO  NM 173209 1 Homo sapiens TGFB induced factor  TALE family homeobox   TGIF   339 563 85  1e 90 100  UEGM  NM 173211 1 Homo sa piens TGFB induced factor  TALE family homeobox   TGIF   339 563 85  1e 90 100  maT  NM 173207 1 Homo sapiens TGFB induced factor  TALE family homeobox   TGIF   339 563 85  1e 90 100  UEGM   NM 173208 1 Homo sapiens TGFB induced factor  TALE family homeobox   TGIF   339 563 85  1e 90 100  U E GM    NM 170695 2 Homo sa piens TGFB induced factor  TALE family homeobox   TGIF   339 563 85  1e 90 100  UE GM    NM 003244 2 Homo sapiens TGFB induced factor  TALE family homeobox   TGIF   339 563 85  1e 90 100  UEGM  NM 003246 2 Homo sapiens thrombospondin 1  THBS1   mRNA 38 2 38 2 4  7 2 100  maT   NM 177965 2 Homo sapiens chromosome 8 open reading frame 37  C8orf37   38 2 38 2 4  7 2 100  UEGM   Genomic sequences  show first    NT 010859 14 Homo sapiens chromosome 18 genomic contig  reference assembly 339 602 85  1e 90 100    NW 926940 1 Homo sapiens chrom
403. ms in the right side of the Toolbar apply to the function of the mouse pointer. When e.g. Zoom Out is selected, you zoom out each time you click in a view where zooming is relevant (texts, tables and lists cannot be zoomed). The chosen mode is active until another mode toolbar item is selected. (Fit Width and Zoom to 100% do not apply to the mouse pointer.)

[Figure 3.16: The mode toolbar items: Fit Width, 100%, Pan, Zoom In, Zoom Out.]

3.3.1 Zoom In

There are four ways of zooming in:

Click Zoom In in the toolbar | click the location in the view that you want to zoom in on

or Click Zoom In in the toolbar | click and drag a box around a part of the view | the view now zooms in on the part you selected

or Press '+' on your keyboard

The last option for zooming in is only available if you have a mouse with a scroll wheel:

or Press and hold Ctrl (⌘ on Mac) | Move the scroll wheel on your mouse forward

When you choose the Zoom In mode, the mouse pointer changes to a magnifying glass to reflect the mouse mode.

Note: You might have to click in the view before you can use the keyboard or the scroll wheel to zoom.

If you press the Shift button on your keyboard while clicking in a View, the zoom function is reversed. Hence, clicking on a sequence in this way while the Zoom In mode toolbar item is selected zooms out instead of zooming in.

3.3.2 Zoom Out

It is possible to zoom out, step by step, on a sequ
404. n    Sall    Sv40 pA            _   _           Sequence details          pcDNA4_TO  5 078bp circular vector     Target vector  from XhoI cut at 105271053 to HindIII cut at 9787979  5 004bp a     Target vector  from HindIII cut at 978979 to XhoI cut at 1052   1053  74bp    war v Target vector defined  j X Define fragments to insert    eS Op ly               BE       TCGAGIC    CAG         56        V  Show   gt  Sequence layout   gt  Annotation layout   gt  Annotation types    Restriction sites   7  Show  Labels  Stacked w  Sorting  Aa TE  hI  V  Non cutters       v       4    ERECT  DARON  Es   S    Single cutters   7  XbaI  1                    4    Double cutters        7  BamHI  2   5   V  EcoRI  2       7  EcoRV  2       HindIII  2       v  XhoI  2                          4    E          tiple cutters  7  Smal  3      7  sali  3       7  Bai  3   5   W Pst 49       Deselect All                        Figure 2 30  Press and hold the Ctrl key while you click first the Hindlll site and next the Xhol site     G   Cloning exper          Sequence 2 of 2  Fragment  ATP8a1 mRNA  ATPS       7  Show as Linear    1 000 2 000 3 000    Atp8a1  pre    Smal   Hindli EcoRI   Pst  ma y y    d   ATP8a1 rev      CGATAAAG    GCTATTTC    GAGTATCG    CTcaTacc    Sequence details     ooo   gt     Vector  pcDNA4_TO   Change to Current     7 le          O pcDNA4_TO  5 078bp circular vector      6  Target vector  from XhoI cut at 10521053 to HindIII cut at 9782979  5 004bp  TTA     AATICGA     T
405. n International Union of Pure and Applied Chemistry.

The information is gathered from http://www.iupac.org and http://www.ebi.ac.uk/2can/tutorials/ [...]

Code  Description
A     Adenine
C     Cytosine
G     Guanine
T     Thymine
U     Uracil
R     Purine (A or G)
Y     Pyrimidine (C, T, or U)
M     C or A
K     T, U, or G
W     T, U, or A
S     C or G
B     C, T, U, or G (not A)
D     A, T, U, or G (not C)
H     A, T, U, or C (not G)
V     A, C, or G (not T, not U)
N     Any base (A, C, G, T, or U)

Appendix J

Custom codon frequency tables

You can edit the list of codon frequency tables used by CLC DNA Workbench.

Note: Please be aware that this process needs to be handled carefully; otherwise you may have to re-install the Workbench to get it to work.

In the Workbench installation folder under res, there is a folder named codonfreq. This folder contains all the codon frequency tables organized into subfolders in a hierarchy. In order to change the tables, you simply add, delete or rename folders and the files in the folders. If you wish to add new tables, please use the existing ones as templates.

Restart the Workbench to have the changes take effect.

Please note that when updating the Workbench to a new version, this information is not preserved. This means that you should keep this information in a separate place as back-up. (The ability to change the tables is mainly aimed at centrally deployed installations of the Workbench.)

Bibliography

Altschul a
406. n add data by adding a new location (see section 3.1.1).

CHAPTER 3. USER INTERFACE

If a file or another element is dropped on a folder, it is placed at the bottom of the folder. If it is dropped on another element, it will be placed just below that element.

If the element already exists in the Navigation Area, you will be asked whether you wish to create a copy.

3.1.2 Create new folders

In order to organize your files, they can be placed in folders. Creating a new folder can be done in two ways:

right-click an element in the Navigation Area | New | Folder

or File | New | Folder

If a folder is selected in the Navigation Area when adding a new folder, the new folder is added at the bottom of this folder. If an element is selected, the new folder is added right above that element.

You can move the folder manually by selecting it and dragging it to the desired destination.

3.1.3 Sorting folders

You can sort the elements in a folder alphabetically:

right-click the folder | Sort Folder

On Windows, subfolders will be placed at the top of the folder, and the rest of the elements will be listed below in alphabetical order. On Mac, both subfolders and other elements are listed together in alphabetical order.

3.1.4 Multiselecting elements

Multiselecting elements means that you select more than one element at the same time. This can be done in the following ways:

- Holding down the <Ctrl> key (⌘ on Mac) while clicking o
407. n be brought to front by clicking its tab     Note  If you right click an open tab of any element  click Show  and then choose a different view  of the same element  this new view is automatically opened in a split view  allowing you to see  both views     See section 3 1 5 for instructions on how to open a view using drag and drop     3 2 2 Show element in another view    Each element can be shown in different ways  A sequence  for example  can be shown as linear   circular  text etc     In the following example  you want to see a sequence in a circular view  If the sequence is  already open in a view  you can change the view to a circular view     Click Show As Circular      at the lower left part of the view    The buttons used for switching views are shown in figure 3 9      Ee AE        Figure 3 9  The buttons shown at the bottom of a view of a nucleotide sequence  You can click the  buttons to change the view to e g  a circular view or a history view     If the sequence is already open in a linear view  at   and you wish to see both a circular and a  linear view  you can split the views very easily     Press Ctrl  38 on Mac  while you   Click Show As Circular      at the lower left part  of the view    This will open a split view with a linear view at the bottom and a circular view at the top  see  10 5      You can also show a circular view of a sequence without opening the sequence first     Select the sequence in the Navigation Area   Show  45      As Circular  Q    
408. n multiple elements selects the elements that have been clicked.

- Selecting one element, and selecting another element while holding down the <Shift> key, selects all the elements listed between the two locations (the two end locations included).
- Selecting one element, and moving the cursor with the arrow keys while holding down the <Shift> key, enables you to increase the number of elements selected.

3.1.5 Moving and copying elements

Elements can be moved and copied in several ways:

- Using Copy, Cut and Paste from the Edit menu.
- Using Ctrl + C (⌘ + C on Mac), Ctrl + X (⌘ + X on Mac) and Ctrl + V (⌘ + V on Mac).
- Using Copy, Cut and Paste in the Toolbar.
- Using drag and drop to move elements.
- Using drag and drop while pressing Ctrl / Command to copy elements.

In the following, all of these possibilities for moving and copying elements are described in further detail.

Copy, cut and paste functions

Copies of elements and folders can be made with the copy/paste function, which can be applied in a number of ways:

select the files to copy | right-click one of the selected files | Copy | right-click the location to insert files into | Paste

or select the files to copy | Ctrl + C (⌘ + C on Mac) | select where to insert files | Ctrl + V (⌘ + V on Mac)

or select the files to copy | Edit in the Menu Bar | Copy | select where to insert files | Edit
409. n name  accession        Common name         Common name  accession      Annotation Layout and Annotation Types    See section 10 3 1     Restriction sites    See section 10 1 2     CHAPTER 10  VIEWING AND EDITING SEQUENCES 144    Motifs  See section 13 7 1     Residue coloring    These preferences make it possible to color both the residue letter and set a background color  for the residue     e Non standard residues  For nucleotide sequences this will color the residues that are not  C  G  A  T or U  For amino acids only B  Z  and X are colored as non standard residues       Foreground color  Sets the color of the letter  Click the color box to change the color       Background color  Sets the background color of the residues  Click the color box to  change the color     e Rasmol colors  Colors the residues according to the Rasmol color scheme   See http   www openrasmol org doc rasmol html      Foreground color  Sets the color of the letter  Click the color box to change the color       Background color  Sets the background color of the residues  Click the color box to  change the color     e Polarity colors  only protein   Colors the residues according to the polarity of amino acids         Foreground color  Sets the color of the letter  Click the color box to change the color       Background color  Sets the background color of the residues  Click the color box to  change the color     e Trace colors  only DNA   Colors the residues according to the color conventions of  
410. n of se   quences in alignments  363  Personal information  28  Pfam domain search  378   phr  file format  395  PHR  file format  395  Phred  file format  393   phy  file format  395  Phylip  file format  394  Phylogenetic tree  366  3 9  tutorial  71  Phylogenetics  Bioinformatics explained  3 1  pir  file format  395  PIR  NBRF   file format  393  Plot  dot plot  201  local complexity  211  Plug ins  30   png format  export  126  Polarity colors  144  Portrait  Print orientation  115  Positively charged residues  217  PostScript  export  126  Preference group  110  Preferences  104  advanced  109  Data  108  export  109  General  104  import  109  style sheet  110  toolbar  106  View  106  view  90  Primer  2 1  analyze  209  based on alignments  205  Buffer properties  252  design  3 9  design from alignments  3 9  display graphically  254    INDEX    length  252  mode  253  nested PCR  253  order  2 5  sequencing  253  Standard  253  TaqMan  253  tutorial  57  Primers  find binding sites  2 1  Print  113  dot plots  203  preview  116  visible area  114  whole view  114   pro  file format  395  Problems when starting up  29  Processes  93  Properties  batch edit  84  Protein  charge  237  378  hydrophobicity  241  Isoelectric point  215  report  3    statistics  215  translation  243  Proteolytic cleavage  3 8  Proxy server  33   ps format  export  126    psi  file format  395  PubMed references  search  1 1  PubMed references search  377    Quality of chromatogram trace 
411. n the contig  move the vertical slider at position 2073 to the left  see figure 17 21         YO TC CACGTCGGTACAGAACAGGCTGC    Trace data    You will now see how the gaps in the consensus sequence are replaced by real sequence  information     Note that you can only move the sliders when you are zoomed in to see the sequence residues     2 5 6 Inspecting the traces  Clicking the Find Conflict button again will find the next conflict     Here both reads are different than the reference sequence  We now inspect the traces in more  detail  In order to see the details  we zoom in on this position     Zoom in in the Tool Bar  5    Click the selected base   Click again three times    Now you have zoomed in on the trace  see figure 2 18      CHAPTER 2  TUTORIALS 49    T C A T C A    DADALA DAMNA    EN  g      PED NA ala ANAS    Figure 2 18  Now you can see all the details of the traces     This gives more space between the residues  but if we would like to inspect the peaks even  more  simply drag the peaks up and down with your mouse  see figure 17 2      T c A c G c    T T G Cc Cc A T   N NA trace data by dragging up and down Wass    Figure 2 19  Grab the traces to scale        2 5 7 Synonymous substitutions     In this case we have sequenced the coding part of a gene  Often you want to know what a  variation like this would mean on the protein level  To do this  show the translation along the  contig     Nucleotide info in the Side Panel   Translation   Show   Select ORF CDS in t
412. n this computer    License Order ID   CLC LICENSE SRENMNSTED 0D43CASEDF4X0XxOXxxD844 AF COC4BOOKXK       Direct Download    The workbench will attempt to contact the CLC Licenses Service  and download the license  directly   This method requires internet access from the workbench     Go to License Download web page    Th rkbench will open a We o owser with the License Download web page  From there you  will be able to download your license as a file np in the next step     If you experience any problems  please contact The CLC Support Team      Proxy Settings    Previous   Previous    Next    Quit Workbench Quit Workbench    Figure 1 7  Entering a license ID provided by CLC bio  the license ID in this example is artificial               In this dialog  there are two options   e Direct download  The workbench will attempt to contact the online CLC Licenses Service   and download the license directly  This method requires internet access from the workbench     e Go to license download web page  The workbench will open a Web Browser with the  License Download web page when you click Next  From there you will be able to download  your license as a file and import it  This option allows you to get a license  even though the  Workbench does not have direct access to the CLC Licenses Service     If you select the first option  and it turns out that you do not have internet access from the  Workbench  because of a firewall  proxy server etc    you will be able to click Previous and u
413. name to display in the Workbench  Restart the Workbench  and the new  database will be visible in the BLAST dialog     Appendix E    Restriction enzymes database  configuration    CLC DNA Workbench uses enzymes from the REBASE restriction enzyme database at http     cebase neb com  If you wish to add enzymes to this list  you can do this by manually using  the procedure described here     Note  Please be aware that this process needs to be handled carefully  otherwise you may  have to re install the Workbench to get it to work     First  download the following file  http    www clcbio com wbsettings link emboss    e custom  In the Workbench installation folder under settings  create a folder named rebase  and place the extracted link emboss e custom file here  Open the file in a text editor  The  top of the file contains information about the format  and at the bottom there are two example  enzymes that you should replace with your own     Restart the Workbench to have the changes take effect     389    Appendix F    Technical information about modifying  Gateway cloning sites    The CLC DNA Workbench comes with a pre defined list of Gateway recombination sites  These  sites and the recombination logics can be modified by downloading and editing a properties file   Note that this is a technical procedure only needed if the built in functionality is not sufficient for  your needs     The properties file can be downloaded from http   www  clcbio com wbsettings gatewaycloning   
414. nce of interest  This principle is shown in figure 17 6   please refer to  Meyer et al   2007  for more detailed information      A  mmm as   i yu  III  o             A  T    C    NNNNG           GTCGATGCCCGGGCATCGAC  a          V Srfl specific complemented  A site tag     arget sequence tag  Srfl site  Sample 1 e a       VI GGGCATCGAC GTCGATGCCC    Sample 2  GGGCTCGCTG       Figure 17 6  Tagging the target sequence  Figure from  Meyer et al   2007      The sample specific tag   also called the barcode   can then be used to distinguish between the  different samples when analyzing the sequence data  This post processing of the sequencing  data has been made easy by the multiplexing functionality of the CLC DNA Workbench which  simply divides the data into separate groups prior to analysis  Note that there is also an example  using Illumina data at the end of this section     The first step is to separate the imported sequence list into sublists based on the barcode of the  sequences     Toolbox   High throughput Sequencing  f    Multiplexing  H3    Process Tagged  Sequences  m       This opens a dialog where you can add the sequences you wish to sort  You can also add  sequence lists     When you click Next  you will be able to specify the details of how the de multiplexing should be  performed  At the bottom of the dialog  there are three buttons which are used to Add  Edit and  Delete the elements that describe how the barcode is embedded in the sequences     First  click A
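The idea of dividing a sequence list into sublists based on the barcode can be sketched in a few lines. This is a minimal illustration, not the Workbench's Process Tagged Sequences tool: the function name `demultiplex`, the sample names and the four-base barcodes are all hypothetical, the barcode is assumed to sit at the very start of each read, and only exact matches are accepted.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by their leading barcode and strip the barcode.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping sample name -> barcode sequence
    Reads without a known barcode are collected under "ungrouped".
    """
    groups = defaultdict(list)
    for read in reads:
        read = read.upper()
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                # Keep only the sequence of interest after the tag
                groups[sample].append(read[len(barcode):])
                break
        else:
            groups["ungrouped"].append(read)
    return dict(groups)


groups = demultiplex(
    ["ACGTAAAC", "TTAGGGGC", "CCCCAAAA", "acgtTTTT"],
    {"sample1": "ACGT", "sample2": "TTAG"},
)
```

Real data usually calls for tolerating sequencing errors in the barcode and for handling linkers or a second tag at the other end of the read; both are left out here to keep the sketch short.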
415. nd Gish  1996  Altschul  S  F  and Gish  W   1996   Local alignment statistics   Methods Enzymol  266 460 480      Altschul et al   1990  Altschul  S  F   Gish  W   Miller  W   Myers  E  W   and Lipman  D  J    1990   Basic local alignment search tool  J Mol Biol  215 3  403 410      Andrade et al   1998  Andrade  M  A   O Donoghue  S  l   and Rost  B   1998   Adaptation of  protein surfaces to subcellular location  J Mol Biol  276 2  51  7 525      Bachmair et al   1986  Bachmair  A   Finley  D   and Varshavsky  A   1986   In vivo half life of a  protein is a function of its amino terminal residue  Science  234 4 773  1 79 186      Bommarito et al   2000  Bommarito  S   Peyret  N   and SantaLucia  J   2000   Thermodynamic  parameters for DNA sequences with dangling ends  Nucleic Acids Res  28 9  1929 1934      Clote et al   2005  Clote  P   Ferr    F   Kranakis  E   and Krizanc  D   2005   Structural RNA has  lower folding energy than random RNA of the same dinucleotide frequency  RNA  11 5  5 8   591      Cornette et al   1987  Cornette  J  L   Cease  K  B   Margalit  H   Spouge  J  L   Berzofsky  J  A    and DeLisi  C   1987   Hydrophobicity scales and computational techniques for detecting  amphipathic structures in proteins  J Mol Biol  195 3  659 685      Cronn et al   2008  Cronn  R   Liston  A   Parks  M   Gernandt  D  S   Shen  R   and Mockler   T   2008   Multiplex sequencing of plant chloroplast genomes using solexa sequencing by   synthesis technology  Nucleic Aci
416. nding parameters

This opens the dialog displayed in figure 16.17.

At the top, select one or more primers by clicking the browse button. In CLC DNA Workbench, primers are just DNA sequences like any other, but there is a filter on the length of the sequence. Only sequences up to 400 bp can be added.

[Figure 16.17: Search parameters for finding primer binding sites.]

The Match criteria for matching a primer to a sequence are:

- Exact match: Choose only to consider exact matches of the primer, i.e. all positions must base pair with the template.
- Minimum number of base pairs required for a match: How many nucleotides of the primer must base pair with the sequence in order to cause priming/mispriming.
- Number of consecutive base pairs required in 3' end: How many consecutive 3' end base pairs in the primer MUST be present for priming/mispriming to occur. This option is included since 3' terminal base pairs are known to be essential for priming to occur.
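The match criteria above can be illustrated with a sliding-window sketch. It is a simplification under stated assumptions: the function name `binding_sites` is hypothetical, the primer is compared directly against the strand it is identical to (so a "base pair" is counted as an identical letter), only one strand is scanned, and concentrations and melting temperature are ignored.

```python
def binding_sites(template, primer, min_matches, min_3prime=0):
    """Return (position, matches, run_3prime) for each acceptable site.

    min_matches -- minimum number of matching positions in the window
    min_3prime  -- minimum number of consecutive matches at the 3' end
    """
    template, primer = template.upper(), primer.upper()
    n = len(primer)
    sites = []
    for pos in range(len(template) - n + 1):
        window = template[pos:pos + n]
        same = [a == b for a, b in zip(primer, window)]
        matches = sum(same)
        # Count consecutive matches starting from the primer's last base,
        # which is its 3' end
        run = 0
        for ok in reversed(same):
            if not ok:
                break
            run += 1
        if matches >= min_matches and run >= min_3prime:
            sites.append((pos, matches, run))
    return sites


sites = binding_sites("GGACGTTCAGGACCTTCA", "ACGTTCA", min_matches=6, min_3prime=4)
```

Setting `min_matches` to the primer length corresponds to the Exact match option; lowering it while keeping `min_3prime` high reports likely mispriming sites whose 3' end still pairs perfectly.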
417. ndling Output options    Create primer list    Result handling      Open      Save    Log handling      Make log      15  ETT EI       Figure 18 20  Besides the main output which is a copy of the the input sequence s  now including  attB sites and primer additions  you can get a list of primers as output     ace peDNAS atpSal    O    20 49 8 ao Sequence settings    9    l l I l    r    WS    Forward primer Shine D algamo   Annotation types  attB1 Atp  a 1 EM cos     GC Exon        GGGGACAAGTTTGTACAAAAAAGCAGGCT TAAGGAGG TATGCCGACCATG CGGAGGAC AGT GTCGGAGAT CCGC TCGCG CG CGGAAGGTTATGAI    C  Gene           gt   aii    EE  4  Misc  recombination C  Ea Primer        Mires         C  Source    a  EH O sT oO    su  EA   Deselect All   a  HOoOBEZ RADL       Figure 18 21  the attB site plus the Shine Dalgarno primer addition is annotated     Extending the pre defined list of primer additions    The list of primer additions shown when pressing Shift F1 in the dialog shown in figure 18 16  can be configured and extended  If there is a tag that you use a lot  you can add it to the list for  convenient and easy access later on  This is done in the Preferences     Edit   Preferences   Advanced    In the advanced preferences dialog  scroll to the part called Gateway cloning primer additions   see figure 18 22      Each element in the list has the following information     Name The name of the sequence  When the sequence fragment is extended with a primer  addition  an annotation will be ad
418. nel     If you wish to use all the enzymes in the list   Click in the panel to the left   press Ctrl   A  38   A on Mac    Add    gt      The enzymes can be sorted by clicking the column headings  i e  Name  Overhang  Methylation  or Popularity  This is particularly useful if you wish to use enzymes which produce e g  a 3     overhang  In this case  you can sort the list by clicking the Overhang column heading  and all the  enzymes producing 3    overhangs will be listed together for easy selection     When looking for a specific enzyme  it is easier to use the Filter  If you wish to find e g  Hindlll  sites  simply type Hindlll into the filter  and the list of enzymes will shrink automatically to only  include the Hindlll enzyme  This can also be used to only show enzymes producing e g  a 3     overhang as shown in figure 18 51     The CLC DNA Workbench comes with a standard set of enzymes based on http   www  rebase neb com  You  can customize the enzyme database for your installation  see section E    CHAPTER 18  CLONING AND CUTTING 331    Restriction Site Analysis      Select DNA RNA  _   Enzymes to be considered in calculi  sequence s  Enzyme list      Enzymes to be considered fogs Sa       v  Use existing enzyme list   Popular enzymes v  in calculation anz EE Y      Enzymes in  Popular en     Enzymes to be used  Filter        Filter     Name Overhang Methylat    Popul      Name Overhang Methyla    Pop         PstI   tgca S   N6 met    te  KpnI     gtac S   N   met      
419. netic inference. The bottom shows a tree found by neighbor joining, while the top shows a tree found by UPGMA. The latter method assumes that the evolution occurs at a constant rate in different lineages.

Maximum likelihood phylogeny

Figure 20.4: Adjusting parameters for ML phylogeny.

Figure 20.4 shows the parameters that can be set for the ML phylogenetic tree reconstruction:

• Starting tree: the user is asked to specify a starting tree for the tree reconstruction. There are three possibilities:
  - Neighbor joining
  - UPGMA
  - Use tree from file
• Select substitution model: CLC DNA Workbench allows maximum likelihood tree estimation to be performed under the assumption of one of four substitution models: the Jukes-Cantor [Jukes and Cantor, 1969], the Kimura 80 [Kimura, 1980], the HKY [Hasegawa et al., 1985] and the GTR (also known as the RE
420. nformatics explained
  new
  region types
  search
  select
  shuffle
  Statistics
  view
  view as text
  view circular
  view format
  web info
Sequence logo
Sequencing data
Sequencing primers
Share data
Share Side Panel Settings
Shared BLAST database
Shortcuts
Show
  enzymes cutting selection
  results from a finished process
Show dialogs
Show enzymes with compatible ends
Show/hide Toolbox
Shuffle sequence
Side Panel
  tutorial
Side Panel Settings
  export
  import
  share with others
Side Panel, location of
Signal peptide
Single base editing
  in contig
  in sequences
Single cutters
SNP detection
Solexa, see Illumina Genome Analyzer
SOLiD data
Sort
  sequences alphabetically
  sequences by similarity
Sort sequences by name
Sort, folders
Source element
Species, display name
Staden, file format
Standard layout, trees
Standard Settings, CLC
Star activity
Start Codon
Start-up problems
Statistics
  about sequence
  protein
  sequence
Status Bar
  illustration
str, file format
Structure scanning
Style sheet, preferences
Subcontig, extract part of a contig
Support mail
Surface probability
svg-format, export
Swiss-Prot, file format
Swi
421. Figure 14.2: Translating RNA to DNA.

If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

This will open a new view in the View Area displaying the new DNA sequence. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog.

Note: You can select multiple RNA sequences and sequence lists at a time. If the sequence list contains DNA sequences as well, they will not be converted.

14.3 Reverse complements of sequences

CLC DNA Workbench is able to create the reverse complement of a nucleotide sequence. By doing that, a new sequence is created which also has all the annotations reversed since they now occupy the opposite strand of their previous location.

To quickly obtain the reverse complement of a sequence or part of a sequence, you may select a region on the negative strand and open it in a new view:

right-click
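The reverse complement operation described in section 14.3 can be illustrated outside the Workbench with a few lines of Python. This is only a minimal sketch, not the Workbench's own implementation: the function names are hypothetical, only the unambiguous bases A, C, G and T are handled, and annotations are assumed to be stored as 1-based inclusive coordinates with a strand of +1 or -1.

```python
# Complement table for unambiguous DNA bases.
# Assumption: IUPAC ambiguity codes are not handled in this sketch.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_annotation(start, end, strand, seq_length):
    """Map an annotation onto the reverse-complemented sequence.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    The annotation ends up on the opposite strand, mirrored around the
    middle of the sequence.
    """
    new_start = seq_length - end + 1
    new_end = seq_length - start + 1
    return new_start, new_end, -strand


if __name__ == "__main__":
    seq = "ATGCCGTTA"
    print(reverse_complement(seq))
    # The start codon annotated at positions 1..3 on the plus strand
    # moves to the other end of the sequence and to the minus strand.
    print(flip_annotation(1, 3, +1, len(seq)))
```

Mirroring the coordinates is what keeps each annotation attached to the same residues after the sequence has been reversed.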
422. ng alignments of nucleotide or peptide sequences, the software offers several ways to view alignments. The alignments can then be used for building phylogenetic trees.

Sequences must be available via the Navigation Area to be included in an alignment. If you have sequences open in a View that you have not saved, then you just need to select the view tab and press Ctrl + S (or ⌘ + S on Mac) to save them.

In this tutorial six protein sequences from the Example data folder will be aligned (see figure 2.51).

Figure 2.51: Six protein sequences in 'Sequences' from the 'Protein orthologs' folder of the Example data.

To align the sequences:

select the sequences from the 'Protein orthologs' folder under 'Sequences' | Toolbox | Alignments and Trees | Create Alignment

2.10.1 The alignment dialog

This opens the dialog shown in figure 2.52.

Figure 2.52: The alignment dialog displaying the six protein sequences.
423. ng temperature algorithm employed includes the latest thermodynamic parameters for calculating Tm when single-base mismatches occur.

When in Standard PCR mode, clicking the Calculate button will prompt the dialog shown in figure 16.13.

The top part of this dialog shows the single-primer parameter settings chosen in the Primer parameters preference group which will be used by the design algorithm.

The central part of the dialog contains parameters pertaining to primer specificity (this is omitted if all sequences belong to the included group). Here, three parameters can be set:

• Minimum number of mismatches: the minimum number of mismatches that a primer must have against all sequences in the excluded group to ensure that it does not prime these.
• Minimum number of mismatches in 3' end: the minimum number of mismatches that a primer must have in its 3' end against all sequences in the excluded group to ensure that it does not prime these.
• Length of 3' end: the number of consecutive nucleotides to consider for mismatches in the 3' end of the primer.

The lower part of the dialog contains parameters pertaining to primer pairs (this is omitted when only designing a single primer). Here, three parameters can be set:

• Maximum percentage point difference in G/C content: if this is set at e.g. 5 points, a pair of primers with 45% and 49% G/C nucleotides, respectively, will be allowed, whereas a pair of primers with 45% and 51% G/C nucl
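The specificity and primer pair parameters listed in this entry can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the Workbench's design algorithm: all function names are hypothetical, and each primer is compared without gaps against a candidate binding site of the same length, both written 5' to 3'.

```python
def count_mismatches(primer, site):
    """Count mismatching positions between a primer and an equally long site."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(1 for a, b in zip(primer.upper(), site.upper()) if a != b)


def three_prime_mismatches(primer, site, end_length):
    """Count mismatches within the last end_length nucleotides (the 3' end)."""
    if end_length < 1:
        raise ValueError("end_length must be at least 1")
    return count_mismatches(primer[-end_length:], site[-end_length:])


def avoids_excluded(primer, excluded_sites, min_mismatches, min_3p_mismatches, end_length):
    """True if the primer has enough mismatches against every excluded site."""
    return all(
        count_mismatches(primer, site) >= min_mismatches
        and three_prime_mismatches(primer, site, end_length) >= min_3p_mismatches
        for site in excluded_sites
    )


def gc_percent(seq):
    """Percentage of G and C nucleotides in a sequence."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def gc_difference_allowed(forward, reverse, max_points):
    """True if the G/C content of the two primers differs by at most max_points."""
    return abs(gc_percent(forward) - gc_percent(reverse)) <= max_points


if __name__ == "__main__":
    primer = "ACGTACGTAC"
    site = "ACGTACGTGG"  # two mismatches, both in the 3' end
    print(count_mismatches(primer, site), three_prime_mismatches(primer, site, 3))
    print(avoids_excluded(primer, [site], 2, 2, 3))
```

With a limit of 5 percentage points, a pair at 45% and 49% G/C differs by 4 points and passes the check, while a pair at 45% and 51% differs by 6 points and does not.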
424. ng the dot plot.

The Side Panel to the right lets you specify the dot plot preferences. The gradient color box can be adjusted to get the appropriate result by dragging the small pointers at the top of the box. Moving the slider from the right to the left lowers the thresholds, which can be directly seen in the dot plot, where more diagonal lines will emerge. You can also choose another color gradient by clicking on the gradient box and choosing from the list.

Adjusting the sliders above the gradient box is also practical when producing an output for printing (too much background color might not be desirable). By crossing one slider over the other (the two sliders change side), the colors are inverted, allowing for a white background (if you choose a color gradient which includes white). See figure 13.5.

Figure 13.6: Dot plot with inverted colors, practical for printing.

13.2.3 Bioinformatics explained: Dot plots

Realization of dot plots

Dot plots are two-dimensional plots where the x-axis and y-axis each represents a sequence, and the plot itself shows a comparison of these two sequences by a calculated score for each position of the sequence. If a window of fixed size on one sequence (one axis) matches the other sequence, a dot is drawn at
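The window-based comparison behind a dot plot can be sketched in Python as follows. This is a simplified illustration, not the Workbench's scoring scheme: the score used here is simply the number of identical residues in two aligned windows, and the names dot_plot, render and min_matches are hypothetical. Lowering min_matches plays the role of lowering the threshold slider, so more dots appear.

```python
def dot_plot(seq_x, seq_y, window=3, min_matches=3):
    """Return the set of (i, j) window start positions scoring at least min_matches."""
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            # Score = number of identical residues in the two aligned windows.
            score = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            if score >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_x, seq_y, dots, window=3):
    """Draw the plot as text: columns follow seq_x, rows follow seq_y."""
    rows = []
    for j in range(len(seq_y) - window + 1):
        rows.append(
            "".join(
                "*" if (i, j) in dots else "."
                for i in range(len(seq_x) - window + 1)
            )
        )
    return "\n".join(rows)


if __name__ == "__main__":
    a = "ACGTACGT"
    # Comparing a sequence with itself gives the main diagonal plus
    # shorter diagonals for the internal repeat.
    print(render(a, a, dot_plot(a, a, window=3, min_matches=3)))
```

Diagonal runs of dots correspond to stretches where the two sequences are similar, which is why repeats show up as lines parallel to the main diagonal.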
425. ngs, open CLC DNA Workbench, go to the Advanced tab of the Preferences dialog (figure 1.28) and enter the appropriate information. The Preferences dialog is opened from the Edit menu.

Figure 1.28: Adjusting proxy preferences.

You have the choice between an HTTP proxy and a SOCKS proxy. CLC DNA Workbench only supports the use of a SOCKS proxy that does not require authorization.

Exclude hosts can be used if there are some hosts that should be contacted directly and not through the proxy server. The value can be a list of hosts, each separated by a |, and in addition a wildcard character * can be used for matching. For example: *.foo.com|localhost.

If you have any problems with these settings you should contact your systems administrator.

1.9 The format of the user manual

This user manual offers support to Windows, Mac OS X and Linux users. The software is
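How an Exclude hosts value is interpreted can be sketched in Python. This is an assumption-laden illustration rather than the Workbench's own matching code: it assumes that hosts are separated by a vertical bar (|) and that the wildcard is an asterisk (*), and the function name bypasses_proxy is hypothetical. Note that fnmatchcase also understands ? and [...] patterns, which goes beyond the single wildcard described in this entry.

```python
from fnmatch import fnmatchcase


def bypasses_proxy(host, exclude_hosts):
    """Return True if host matches any pattern in the exclude list.

    exclude_hosts is one string with patterns separated by "|"
    (an assumption of this sketch). Matching ignores case.
    """
    patterns = [p.strip().lower() for p in exclude_hosts.split("|") if p.strip()]
    return any(fnmatchcase(host.lower(), pattern) for pattern in patterns)


if __name__ == "__main__":
    setting = "*.foo.com|localhost"
    for host in ("www.foo.com", "localhost", "foo.com", "example.org"):
        print(host, bypasses_proxy(host, setting))
```

With this pattern, the bare host foo.com is not excluded, because *.foo.com requires a dot before foo.com; add foo.com as a separate entry if it should also be contacted directly.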
426. Figure 3.15: A maximized view. The function hides the Navigation Area and the Toolbox.

3.2.7 Side Panel

The Side Panel allows you to change the way the contents of a view are displayed. The options in the Side Panel depend on the kind of data in the view, and they are described in the relevant sections about sequences, alignments, trees etc.

The Side Panel is activated in this way:

select the view | Ctrl + U (⌘ + U on Mac)

or right-click the tab of the view | View | Show/Hide Side Panel

Note: Changes made to the Side Panel will not be saved when you save the view. See how to save the changes in the Side Panel in chapter 5.

The Side Panel consists of a number of groups of preferences (depending on the kind of data being viewed), which can be expanded and collapsed by clicking the header of the group. You can also expand or collapse all the groups by clicking the icons at the top.

3.3 Zoom and selection in View Area

The mode toolbar ite
427. Figure 18.12: Drag the handles to adjust overhangs.

At the top is a button to reverse complement the inserted sequence.

Below is a visualization of the insertion details. The inserted sequence is shown in red in the middle, and the vector has been split at the insertion point, with its ends shown at each side of the inserted sequence.

If the overhangs of the sequence and the vector do not match, you can blunt-end or fill in the overhangs using the drag handles.

Whenever you drag the handles, the status of the insertion point is indicated below:

• The overhangs match.
• The overhangs do not match. In this case, you will not be able to click Finish. Drag the handles to make the overhangs match.

At the bottom of the dialog is a summary field which records all the changes made to the overhangs. The contents of the summary will also be written in the history when you click Finish.

When you click Finish and the sequence
428. nter concerning the properties of primer/probe pairs or sets, e.g. primer pair annealing and Tm difference between primers. If the latter is desired, the user can use the Calculate button at the bottom of the Primer parameter preference group. This will activate a dialog, the contents of which depend on the chosen mode. Here, the user can set primer-pair-specific settings such as allowed or desired Tm difference and view the single-primer parameters which were chosen in the Primer parameters preference group.

Upon pressing Finish, an algorithm will generate all possible primer sets and rank these based on their characteristics and the chosen parameters. A list will appear displaying the 100 highest-scoring sets and information pertaining to these. The search result can be saved to the navigator. From the result table, suggested primers or primer/probe sets can be explored, since clicking an entry in the table will highlight the associated primers and probes on the sequence. It is also possible to save individual primers or sets from the table through the mouse right-click menu. For a given primer pair, the amplified PCR fragment can also be opened or saved using the mouse right-click menu.

16.1.2 Scoring primers

CLC DNA Workbench employs a proprietary algorithm to rank primer and probe solutions. The algorithm considers both the parameters pertaining to single oligos, such as e.g. the secondary structure score and parameters
429. nts in the right-hand side of the dialog. To run the contents of the Cloning folder in batch, double-click to select it.

When the Cloning folder is selected and you click Next, a batch overview is shown.

9.1.1 Batch overview

The batch overview lists the batch units to the left and the contents of the selected unit to the right (see figure 9.3).

Figure 9.3: Overview of the batch run.

In this example, the two sequences are defined as separate batch units because they are located at the top level of the Cloning folder. There were also three folders in the Cloning folder (see figure 9.2), and two of them are listed as well. This means that the contents of these folders are pooled in one batch run (you can see the contents of the Cloning vector library batch run in the panel at the right-hand side of the dialog). The r
430. nvitrogen Corporation

Figure 18.14: Inserting the HindIII recognition sequence.

• First, attB sites are added to a sequence fragment.
• Second, the attB-flanked fragment is recombined into a donor vector (the BP reaction) to construct an entry clone.
• Finally, the target fragment from the entry clone is recombined into an expression vector (the LR reaction) to construct an expression clone. For Multi-site Gateway cloning, multiple entry clones can be created that can recombine in the LR reaction.

During this process, both the attB-flanked fragment and the entry clone can be saved.

For more information about the Gateway technology, please visit http://www.invitrogen.com/site/us/en/home/Products-and-Services/Applications/Cloning/Gateway-Cloning.html

To perform these analyses in the CLC DNA Workbench, you need to import donor and expression vectors. These can be downloaded from Invitrogen's web site and directly imported into the Workbench: http://tools.invitrogen.com/downloads/Gateway%20vectors.ma4

18.2.1 Add attB sites

The first step in the Gateway cloning p
431. 15.2 Hydrophobicity
  15.2.1 Hydrophobicity plot
  15.2.2 Hydrophobicity graphs along sequence
  15.2.3 Bioinformatics explained: Protein hydrophobicity
15.3 Reverse translation from protein into DNA
  15.3.1 Reverse translation parameters
  15.3.2 Bioinformatics explained: Reverse translation

CLC DNA Workbench offers analyses of proteins as described in this chapter.

15.1 Protein charge

In CLC DNA Workbench you can create a graph of the electric charge of a protein as a function of pH. This is particularly useful for finding the net charge of the protein at a given pH. This knowledge can be used e.g. in relation to isoelectric focusing on the first dimension of 2D-gel electrophoresis. The isoelectric point (pI) is found where the net charge of the protein is zero. The calculation of the protein charge does not include knowledge about any potential post-translational modifications the protein may have.

The pKa values reported in the literature may differ slightly, thus resulting in different-looking graphs of the protein charge plot compared to other programs.

In order to calculate the protein charge:

Select a protein sequence | Toolbox in the Menu Bar | Protein Analyses | Create Protein Charge Plot

or right-click a protein sequence | Toolbox
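The relationship between pH, pKa values and net charge described in section 15.1 can be sketched with the Henderson-Hasselbalch equation. The following Python example is a minimal illustration, not the calculation used by the Workbench: the pKa values are illustrative textbook-style numbers chosen for this sketch (the literature reports slightly different sets), and the function names are hypothetical. Post-translational modifications are ignored.

```python
# Illustrative pKa values (assumptions of this sketch, not the Workbench's table).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}            # groups charged +1 when protonated
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # groups charged -1 when deprotonated
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def net_charge(protein, ph):
    """Net charge of a protein at a given pH (Henderson-Hasselbalch)."""
    # Fraction of protonated N-terminus minus fraction of deprotonated C-terminus.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in protein.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(protein, low=0.0, high=14.0, tolerance=1e-4):
    """Find the pH where the net charge is zero by bisection.

    This works because the net charge only decreases as the pH increases.
    """
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(protein, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQRSTVWY"
    for ph in (2, 7, 12):
        print(ph, round(net_charge(peptide, ph), 2))
    print("pI:", round(isoelectric_point(peptide), 2))
```

Evaluating net_charge over a range of pH values gives the kind of curve shown in a protein charge plot, and swapping in a different pKa table shifts the curve, which is why different programs can produce different-looking graphs.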
432. o Sea project. This does overlap with nucleotide nr.

D.3 Adding more databases

Besides the databases that are part of the default configuration, you can add more databases located at NCBI by configuring files in the Workbench installation directory.

The list of databases that can be added is here: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html.

In order to add a new database, find the settings folder in the Workbench installation directory (e.g. C:\Program files\CLC Genomics Workbench 4). Download, unzip and place the following files in this directory to replace the built-in list of databases:

• Nucleotide databases: http://www.clcbio.com/wbsettings/NCBI_BlastNucleotideDataba zip
• Protein databases: http://www.clcbio.com/wbsettings/NCBI_BlastProteinDatabases.zip

Open the file you have downloaded into the settings folder, e.g. NCBI_BlastProteinDatabases.proper, in a text editor and you will see the contents look like this:

nr[clcdefault] = Non-redundant protein sequences
refseq_protein = Reference proteins
swissprot = Swiss-Prot protein sequences
pat = Patented protein sequences
pdb = Protein Data Bank proteins
env_nr = Environmental samples
month = New or revised GenBank sequences

Simply add another database as a new line with the first item being the database name taken from http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html and the second part is the
433. o contig
Adjust selection
Adjust trim
Advanced preferences
Advanced search
Algorithm
  alignment
  neighbor joining
  UPGMA
Align
  alignments
  protein sequences, tutorial
  sequences
Alignment, see Alignments
Alignment Primers
  Degenerate primers
  PCR primers
  Primers with mismatches
  Primers with perfect match
  TaqMan Probes
Alignment-based primer design
Alignments
  add sequences to
  compare
  create
  design primers for
  edit
  fast algorithm
  join
  multiple, Bioinformatics explained
  remove sequences from
  view
  view annotations on
Aliphatic index
aln, file format
Alphabetical sorting of folders
Ambiguities, reverse translation
Amino acid composition
Amino acids
  abbreviations
  IUPAC codes
Analyze primer properties
Annotation
  select
Annotation Layout, in Side Panel
Annotation types
  define your own
Annotation Types, in Side Panel
Annotations
  add
  copy to other sequences
  edit
  in alignments
  introduction to
  links
  overview of
  show/hide
  table of
  trim
  types of
  view on sequence
  viewing
Annotations, add links to
Antigenicity
Append wildcard, search
Arrange
  layout of sequence
  views in View Area
434. o demonstrates our software. This is a very easy way to get started using the program. Read more about online presentations here: http://clcbio.com/presentation.

1.6.1 Quick start

When the program opens for the first time, the background of the workspace is visible. In the background are three quick start shortcuts, which will help you get started. These can be seen in figure 1.23.

Figure 1.23: Three available Quick start shortcuts, available in the background of the workspace.

The function of the three quick start shortcuts is explained here:

• Import data: Opens the Import dialog, which lets you browse for and import data from your file system.
• New sequence: Opens a dialog which allows you to enter your own sequence.
• Read tutorials: Opens the tutorials menu with a number of tutorials. These are also available from the Help menu in the Menu bar.

1.6.2 Import of example data

It might be easier to understand the logic of the program by trying to do simple operations on existing data. Therefore CLC DNA Workbench includes an example data set.

When downloading CLC DNA Workbench you are asked if you would like to import the example data set. If you accept, the data is downloaded automatically and saved in the program. If you didn't download the data, or for some other reason need to download the data again, you have two options:

You can click Install Example Data in t
435. o find annotations or a subset of annotations.

• You can copy and paste annotations, e.g. from one sequence to another.
• If you wish to edit many annotations consecutively, the double-click editing makes this very fast (see section 10.3.2).

10.3.2 Adding annotations

Adding annotations to a sequence can be done in two ways:

open the sequence in a sequence view (double-click in the Navigation Area) | make a selection covering the part of the sequence you want to annotate | right-click the selection | Add Annotation

or select the sequence in the Navigation Area | Show | Annotations | Add Annotation

This will display a dialog like the one in figure 10.10.

Figure 10.10: The Add Annotation dialog.

The left-hand part of the dialog lists a number of Annotation types. When you have selected an annotation type, it appears in Type to the right. You can also select an annotation directly in this list. Choosing an annotation type is mandatory. If you wish to use an annotation type which is not present in the list, simply ente
436. o the clipboard, which will enable it for use in other programs.

• Duplicate Selection. If a selection on the sequence is duplicated, the selected region will be added as a new sequence to the cloning editor with a new sequence name representing the length of the fragment. When a sequence region between two restriction sites is double-clicked, the entire region will automatically be selected. This makes it very easy to make a new sequence from a fragment created by cutting with two restriction sites (right-click the selection and choose Duplicate Selection).
• Open Selection in New View. This will open the selected region in the normal sequence view.
• Edit Selection. This will open a dialog box in which it is possible to edit the selected residues.
• Delete Selection. This will delete the selected region of the sequence.
• Add Annotation. This will open the Add annotation dialog box.
• Show Enzymes Only Cutting Selection. This will add enzymes cutting this selection to the Side Panel.
• Insert Restriction Sites before/after Selection. This will show a dialog where you can choose from a list of restriction enzymes (see section 18.1.4).

Insert one sequence into another

Sequences can be inserted into each other in several ways as described in the lists above. When you choose to insert one sequence into another you will be presented with a dialog where all sequences in the view are p
437. obes based on an alignment of multiple sequences.

The primer designer for alignments can be accessed in two ways:

select alignment | Toolbox | Primers and Probes | Design Primers | OK

or, if the alignment is already open: click Primer Designer at the lower left part of the view

In the alignment primer view (see figure 16.12), the basic options for viewing the template alignment are the same as for the standard view of alignments. See section 19 for an explanation of these options.

Note: This means that annotations such as e.g. known SNPs or exons can be displayed on the template sequence to guide the choice of primer regions. Since the definition of groups of sequences is essential to the primer design, the selection boxes of the standard view are shown as default in the alignment primer view.

[Screenshot of the alignment primer view with the Primer Designer settings in the Side Panel.]
438. ocal. If you do not have root privileges you can choose to install in your home directory.

• Choose where you would like to create symbolic links to the program.
  DO NOT create symbolic links in the same location as the application.
  Symbolic links should be installed in a location which is included in your environment PATH. For a system-wide installation you can choose for example /usr/local/bin. If you do not have root privileges you can create a 'bin' directory in your home directory and install symbolic links there. You can also choose not to create symbolic links.
• Wait for the installation process to complete and click Finish.

If you choose to create symbolic links in a location which is included in your PATH, the program can be executed by running the command:

clcdnawb6

Otherwise you start the application by navigating to the location where you chose to install it and running the command:

./clcdnawb6

1.2.5 Installation on Linux with an RPM package

Navigate to the directory containing the rpm package and install it using the rpm tool by running a command similar to:

rpm -ivh CLCDNAWorkbench_6_JRE.rpm

If you are installing from a CD the rpm packages are located in the "RPMS" directory. Installation of RPM packages usually requires root privileges.

When the installation process is finished the program can be executed by running the command:

clcdnawb6

1.3 System r
439. od generating a sequence of the same expected mononucleotide frequency.

• Dinucleotide sampling from first-order Markov chain: Resampling method generating a sequence of the same expected dinucleotide frequency.

For proteins, the following parameters can be set:

• Single amino acid shuffling: Shuffle method generating a sequence of the exact same amino acid frequency.
• Single amino acid sampling from zero-order Markov chain: Resampling method generating a sequence of the same expected single amino acid frequency.
• Dipeptide shuffling: Shuffle method generating a sequence of the exact same dipeptide frequency.
• Dipeptide sampling from first-order Markov chain: Resampling method generating a sequence of the same expected dipeptide frequency.

For further details of these algorithms, see [Clote et al., 2005]. In addition to the shuffle method, you can specify the number of randomized sequences to output.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

This will open a new view in the View Area displaying the shuffled sequence. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog.

13.2 Dot plots

Dot plots provide a powerful visual comparison of two sequences. Dot plots can also be used to compare regions of similarity within a sequence
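The difference between the shuffle methods and the resampling methods listed in this entry (exact frequencies versus expected frequencies) can be shown with a short Python sketch. It covers only the single-residue case and is not the Workbench's implementation; the function names are hypothetical, and the dinucleotide and dipeptide variants, which need a more involved algorithm, are left out.

```python
import random
from collections import Counter


def shuffle_exact(seq, rng):
    """Shuffle residues: the output has exactly the same residue counts."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def sample_zero_order(seq, rng):
    """Resample from a zero-order Markov chain.

    Each position is drawn independently with the residue frequencies of
    the input, so the counts match only in expectation.
    """
    counts = Counter(seq)
    symbols = sorted(counts)
    weights = [counts[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=len(seq)))


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    seq = "ATGCCGTTAGGCATAT"
    print(shuffle_exact(seq, rng), Counter(shuffle_exact(seq, rng)) == Counter(seq))
    print(sample_zero_order(seq, rng))
```

A shuffled sequence is always a permutation of the input, whereas a resampled sequence has the same length but may contain, for example, one more A and one fewer G than the original.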
440. oes not meet the requirements set in the Primer parameters (see figure 2.38).

Figure 2.38: The Primer parameters.

The default maximum melting temperature is 58. This is the reason why the primer in figure 2.31 with a melting temperature of 58.55 does not meet the requirements and is colored red. If you raise the maximum melting temperature to 59, the primer will meet the requirements and the dot becomes green.

In figure 2.37 there is an asterisk (*) before the melting temperature. This indicates that this primer does not meet the requirements regarding melting temperature. In this way, you can easily see why a specific primer (represented by a dot) fails to meet the requirements.

By adjusting the Primer parameters you can define primers to meet your specific needs. Since the dots are dynamically updated, you can immediately see how a change in the primer parameters affects the number of red and green dots.

2   3 Calculating a primer pair

Until now, we have been looking at the forward primer. To mark a region for the reverse primer, make a selection from position 1200 to 1400 and:

Right-click the selection | Reverse primer region here

The two regions should now be located as shown in figure 2.39
441. of the sites, right-click the site on the sequence and choose De-select This ... Site.

When the right target vector is selected, you are ready to Perform Cloning (see below).

Perform cloning

Once selections have been made for both fragments and vector, click Perform Cloning. This will display a dialog to adapt overhangs and change orientation as shown in figure 18.6.

[Screenshot residue: cloning editor showing pcDNA4_TO (5,078 bp circular vector) with the target vector defined between the XhoI and HindIII cut sites, and the Restriction sites settings in the Side Panel.]
442. of these two functionalities are described below.

17.2.1 Sort sequences by name

With this functionality you will be able to group sequencing reads based on their file name. A typical example would be that you have a list of files named like this:

A02 Asp F 016 2007 01 10
A02 Asp R 016 2007 01 10
A02 Gln F 016 2007 01 11
A02 Gln R 016 2007 01 11
A03 Asp F 051 2007 01 10
A03 Asp R 051 2007 01 10
A03 Gln F 031 2007 01 11
A03 Gln R 031 2007 01 11

In this example, the names have five distinct parts (we take the first name as an example):

• A02 which is the position on the 96-well plate
• Asp which is the name of the gene being sequenced
• F which describes the orientation of the read (forward/reverse)
• 016 which is an ID identifying the sample
• 2007 01 10 which is the date of the sequencing run

To start mapping these data, you probably want to have them divided into groups instead of having all reads in one folder. If, for example, you wish to map each sample separately, or if you wish to map each gene separately, you cannot simply run the mapping on all the sequences in one step.

That is where Sort Sequences by Name comes into play. It will allow you to specify which part of the name should be used to divide the sequences into groups. We will use the example described above to show how it works:

Toolbox | High-throughput Sequencing | Multiplexing | Sort S
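The idea of grouping reads by selected parts of their names can be sketched in a few lines of Python. This is a minimal illustration, not the Workbench's algorithm: it assumes the name parts are separated by whitespace as they appear in this text (the real file names may use other separators), and `group_reads` is a hypothetical helper.

```python
from collections import defaultdict


def group_reads(names, key_fields, sep=None):
    """Group read names by the concatenation of the chosen name parts.

    key_fields holds zero-based indices of the parts used as group key;
    sep=None splits on any whitespace (an assumption for this sketch).
    """
    groups = defaultdict(list)
    for name in names:
        parts = name.split(sep)
        key = "".join(parts[i] for i in key_fields)
        groups[key].append(name)
    return dict(groups)


names = [
    "A02 Asp F 016 2007 01 10",
    "A02 Asp R 016 2007 01 10",
    "A02 Gln F 016 2007 01 11",
    "A02 Gln R 016 2007 01 11",
]

# Group on gene (part 1) and sample ID (part 3), giving keys such as "Asp016"
groups = group_reads(names, key_fields=(1, 3))
```

Choosing only the sample ID (`key_fields=(3,)`) would instead put all four reads into a single group, which corresponds to mapping each sample separately.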
443. omitted whereas the found BLAST hits will automatically be placed right below the query sequence.

• Compactness: You can control the level of sequence detail to be displayed:
  - Not compact: Full detail and spaces between the sequences.
  - Low: The normal settings where the residues are visible (when zoomed in) but with no extra spaces between.
  - Medium: The sequences are represented as lines and the residues are not visible. There is some space between the sequences.
  - Compact: Even less space between the sequences.
• BLAST hit coloring: You can choose whether to color hit sequences and you can adjust the coloring.
• Coverage: In the Alignment info in the Side Panel, you can visualize the number of hit sequences at a given position on the query sequence. The level of coverage is relative to the overall number of hits included in the result.
  - Foreground color: Colors the letters using a gradient, where the left side color is used for low coverage and the right side is used for maximum coverage.
  - Background color: Colors the background of the letters using a gradient, where the left side color is used for low coverage and the right side is used for maximum coverage.
  - Graph: The coverage is displayed as a graph beneath the query sequence. Learn how to export the data behind the graph in section   4.
    * Height: Specifies the height of the graph.
    * Type: The graph can be displayed as Line plot, Bar plot or as a Color bar.
444. on ($S_{obs}$):

$$R_{seq} = S_{max} - S_{obs} = \log_2 N - \left(-\sum_{n=1}^{N} p_n \log_2 p_n\right)$$

$p_n$ is the observed frequency of an amino acid residue or nucleotide of symbol $n$ at a particular position and $N$ is the number of distinct symbols for the sequence alphabet, either 20 for proteins or four for DNA/RNA. This means that the maximal sequence information content per position is $\log_2 4 = 2$ bits for DNA/RNA and $\log_2 20 \approx 4.32$ bits for proteins.

The original implementation by Schneider does not handle sequence gaps.

We have slightly modified the algorithm so an estimated logo is presented in areas with sequence gaps.

If amino acid residues or nucleotides of one sequence are found in an area containing gaps, we have chosen to show the particular residue as the fraction of the sequences. Example: if one position in the alignment contains 9 gaps and only one alanine (A), the A represented in the logo has a height of 0.1.

Other useful resources

The website of Tom Schneider: http://www-lmmb.ncifcrf.gov/~toms/

WebLogo: http://weblogo.berkeley.edu/ [Crooks et al., 2004]

19.3 Edit alignments

19.3.1 Move residues and gaps

The placement of gaps in the alignment can be changed by modifying the parameters when creating the alignment (see section 19.1). However, gaps and residues can also be moved after the alignment is created:

select one or more gaps or residues in the alignment | drag the selection to move

This can be done both fo
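The information content formula for a sequence logo position can be checked numerically with a short Python sketch. It computes $\log_2 N$ minus the observed entropy for one gap-free alignment column; the gap handling described above is not modeled, and the function name is chosen here for illustration.

```python
import math
from collections import Counter


def information_content(column, alphabet_size):
    """Information content (bits) of one alignment column without gaps.

    column is a string with one residue per sequence; alphabet_size is N
    (4 for DNA/RNA, 20 for proteins).
    """
    counts = Counter(column)
    total = len(column)
    # Observed entropy: -sum(p_n * log2 p_n) over the symbols that occur
    s_obs = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - s_obs


conserved = information_content("AAAA", 4)  # fully conserved DNA position
uniform = information_content("ACGT", 4)    # all four bases equally frequent
```

A fully conserved DNA position reaches the maximum of 2 bits, while a position where all four bases are equally frequent carries no information.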
445. Figure 13.26: Setting parameters for the motif search.

The options for the motif search are:

• Motif types: Choose what kind of motif to be used:
  - Simple motif: Choosing this option means that you enter a simple motif, e.g. ATGATGNNATG.
  - Java regular expression: See section 13.7.3.
  - Prosite regular expression: For proteins, you can enter different protein patterns from the PROSITE database (protein patterns using regular expressions and describing specific amino acid sequences). The PROSITE database contains a great number of patterns and has been used to identify related proteins (see http://www.expasy.org/cgi-bin/prosite-list.pl).
  - Use motif list: Clicking the small button will allow you to select a saved motif list (see section 13.7.4).
• Motif: If you choose to search with a simple motif, you should enter a literal string as your motif. Ambiguous amino acids and nucleotides are allowed. Example: ATGATGNNATG. If your motif type is Java regular expression, you should enter a regular expression according to the syntax rules described in section 13.7.3. Press Shift + F1 for options. For prot
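How a simple motif with ambiguity codes can be matched is sketched below by translating the motif into a regular expression. The sketch uses Python's `re` module rather than Java regular expressions, relies on the standard IUPAC nucleotide codes, searches one strand only, and does not model the Accuracy setting; the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate a simple motif such as ATGATGNNATG into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return (zero-based start, matched text) for every hit, overlaps included."""
    # The lookahead makes overlapping occurrences visible to finditer
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(1), m.group(1)) for m in pattern.finditer(sequence.upper())]


hits = find_motif("CCATGATGCTATGAA", "ATGATGNNATG")
```

In this example the two N positions match the bases C and T, so the motif is found once, starting at zero-based position 2.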
446. on machinery. Selenocysteines are very rare amino acids.

The table below shows the Standard Genetic Code which is the default translation table.

TTT F Phe      TCT S Ser      TAT Y Tyr      TGT C Cys
TTC F Phe      TCC S Ser      TAC Y Tyr      TGC C Cys
TTA L Leu      TCA S Ser      TAA * Ter      TGA * Ter
TTG L Leu i    TCG S Ser      TAG * Ter      TGG W Trp
CTT L Leu      CCT P Pro      CAT H His      CGT R Arg
CTC L Leu      CCC P Pro      CAC H His      CGC R Arg
CTA L Leu      CCA P Pro      CAA Q Gln      CGA R Arg
CTG L Leu i    CCG P Pro      CAG Q Gln      CGG R Arg
ATT I Ile      ACT T Thr      AAT N Asn      AGT S Ser
ATC I Ile      ACC T Thr      AAC N Asn      AGC S Ser
ATA I Ile      ACA T Thr      AAA K Lys      AGA R Arg
ATG M Met i    ACG T Thr      AAG K Lys      AGG R Arg
GTT V Val      GCT A Ala      GAT D Asp      GGT G Gly
GTC V Val      GCC A Ala      GAC D Asp      GGC G Gly
GTA V Val      GCA A Ala      GAA E Glu      GGA G Gly
GTG V Val      GCG A Ala      GAG E Glu      GGG G Gly

Challenge of reverse translation

A particular protein follows from the translation of a DNA sequence whereas the reverse translation need not have a specific solution according to the Genetic Code. The Genetic Code is degenerate, which means that a particular amino acid can be encoded by more than one codon. Hence there are ambiguities of the reverse translation.

Solving the ambiguities of reverse translation

In order to solve these ambiguities of reverse translation you can define how to prioritize the codon selection, e.g.:

• Choose a codon randomly.
• Select the most frequent codon in a given organism.
• Randomiz
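The ambiguity of reverse translation, and the "choose a codon randomly" strategy, can be illustrated with a small Python sketch built on the Standard Genetic Code shown in the table. It is an illustration only: the helper names are hypothetical, and the organism-specific codon frequency strategies are not covered.

```python
import random
from collections import defaultdict
from itertools import product

# Standard Genetic Code, codons enumerated in T, C, A, G order per position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: each amino acid maps to all codons that encode it
CODONS_FOR = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    CODONS_FOR[aa].append(codon)


def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def reverse_translate(protein, rng):
    """Pick one of the possible codons at random for every residue."""
    return "".join(rng.choice(CODONS_FOR[aa]) for aa in protein)


rng = random.Random(1)  # fixed seed so the example is repeatable
dna = reverse_translate("MKWF*", rng)
```

Whatever codons are drawn, translating the result gives back the original protein, while the number of candidate codons per residue (six for Leu, one for Trp) shows where the ambiguity comes from.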
447. on sequences, alignments, or trees. See section 3.2.5 for more on this topic.

• Audit Support: If this option is checked, all manual editing of sequences will be marked with an annotation on the sequence (see figure 5.2). Placing the mouse on the annotation will reveal additional details about the change made to the sequence (see figure 5.3). Note that no matter whether Audit Support is checked or not, all changes are also recorded in the History (see section 8).
• Number of hits: The number of hits shown in CLC DNA Workbench when e.g. searching NCBI. (The sequences shown in the program are not downloaded until they are opened or dragged/saved into the Navigation Area.)
• Locale Setting: Specify which country you are located in. This determines how punctuation is used in numbers all over the program.
• Show Dialogs: A lot of information dialogs have a checkbox: "Never show this dialog again". When you see a dialog and check this box in the dialog, the dialog will not be shown again. If you change your mind and wish to have the dialog displayed again, click the button in the General Preferences: Show Dialogs. Then all the dialogs will be shown again.

Figure 5.2: Annotations added when the sequence is edited.

Figure 5.3: Details of the editing.
448. Figure 18.43: Selecting number of cut sites.

If you wish the output of the restriction map analysis only to include restriction enzymes which cut the sequence a specific number of times, use the checkboxes in this dialog:

• No restriction site (0)
• One restriction site (1)
• Two restriction sites (2)
• Three restriction sites (3)
• N restriction sites
  - Minimum
  - Maximum
• Any number of restriction sites (> 0)

The default setting is to include the enzymes which cut the sequence one or two times.

You can use the checkboxes to perform very specific searches for restriction sites, e.g. if you wish to find enzymes which do not cut the sequence, or enzymes cutting exactly twice.

Output of restriction map analysis

Clicking Next shows the dialog in figure 18.44.

This dialog lets you specify how the result of the restriction map analysis should be presented:

• Add restriction sites as annotations to sequence(s): This option makes it possible to see the restriction sites on the sequence (see figure 18.45) and save the annotations for later use.

[Screenshot residue: Restriction Site Analysis dialog, Result handling step.]
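Filtering enzymes by the number of times they cut can be sketched as follows. This is a simplified illustration rather than the Workbench's analysis: it counts recognition sites on a linear sequence by plain string matching, uses three well-known palindromic recognition sequences (so one strand is enough), and the example sequence and function names are made up for this sketch.

```python
import re

# Recognition sequences of three common enzymes (all palindromic)
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def count_sites(sequence, site):
    """Count occurrences of a recognition site, overlapping ones included."""
    return len(re.findall("(?=" + re.escape(site) + ")", sequence.upper()))


def enzymes_by_cut_count(sequence, enzymes, allowed_counts):
    """Keep only enzymes whose number of sites is in allowed_counts."""
    result = {}
    for name, site in enzymes.items():
        n = count_sites(sequence, site)
        if n in allowed_counts:
            result[name] = n
    return result


sequence = "AAGAATTCGGGGATCCTTGAATTCAAGGATCCTTGGATCC"  # made-up example

one_or_two = enzymes_by_cut_count(sequence, ENZYMES, {1, 2})  # default setting
non_cutters = enzymes_by_cut_count(sequence, ENZYMES, {0})
```

With the default setting (one or two sites) only EcoRI is reported for this sequence; selecting "No restriction site (0)" reports HindIII instead.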
449. onding sequence on the chromosome.

If you place the mouse cursor on the sequence hits in the graphical view, you can see the reading frame which is  1,  2 and  3 for the three hits, respectively.

Verify the result

Open NC_000011 in a view, and go to the Hit start position (5 204  29) and zoom to see the blue gene annotation. You can now see the exon structure of the Human beta globin gene showing the three exons on the reverse strand (see figure 2.47).

If you wish to verify the result, make a selection covering the gene region and open it in a new view:

right-click | Open Selection in New View | Save

Save the sequence, and perform a new BLAST search:

• Use the new sequence as query.

[Screenshot residue: BLAST result view with the Side Panel and a "Summary of hits" table listing E-value, hit start, query start, query end, identity and score for each hit.]
450. ontributing sequence reads will automatically be placed right below the reference. This setting is not relevant when the compactness is packed.

  - Show sequence ends: Regions that have been trimmed are shown with faded traces and residues. This illustrates that these regions have been ignored during the assembly.
  - Show mismatches: When the compactness is packed, you can highlight mismatches which will get a color according to the Rasmol color scheme. A mismatch is whenever the base is different from the reference sequence at this position. This setting also causes the reads that have mismatches to be floated at the top of the view.
  - Packed read height: When the compactness is packed, you can choose the height of the visible reads. When there are more reads than the height specified, an overflow graph will be displayed below the reads.
  - Find Conflict: Clicking this button selects the next position where there is a conflict between the sequence reads. Residues that are different from the reference are colored (as default), providing an overview of the conflicts. Since the next conflict is automatically selected it is easy to make changes. You can also use the Space key to find the next conflict.

• Alignment info: There is one additional parameter:
  - Coverage: Shows how many sequence reads contribute information to a given position in the contig. The level of coverage
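The notion of coverage used here, the number of reads contributing to each contig position, can be made concrete with a minimal Python sketch. The representation of reads as zero-based, half-open (start, end) intervals is an assumption made for this example, as is the function name.

```python
def coverage(contig_length, reads):
    """Return the number of reads covering each contig position.

    reads is a list of (start, end) tuples, zero-based and half-open,
    which is a convention assumed for this sketch.
    """
    depth = [0] * contig_length
    for start, end in reads:
        # Clip each read to the contig boundaries before counting
        for pos in range(max(start, 0), min(end, contig_length)):
            depth[pos] += 1
    return depth


reads = [(0, 5), (2, 8), (4, 10)]
depth = coverage(10, reads)
```

Position 4 is the only one covered by all three reads, so the coverage graph for this toy contig would peak there.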
451. oolbox.

2.12.1 The Side Panel way of finding restriction sites

When you open a sequence, there is a Restriction sites setting in the Side Panel. By default, 10 of the most popular restriction enzymes are shown (see figure 2.56).

Figure 2.56: Showing restriction sites of ten restriction enzymes.

The restriction sites are shown on the sequence with an indication of cut site and recognition sequence. In the list of enzymes in the Side Panel, the number of cut sites is shown in parentheses for each enzyme (e.g. SalI cuts three times). If you wish to see the recognition sequence of the enzyme, place your mouse cursor on the enzyme in the list for a short moment, and a tool tip will appear.

You can add or remove enzymes from the list by clicking the Manage enzymes button.

2.12.2 The Toolbox way of finding restriction sites

Suppose you are working with sequence "ATP8a1 mRNA" from the example data, and you wish to know which restriction enzymes will cut this sequence exactly once and create a 3' overhang. Do the following:

select the ATP8a1 mRNA sequence | Toolbox in the Menu Bar | Cloning and Restriction S
452. op right side of the Workbench (e.g. Pan, Zooming tools, etc.).

Once the sequence is selected, right-click and choose to Insert Restriction Site Before Selection as shown in figure 2.25.

Figure 2.25: Adding restriction sites to a primer.

In the Filter box enter HindIII and click on it. At the bottom of the dialog, add a few extra bases 5' of the cut site (this is done to increase the efficiency of the enzyme) as shown in figure 18.14.

Figure 2.20: Adding restriction sites to a primer.

Click OK and the sequence will
453. option should be used if a file is imported as a bioinformatics file when it should just have been external file. It could be an ordinary text file which is imported as a sequence.

Import using drag and drop

It is also possible to drag a file from e.g. the desktop into the Navigation Area of CLC DNA Workbench. This is equivalent to importing the file using the Automatic import option described above. If the file type is not recognized, it will be imported as an external file.

Import using copy/paste of text

If you have e.g. a text file or a browser displaying a sequence in one of the formats that can be imported by CLC DNA Workbench, there is a very easy way to get this sequence into the Navigation Area:

Copy the text from the text file or browser | Select a folder in the Navigation Area | Paste

This will create a new sequence based on the text copied. This operation is equivalent to saving the text in a text file and importing it into the CLC DNA Workbench.

If the sequence is not formatted, i.e. if you just have a text like this: "ATGACGAATAGGAGTTCTAGCTA", you can also paste this into the Navigation Area.

Note: Make sure you copy all the relevant text, otherwise CLC DNA Workbench might not be able to interpret the text.

7.1.2 Import Vector NTI data

There are several ways of importing your Vector NTI data into the CLC Workbench. The best way to go depends on how your data is currently stored in Vector NTI:

• Your data is stored
454. opy of the selected sequence in a normal sequence view.

• Open this sequence: This will open the selected sequence in a normal sequence view.
• Make sequence circular: This will convert a sequence from a linear to a circular form. If the sequence has matching overhangs at the ends, they will be merged together. If the sequence has incompatible overhangs, a dialog is displayed, and the sequence cannot be made circular. The circular form is represented by >> and << at the ends of the sequence.
• Make sequence linear: This will convert a sequence from a circular to a linear form, removing the << and >> at the ends.

Manipulate parts of the sequence

Right-clicking a selection reveals several options on manipulating the selection (see figure 18.10).

[Screenshot residue: right-click menu with the entries Duplicate Selection, Replace Selection With Sequence, Insert Sequence Before/After Selection, Cut Sequence Before/After Selection, Make Positive/Negative Strand Single Stranded, Make Double Stranded, Copy, Open Selection in New View, Edit Selection, Delete Selection, Add Annotation, Insert Restriction Site Before/After Selection, Show Enzymes Cutting Inside/Outside Selection, Add Structure Prediction Constraints.]

Figure 18.10: Right click on a seq
455. or all other data, you can only search for name.

If you use Any field, it will search all of the above plus the following:

• Description
• Keywords
• Common name
• Taxonomy name

To see this information for a sequence, switch to the Element Info view (see section 10.4).

For each search line, you can choose if you want the exact term by selecting "is equal to" or if you only enter the start of the term you wish to find (select "begins with").

An example is shown in figure 4.7.

Figure 4.7: Searching for human sequences shorter than 10,000 nucleotides.

This example will find human nucleotide sequences (organism is Homo sapiens), and it will only find sequences shorter than 10,000 nucleotides.

Note that a search can be saved for later use. You do not save the search results
456. or remove sequences or sequence lists from the selected elements.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

Note: You can select multiple DNA sequences and sequence lists at a time. If the sequence list contains RNA sequences as well, they will not be converted.

Figure 14.1: Translating DNA to RNA.

14.2 Convert RNA to DNA

CLC DNA Workbench lets you convert an RNA sequence into DNA, replacing the U residues (Uracil) with T residues (Thymine):

select an RNA sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Convert RNA to DNA

or right-click a sequence in Navigation Area | Toolbox | Nucleotide Analyses | Convert RNA to DNA

This opens the dialog displayed in figure 14.2.

[Screenshot residue: Convert RNA to DNA dialog, Select RNA sequences step.]
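The conversion itself is a simple residue substitution, as the following Python sketch shows. It is an illustration of the idea rather than the Workbench's code; the function names are chosen here, and preserving lower-case letters is a design choice of this sketch.

```python
def rna_to_dna(rna):
    """Replace every U (uracil) with T (thymine), preserving letter case."""
    return rna.replace("U", "T").replace("u", "t")


def dna_to_rna(dna):
    """The opposite conversion: replace every T with U."""
    return dna.replace("T", "U").replace("t", "u")


dna = rna_to_dna("AUGGCUuaa")
```

Applying `dna_to_rna` to the result restores the original RNA string, since only U/T residues are touched.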
457. ormation about how the vector will be cut open.

Since the vector has now been split into two fragments, you can decide which one to use as the target vector. If you selected first the HindIII site and next the XhoI site, the CLC DNA Workbench has already selected the right fragment as the target vector. If you click one of the vector fragments, the corresponding part of the sequence will be highlighted.

Next step is to cut the fragment. At the top of the view you can switch between the sequences used for cloning (at this point it says pcDNA4_TO 5.078bp circular vector). Switch to the fragment sequence and perform the same selection of cut sites as before while pressing the Ctrl (⌘ on Mac) key. You should now see a view identical to the one shown in figure 2.31.

When this is done, the Perform Cloning button at the lower right corner of the view is active because there is now a valid selection of both fragment and target vector. Click the Perform Cloning button and you will see the dialog shown in figure 2.32.

This dialog lets you inspect the overhangs of the cut site, showing the vector sequence on each

[Screenshot residue: cloning editor showing pcDNA4_TO (5,078 bp) as a circular vector with its annotated features and restriction sites.]
458. [Dialog residue: chosen primer parameters (maximum and minimum primer length, G/C content and melting temperature, maximum self annealing, self end annealing and secondary structure, G/C requirements for the 3' and 5' ends) and mispriming parameters (use mispriming as exclusion criteria, exact match, minimum number of base pairs required for a match, number of consecutive base pairs required in 3' end).]

Figure 16.11: Calculation dialog for sequencing primers.

Since design of sequencing primers does not require the consideration of interactions between primer pairs, this dialog is identical to the dialog shown in Standard PCR mode when only a single primer region is chosen. See section 16.5 for a description.

16.8.1 Sequencing primers output table

In this mode primers are predicted independently for each region, but the optimal solutions are all presented in one table. The solutions are numbered consecutively according to their position on the sequence such that the forward primer region closest to the 5' end of the molecule is designated F1, the next one F2 etc.

For each solution, the single primer information described under Standard PCR is available in the table.

16.9 Alignment based primer and probe design

CLC DNA Workbench allows the user to design PCR primers and TaqMan pr
459. [Table residue: BLAST hits on Homo sapiens chromosome 18 and chromosome 19 genomic contigs (reference and alternate assemblies).]

Figure 12.20: BLAST table view. A table view with one row per hit, showing the accession number and description field from the sequence file together with BLAST output scores.

and the start and stop positions for the query and hit sequence are listed. The strand and orientation for query sequence and hits are also found here.

In most cases, the table view of the results will be easier to interpret than tens of sequence alignments.

[Figure residue: BLAST text output for NM_173209.1, Homo sapiens TGFB-induced factor (TALE family homeobox) (TGIF), transcript variant 5, mRNA, with the query/subject alignment.]
460. osts and Lambda ratio will result in alignments which decrease the number of Gaps introduced.

• Max number of hit sequences: The maximum number of database sequences, where BLAST found matches to your query sequence, to be included in the BLAST report.

The parameters you choose will affect how long BLAST takes to run. A search of a small database, requesting only hits that meet stringent criteria, will generally be quite quick. Searching large databases, or allowing for very remote matches, will of course take longer.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

12.1.2 BLAST a partial sequence against NCBI

You can search a database using only a part of a sequence directly from the sequence view:

select the sequence region to send to BLAST | right-click the selection | BLAST Selection Against NCBI

This will go directly to the dialog shown in figure 12.3 and the rest of the options are the same as when performing a BLAST search with a full sequence.

12.1.3 BLAST against local data

Running BLAST searches on your local machine can have several advantages over running the searches remotely at the NCBI:

• It can be faster.
• It does not rely on having a stable internet connection.
• It does not depend on the availability of the NCBI BLAST servers.
• You can use longer query sequences.
• You use your own data sets to search against.

On a techn
461. ot, click Finish.

A new sequence list will be generated for each group. It will be named according to the group, e.g. Asp016 will be the name of one of the groups in the example shown in figure 17.4.

Advanced splitting using regular expressions

You can see a more detailed explanation of the regular expressions syntax in section 13.7.3. In this section you will see a practical example showing how to create a regular expression. Consider a list of files as shown below:

adk-29_adk1n-F
adk-29_adk2n-R
adk-3_adk1n-F
adk-3_adk2n-R
adk-66_adk1n-F
adk-66_adk2n-R
atp-29_atpA1n-F
atp-29_atpA2n-R
atp-3_atpA1n-F
atp-3_atpA2n-R
atp-66_atpA1n-F
atp-66_atpA2n-R

In this example, we wish to group the sequences into three groups based on the number after the "-" and before the "_" (i.e. 29, 3 and 66). The simple splitting as shown in figure 17.4 requires the same character before and after the text used for grouping, and since we now have both a "-" and a "_", we need to use the regular expressions instead (note that dividing by position would not work because we have both single and double digit numbers (3, 29 and 66)).

The regular expression for doing this would be           _     as shown in figure 17.5. The round brackets () denote the part of the name that will be listed in the groups table at the

[Screenshot residue: Sort Sequences by Name dialog, algorithm parameters step.]
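The grouping described above can be tried out with a short Python sketch. The pattern below is one possible expression written for this example, not necessarily the one shown in the manual's figure, and it assumes the file names use "-" and "_" as separators (e.g. adk-29_adk1n-F); Python's `re` syntax is close to, but not identical with, Java regular expressions.

```python
import re
from collections import defaultdict

# Hypothetical pattern: capture the text between the first "-" and the next "_"
GROUP_PATTERN = re.compile(r"^[^-]*-([^_]*)_")


def group_by_pattern(names, pattern):
    """Group names by the text captured in the first pair of round brackets."""
    groups = defaultdict(list)
    for name in names:
        match = pattern.search(name)
        if match:  # names that do not match the pattern are skipped
            groups[match.group(1)].append(name)
    return dict(groups)


names = [
    "adk-29_adk1n-F", "adk-29_adk2n-R",
    "adk-3_adk1n-F", "adk-3_adk2n-R",
    "adk-66_adk1n-F", "adk-66_adk2n-R",
    "atp-29_atpA1n-F", "atp-29_atpA2n-R",
    "atp-3_atpA1n-F", "atp-3_atpA2n-R",
    "atp-66_atpA1n-F", "atp-66_atpA2n-R",
]

groups = group_by_pattern(names, GROUP_PATTERN)
```

Because the captured group is delimited by characters rather than by position, single-digit and double-digit sample numbers are handled alike, giving the three groups 29, 3 and 66 with four reads each.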
462. [Screenshot residue: plug-in manager entries for a GFF annotation plug-in and the Extract Annotations plug-in, with an Uninstall button.]

Figure 1.25: The plug-in manager with plug-ins installed.

Click the plug-in | Uninstall

If you do not wish to completely uninstall the plug-in but you don't want it to be used next time you start the Workbench, click the Disable button.

When you close the dialog, you will be asked whether you wish to restart the workbench. The plug-in will not be uninstalled before the workbench is restarted.

1.7.3 Updating plug-ins

If a new version of a plug-in is available, you will get a notification during start-up as shown in figure 1.26.

In this list, select which plug-ins you wish to update, and click Install Updates. If you press Cancel you will be able to install the plug-ins later by clicking Check for Updates in the Plug-in manager (see figure 1.25).

1.7.4 Resources

Resources are downloaded, installed, uninstalled and updated the same way as plug-ins. Click the Download Resources tab at the top of the plug-in manager, and you will see a list of available resources (see figure 1.27).

Currently, the only re
463. otations on the regions that will be ignored in the assembly process.

17.3.1 Manual trimming

Sequence reads can be trimmed manually while inspecting their trace and quality data. Trimming sequences manually corresponds to adding annotation (see also section 10.3.2) but is special in the sense that trimming can only be applied to the ends of a sequence:

double-click the sequence to trim in the Navigation Area | select the region you want to trim | right-click the selection | Trim sequence left/right to determine the direction of the trimming

This will add trimming annotation to the end of the sequence in the selected direction.

17.3.2 Automatic trimming

Sequence reads can be trimmed automatically based on a number of different criteria. Automatic trimming is particularly useful in the following situations:

• If you have many sequence reads to be trimmed.
• If you wish to trim vector contamination from sequence reads.
• If you wish to ensure that the trimming is done according to the same criteria for all the sequence reads.

To trim sequences automatically:

select sequence(s) or sequence lists to trim | Toolbox in the Menu Bar | Sequencing Data Analyses | Trim Sequences

This opens a dialog where you can alter your choice of sequences. When the sequences are selected, click Next.

This opens the dialog displayed in figure 17.15.
464. Figure 16.20: A table showing all binding sites. The Side Panel lists the columns that can be shown: Primer name, Orientation, Region, Mismatches, Number of other matches, Melt. temp., Self annealing, Self annealing alignment, Self end annealing, GC content, Secondary structure score and Secondary structure.

An example of the primer binding site table is shown in figure 16.20.

The information here is the same as in the primer annotation and furthermore you can see additional information about melting temperature etc. by selecting the options in the Side Panel. See a more detailed description of this information in section 16.5.2. You can use this table to browse the binding sites. If you make a split view of the table and the sequence (see section 3.2.6), you can browse through the binding positions by clicking in the table. This will cause the sequence view to jump to the position of the binding site.

An example of a fragment table is shown in figure 16.21.
465. ou choose an appropriate codon frequency table).

• Map annotations to reverse translated sequence. If this checkbox is checked, then all annotations on the protein sequence will be mapped to the resulting DNA sequence. In the tooltip on the transferred annotations, there is a note saying that the annotation derives from the original sequence.

The Codon Frequency Table is used to determine the frequencies of the codons. Select a frequency table from the list that fits the organism you are working with. A translation table of an organism is created on the basis of counting all the codons in the coding sequences. Every codon in a Codon Frequency Table has its own count, frequency (per thousand) and fraction, which are calculated in accordance with the occurrences of the codon in the organism. You can customize the list of codon frequency tables for your installation, see section J.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The newly created nucleotide sequence is shown, and if the analysis was performed on several protein sequences, there will be a corresponding number of views of nucleotide sequences. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to show the save dialog.

15.3.2 Bioinformatics explained: Reverse translation

In all living cells containing hereditary material such as DNA, a transcription to mRN
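The three values that each codon carries in a Codon Frequency Table (count, frequency per thousand, and fraction) can be illustrated with a small calculation. The sketch below is in Python and is not the Workbench's own code. It assumes that "fraction" means the share of a codon among the synonymous codons for the same amino acid, which is the usual convention in codon usage tables but is not spelled out in the text. The function name and the input sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_frequency_table(coding_sequences):
    """Count codons in in-frame coding sequences and derive the three values."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper()
        # Walk the sequence codon by codon, ignoring an incomplete last codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TO_AA:  # skip codons with ambiguous bases
                counts[codon] += 1

    total = sum(counts.values())
    if total == 0:
        return {}

    # Total count per amino acid, needed for the fraction (assumed meaning).
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TO_AA[codon]] += n

    return {
        codon: {
            "amino_acid": CODON_TO_AA[codon],
            "count": n,
            "per_thousand": 1000.0 * n / total,
            "fraction": n / per_amino_acid[CODON_TO_AA[codon]],
        }
        for codon, n in counts.items()
    }
```

For the toy input `["ATGGCTGCTGCCTAA"]` the codon GCT is counted twice out of five codons (400 per thousand) and accounts for two of the three alanine codons.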
466. ou have to perform separate mappings for each sequence list.

An example using Illumina barcoded sequences

The data set in this example can be found at the Short Read Archive at NCBI: http://www.ncbi.nlm.nih.gov/sra/SRX014012. It can be downloaded directly in fastq format via the URL http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=dload&run_list=SRR030730&format=fastq. The file you download can be imported directly into the Workbench.

The barcoding was done using the following tags at the beginning of each read: CCT, AAT, GGT, CGT (see supplementary material of [Cronn et al., 2008] at http://nar.oxfordjournals.org/cgi/data/gkn502/DC1).

The settings in the dialog should thus be as shown in figure 17.11.

Figure 17.11: Setting the barcode length at three.

Click Next to specify the bar codes as shown in figure 17.12 (use the Add button).
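To make the barcode settings concrete, the following Python sketch sorts reads by a three-nucleotide tag at the beginning of each read and removes the tag. It is an illustration only and not how the Workbench processes tagged sequences. The four tags come from the text; the function name, the `no_match` label and the example reads are hypothetical.

```python
from collections import defaultdict

BARCODES = ("CCT", "AAT", "GGT", "CGT")  # tags named in the text
BARCODE_LENGTH = 3                       # barcode length set in the dialog


def split_by_barcode(reads, barcodes=BARCODES, length=BARCODE_LENGTH):
    """Group (name, sequence) pairs by their leading barcode.

    The barcode is removed from reads that match; reads without a known
    barcode are kept unchanged under the hypothetical key "no_match".
    """
    groups = defaultdict(list)
    for name, sequence in reads:
        tag = sequence[:length].upper()
        if tag in barcodes:
            groups[tag].append((name, sequence[length:]))
        else:
            groups["no_match"].append((name, sequence))
    return dict(groups)


if __name__ == "__main__":
    example_reads = [("read1", "CCTACGTACGT"), ("read2", "TTTACGTACGT")]
    print(split_by_barcode(example_reads))
```

Each resulting group corresponds to one sequence list, which is why a separate mapping per list is needed afterwards.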
467. ou select in this dialog (see how to change the definition of sites in appendix F). Note that the CLC DNA Workbench only checks that valid attP sites are found; it does not check that they correspond to the attB sites of the selected fragments at this step. If the right combination of attB and attP sites is not found, no entry clones will be produced.

Below there is a preview of the fragments selected and the attB sites that they contain. This can be used to get an overview of which entry clones should be used and check that the right attB sites have been added to the fragments.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

The output is one entry clone per sequence selected. The attB and attP sites have been used for the recombination, and the entry clone is now equipped with attL sites as shown in figure 18.24.

Figure 18.24: The resulting entry vector opened in a circular view.

Note that the by-product of the recombination is not part of the output.

18.2.3 Create expression clones (LR)

The final step in the Gateway cloning work flow is to recombine the entry clone into a destination vector to create an expression clone, the so-called LR reaction:

Toolbox in the Menu Bar | Cloning and Restriction Sites | Gateway Cloning | Create Expression Clone

This will open a dialog where you c
468. ownload several of the hit sequences, this is easily done in this view. Simply select the relevant sequences and drag them into a folder in the Navigation Area.

2.9 Tutorial: Tips for specialized BLAST searches

Here, you will learn how to:

• Use BLAST to find the gene coding for a protein in a genomic sequence.
• Find primer binding sites on genomic sequences.
• Identify remote protein homologues.

Following through these sections of the tutorial requires some experience using the Workbench, so if you get stuck at some point, we recommend going through the more basic tutorials first.

2.9.1 Locate a protein sequence on the chromosome

If you have a protein sequence but want to see the actual location on the chromosome, this is easy to do using BLAST.

In this example we wish to map the protein sequence of the Human beta-globin protein to a chromosome. We know in advance that the beta-globin is located somewhere on chromosome 11.

Data used in this example can be downloaded from GenBank:

Search | Search for Sequences at NCBI

Human chromosome 11 (NC_000011) consists of 134452384 nucleotides and the beta-globin (AAA16334) protein has 147 amino acids.

BLAST configuration

Next, conduct a local BLAST search:

Toolbox | BLAST Search | Local BLAST

Select the protein sequence as query sequence and click Next. Since you wish to BLAST a protein sequence against a nucleotide sequence, use tblastn which will
469. parts of the sequences will be ignored in the subsequent assembly.

A natural question is: Why not simply delete the trimmed regions instead of annotating them? In some cases, deleting the regions would do no harm, but in other cases, these regions could potentially contain valuable information, and this information would be lost if the regions were deleted instead of annotated. We will see an example of this later in this tutorial.

2.5.2 Assembling the sequencing data

The next step is to assemble the sequences. This is the technical term for aligning the sequences where they overlap and reversing the reverse reads to make a contiguous sequence (also called a contig).

In this tutorial, we will use assembly to a reference sequence. This can be used when you have a reference sequence that you know is similar to your sequencing data.

Toolbox in the Menu Bar | Sequencing Data Analyses | Assemble Sequences to Reference

In the first dialog, select the nine sequencing reads and click Next to go to the second step of the assembly where you select the reference sequence.

Click the Browse and select button and select the "ATP8a1 mRNA (reference)" from the "Sequencing data" folder (see figure 2.14). You can leave the other options in this window set to their defaults.
470. Figure 18.53: An enzyme list.

and you can use the filter at the top right corner to search for specific enzymes, recognition sequences etc.

If you wish to remove or add enzymes, click the Add/Remove Enzymes button at the bottom of the view. This will present the same dialog as shown in figure 18.50 with the enzyme list shown to the right.

If you wish to extract a subset of an enzyme list:

open the list | select the relevant enzymes | right-click | Create New Enzyme List from Selection

If you combine this method with the filter located at the top of the view, you can extract a very specific set of enzymes. E.g. if you wish to create a list of enzymes sold by a particular distributor, type the name of the distributor into the filter, and select and create a new enzyme list from the selection.

Chapter 19

Sequence alignment

Contents
19.1 Create an alignment
19.1.2 Fast or accurate alignment algorithm
19.1.3 Aligning alignments
19.2 View alignments
19.2.1 Bioinformatics explained: Sequence logo
19.3 Edit alignments
19.3.1 Move res
471. port them as an archive through the File menu.

This will produce a file with a ma4, pa4, or oa4 extension. Back in the CLC Workbench, click Import and select the file.

Importing single files

In Vector NTI, you can save a sequence in a file instead of in the database (see figure 7.6). This will give you a file with a .gb extension. This file can be easily imported into the CLC Workbench:

Import | select the file | Select

You don't have to import one file at a time. You can simply select a bunch of files or an entire folder, and the CLC Workbench will take care of the rest, even if the files are in different formats.

You can also simply drag and drop the files into the Navigation Area of the CLC Workbench.

The Vector NTI import is a plug-in which is pre-installed in the Workbench. It can be uninstalled and updated using the plug-in manager (see section 1.7).

Figure 7.6: Saving a sequence as a file in Vector NTI.

7.1.3 Export of bioinformatics data

CLC DNA Workbench can export bioinformatic data in most of the formats that can be imported. There are a few exceptions. See section 7.1.1.

To export a file:

select the element to export | Export | choose whe
472. quence has number 59 in front of the sequence, this means that 58 residues are found upstream of this position, but these are not included in the alignment.

By right-clicking the sequence name in the Graphical BLAST output it is possible to download the full hit sequence from NCBI with accompanying annotations and information. It is also possible to just open the actual hit sequence in a new view.

12.2.4 BLAST table

In addition to the graphical display of a BLAST result, it is possible to view the BLAST results in a tabular view. In the tabular view, one can get a quick overview of the results. Here you can also select multiple sequences and download or open all of these in one single step. Moreover, there is a link from each sequence to the sequence at NCBI. These possibilities are either available through a right-click with the mouse or by using the buttons below the table.

If the BLAST table view was not selected in Step 4 of the BLAST search, the table can be shown in the following way:

Click the Show BLAST Table button at the bottom of the view
473. Figure 13.17: Selecting two sequences to be joined.

Figure 13.18: Setting the order in which sequences are joined.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

The result is shown in figure 13.19.

Figure 13.19: The result of joining sequences is a new sequence containing the annotations of the joined sequences (they each had a HBB annotation).

13.6 Pattern Discovery

With CLC DNA Workbench you can perform pattern discovery on both DNA and protein sequences. Advanced hidden Markov models can help to identify unknown sequence patterns across single or even multiple sequences.

In order to search for unknown patterns:

Select DNA or protein
474. r Click the icon at the top right corner of the Side Panel to hide | Click the gray Side Panel button to the right to show

Below, each group of settings will be explained. Some of the preferences are not the same for nucleotide and protein sequences, but the differences will be explained for each group of settings.

Note: When you make changes to the settings in the Side Panel, they are not automatically saved when you save the sequence. Click Save/restore Settings to save the settings (see section 5.6 for more information).

Sequence Layout

These preferences determine the overall layout of the sequence:

• Spacing. Inserts a space at a specified interval:
  - No spacing. The sequence is shown with no spaces.
  - Every 10 residues. There is a space every 10 residues, starting from the beginning of the sequence.
  - Every 3 residues, frame 1. There is a space every 3 residues, corresponding to the reading frame starting at the first residue.
  - Every 3 residues, frame 2. There is a space every 3 residues, corresponding to the reading frame starting at the second residue.
  - Every 3 residues, frame 3. There is a space every 3 residues, corresponding to the reading frame starting at the third residue.
• Wrap sequences. Shows the sequence on more than one line.
  - No wrap. The sequence is displayed on one line.
  - Auto wrap. Wraps the sequence to fit the width of the view, not
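The effect of the Spacing options can be reproduced with a few lines of Python. This is only a sketch of the behaviour described in the list of spacing options, not the Workbench's rendering code, and the function name is hypothetical. The `frame` argument follows the text: frame 1 starts at the first residue, frame 2 at the second, frame 3 at the third.

```python
def space_sequence(sequence, interval, frame=1):
    """Insert a space every `interval` residues, starting at reading frame `frame`."""
    if interval < 1 or not 1 <= frame <= interval:
        raise ValueError("interval must be >= 1 and frame between 1 and interval")
    offset = frame - 1
    head = sequence[:offset]   # residues before the first complete group
    body = sequence[offset:]
    groups = [body[i:i + interval] for i in range(0, len(body), interval)]
    return " ".join(([head] if head else []) + groups)


if __name__ == "__main__":
    print(space_sequence("ATGGCCAAGT", 3, frame=1))  # every 3 residues, frame 1
    print(space_sequence("ATGGCCAAGT", 3, frame=2))  # every 3 residues, frame 2
    print(space_sequence("ATGGCCAAGT", 10))          # every 10 residues
```

For the ten-residue example, frame 1 gives the groups ATG GCC AAG T, while frame 2 leaves the first residue on its own and groups the rest as TGG CCA AGT.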
475. r all of the nodes the tree contains.

  - Text size. The size of the text representing the nodes can be modified in tiny, small, medium, large or huge.
  - Font. Sets the font of the text of all nodes.
  - Bold. Sets the text bold if enabled.

• Tree Layout. Different layouts for the tree.
  - Node symbol. Changes the symbol of nodes into box, dot, circle or none if you don't want a node symbol.
  - Layout. Displays the tree layout as standard or topology.
  - Show internal node labels. This allows you to see labels for the internal nodes. Initially, there are no labels, but right-clicking a node allows you to type a label.
  - Label color. Changes the color of the labels on the tree nodes.
  - Branch label color. Modifies the color of the labels on the branches.
  - Node color. Sets the color of all nodes.
  - Line color. Alters the color of all lines in the tree.

• Labels. Specifies the text to be displayed in the tree.
  - Nodes. Sets the annotation of all nodes either to name or to species.
  - Branches. Changes the annotation of the branches to bootstrap, length or none if you don't want annotation on branches.

Note: Dragging in a tree will change it. You are therefore asked if you want to save this tree when the Tree View is closed.

You may select part of a tree by clicking on the nodes that you want to select.

Right-clicking a selected node opens a menu with the following options:

• Set root above node. Defines the root of the tree to
476. r annotations on the reads, no matter how this is specified in the respective settings. And when the compactness is Packed, it is not possible to edit the bases of any of the reads. There is a shortcut way of changing the compactness: Press and hold the Alt key while you scroll using your mouse wheel or touchpad.

  - Not compact. The normal setting with full detail.
  - Low. Hides trace data, quality scores and puts the reads' annotations on the sequence.
  - Medium. The labels of the reads and their annotations are hidden, and the residues of the reads cannot be seen.
  - Compact. Even less space between the reads.
  - Packed. All the other compactness settings will stack the reads on top of each other, but the packed setting will use all space available for displaying the reads. When zoomed in to 100 %, you can see the residues but when zoomed out the reads will be represented as lines just as with the Compact setting. Please note that the packed mode is special because it does not allow any editing of the read sequences and selections, and furthermore the color coding that can be specified elsewhere in the Side Panel does not take effect. An example of the packed compactness setting is shown in figure 17.22.

• Gather sequences at top. Enabling this option affects the view that is shown when scrolling horizontally. If selected, the sequence reads which did not contribute to the visible part of the mapping will be omitted whereas the c
477. r of hydrophobicity scales, which are further explained in section 15.2.3. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The result can be seen in figure 15.4.

See section B in the appendix for information about the graph view.

15.2.2 Hydrophobicity graphs along sequence

Hydrophobicity graphs along sequence can be displayed easily by activating the calculations from the Side Panel for a sequence.

right-click protein sequence in Navigation Area | Show | Sequence | open Protein info in Side Panel
or double-click protein sequence in Navigation Area | Show | Sequence | open Protein info in Side Panel

Figure 15.4: The result of the hydrophobicity plot calculation and the associated Side Panel. (Plot title: Hydrophobicity plot of ATP8a1; x-axis: Position; plotted scales: Engelman, Eisenberg, Kyte-Doolittle.)

These actions result in the view displayed in figure 15.5.

Figure 15.5: The different available scales in Protein info in CLC DNA Workbench. (Scales listed: Kyte-Doolittle, Cornette, Engelman, Eisenberg, Rose, Janin, Hopp-Woods, Welling, Kolaskar-Tongaonkar, Surface Probability, Chain Flexibility.)

The level of hydrophobicity is calculated on the basis of the different scales. The different scales add different values to each type of amino
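A hydrophobicity graph along a sequence is commonly computed as a sliding-window average of per-residue scale values. The Python sketch below shows that idea with the published Kyte-Doolittle scale. It is an illustration under assumptions and not the Workbench's implementation: the window size of 9, the function name and the choice to report each average at the window's centre position are all assumptions made here.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein, window=9, scale=KYTE_DOOLITTLE):
    """Return (position, mean scale value) pairs for each full window.

    Positions are 1-based and refer to the residue at the window centre.
    The window size must be odd so that the window has a centre residue.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    protein = protein.upper()
    half = window // 2
    profile = []
    for centre in range(half, len(protein) - half):
        segment = protein[centre - half:centre + half + 1]
        mean = sum(scale[residue] for residue in segment) / window
        profile.append((centre + 1, mean))
    return profile


if __name__ == "__main__":
    for position, value in hydrophobicity_profile("MIIRRRLLV", window=3):
        print(position, round(value, 2))
```

Swapping in another scale only means passing a different dictionary, which mirrors how the scales in the Side Panel assign different values to each type of amino acid.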
478. r sequence names. This is useful for searching sequence lists, mapping results and BLAST results.

This concludes the description of the View Preferences. Next, the options for selecting and editing sequences are described.

Text format

These preferences allow you to adjust the format of all the text in the view (both residue letters, sequence name and translations if they are shown):

• Text size. Five different sizes.
• Font. Shows a list of fonts available on your computer.
• Bold residues. Makes the residues bold.

10.1.2 Restriction sites in the Side Panel

Please see section 18.3.1.

10.1.3 Selecting parts of the sequence

You can select parts of a sequence:

Click Selection in Toolbar | Press and hold down the mouse button on the sequence where you want the selection to start | move the mouse to the end of the selection while holding the button | release the mouse button

Alternatively, you can search for a specific interval using the find function described above.

If you have made a selection and wish to adjust it:

drag the edge of the selection (you can see the mouse cursor change to a horizontal arrow)
or press and hold the Shift key while using the right and left arrow keys to adjust the right side of the selection.

If you wish to select the entire sequence:

double-click the sequence name to the left

Selecting several parts at the same time (multiselect)

You can select several parts of sequence by holding down th
479. r single sequences, but also for multiple sequences by making a selection covering more than one sequence. When you have made the selection, the mouse pointer turns into a horizontal arrow indicating that the selection can be moved (see figure 19.9).

Note: Residues can only be moved when they are next to a gap.

Figure 19.9: Moving a part of an alignment. Notice the change of mouse pointer to a horizontal arrow.

19.3.2 Insert gaps

The placement of gaps in the alignment can be changed by modifying the parameters when creating the alignment. However, gaps can also be added manually after the alignment is created.

To insert extra gaps:

select a part of the alignment | right-click the selection | Add gaps before/after

If you have made a selection covering e.g. five residues, a gap of five will be inserted. In this way you can easily control the number of gaps to insert. Gaps will be inserted in the sequences that you selected. If you make a selection in two sequences in an alignment, gaps will be inserted into these two sequences. This means that these two sequences will be displaced compared to the other sequences in the alignment.

19.3.3 Delete residues and gaps

Residues or gaps can be deleted for individual sequences or for the whole alignment. For individual sequences:
480. r this type into the Type field (2).

(2) Note that your own annotation types will be converted to "unsure" when exporting in GenBank format. As long as you use the sequence in CLC format, your own annotation type will be preserved.

The right-hand part of the dialog contains the following text fields:

• Name. The name of the annotation which can be shown on the label in the sequence views. Whether the name is actually shown depends on the Annotation Layout preferences (see section 10.3.1).
• Type. Reflects the left-hand part of the dialog as described above. You can also choose directly in this list or type your own annotation type.
• Region. If you have already made a selection, this field will show the positions of the selection. You can modify the region further using the conventions of DDBJ, EMBL and GenBank. The following are examples of how to use the syntax (based on http://www.ncbi.nlm.nih.gov/collab/FT/):
  - 467. Points to a single residue in the presented sequence.
  - 340..565. Points to a continuous range of residues bounded by and including the starting and ending residues.
  - <345..500. Indicates that the exact lower boundary point of a region is unknown. The location begins at some residue previous to the first residue specified (which is not necessarily contained in the presented sequence) and continues up to and including the ending residue.
  - <1..888. The region st
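The simple region forms listed in this entry can be read mechanically. The Python sketch below parses a single residue, a continuous range, and a range with an unknown lower boundary. It is a hypothetical helper for illustration, not part of the Workbench, and it covers only these simple forms. The ">" marker for an unknown upper boundary comes from the DDBJ/EMBL/GenBank feature table conventions rather than from the examples shown here, and compound locations such as join(...) or complement(...) are deliberately not handled.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    start_is_partial: bool  # True when written with "<"
    end_is_partial: bool    # True when written with ">"


# Either "N" or "N..M", with optional "<" before N and ">" before M.
_LOCATION = re.compile(r"^(<?)(\d+)(?:\.\.(>?)(\d+))?$")


def parse_region(text):
    """Parse a simple region string such as "467", "340..565" or "<345..500"."""
    match = _LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported region syntax: {text!r}")
    start_marker, start_text, end_marker, end_text = match.groups()
    start = int(start_text)
    # A single position is treated as a range of length one.
    end = int(end_text) if end_text is not None else start
    if end < start:
        raise ValueError(f"end before start in region: {text!r}")
    return Region(start, end, start_marker == "<", end_marker == ">")


if __name__ == "__main__":
    for example in ("467", "340..565", "<345..500"):
        print(example, "->", parse_region(example))
```

Keeping the partial flags separate from the coordinates preserves the information that a boundary is unknown instead of silently treating it as exact.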
481. r use the view of enzyme lists (see 18.5).

Click Finish to open the enzyme list.

Footnote: The CLC DNA Workbench comes with a standard set of enzymes based on http://www.rebase.neb.com. You can customize the enzyme database for your installation, see section E.

Figure 18.51: Selecting enzymes. (Dialog: Restriction Site Analysis, step 2, Enzymes to be considered in calculation. The enzyme tables show the columns Name, Overhang, Methylation and Popularity.)
482. r view of the contig later on by clicking Table at the bottom of the view. For more information about the tabular view of contigs, see section 1

• Create only consensus sequences. This will not display a contig but will only output the assembled contig sequences as single nucleotide sequences. If you choose this option it is not possible to validate the assembly process and edit the contig based on the traces.

If you have chosen to "Trim sequences", click Next and you will be able to set trim parameters (see section 17.3.2).

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

When the assembly process has ended, a number of views will be shown, each containing a contig of two or more sequences that have been matched. If the number of contigs seems too high or low, try again with another Alignment stringency setting. Depending on your choices of output options above, the views will include trace files or only contig sequences. However, the calculation of the contig is carried out the same way, no matter how the contig is displayed.

See section 1    on how to use the resulting contigs.

17.5 Assemble to reference sequence

This section describes how to assemble a number of sequence reads into a contig using a reference sequence. A reference sequence can be particularly helpful when the objective is to characterize SNP variation in
483. ract the sequences:

Toolbox | General Sequence Analyses | Extract Sequences

This will allow you to select the elements that you want to extract sequences from (see the list above). Clicking Next displays the dialog shown in figure 10.17.

Here you can choose whether the extracted sequences should be placed in a new list or extracted as single sequences. For sequence lists, only the last option makes sense, but for alignments, mappings and BLAST results, it would make sense to place the sequences in a list.

Figure 10.17: Choosing whether the extracted sequences should be placed in a new list or as single sequences.

Below these options you can see the number of sequences that will be extracted.

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

Chapter 11

Online database search

Contents
11.1 GenBank search
11.1.1 GenBank search options
11.1.2 Handling of GenBank search results
11.1.3 Save GenBank search parameters
11.2 Sequence web info
484. Figure 12.3: Choose a BLAST Program and a database for the search.

BLAST programs for DNA query sequences:

• BLASTn. DNA sequence against a DNA database. Used to look for DNA sequences with homologous regions to your nucleotide query sequence.
• BLASTx. Translated DNA sequence against a Protein database. Automatic translation of your DNA query sequence in six frames; these translated sequences are then used to search a protein database.
• tBLASTx. Translated DNA sequence against a Translated DNA database. Automatic translation of your DNA query sequence and the DNA database, in six frames. The resulting peptide query sequences are used to search the resulting peptide database. Note that this type of search is computationally intensive.

BLAST programs for protein query sequences:

• BLASTp. Protein sequence against Protein database. Used to look for peptide sequences with homologous regions to your peptide query sequence.
• tBLASTn. Protein sequence against Translated DNA database. Peptide query sequences are searched against an automatically translated (in six frames) DNA database.

Click Next.

This window (see figure 12.4) allows you to choose parameters to tune your BLAS
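The phrase "translated in six frames" used for BLASTx, tBLASTx and tBLASTn can be made concrete with a short Python sketch. It translates a DNA sequence in the three forward frames and the three frames of the reverse complement using the standard genetic code. It illustrates the concept only and is not how BLAST or the Workbench performs the translation; the function names and frame labels (+1 to -3) are choices made here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TO_AA.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict of the six translations keyed '+1'..'+3' and '-1'..'-3'."""
    frames = {}
    for strand, sequence in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translation("ATGGCCTAA").items():
        print(frame, peptide)
```

Translating both the query and every database sequence this way is what makes tBLASTx computationally intensive: each comparison is effectively carried out over six frames on both sides.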
485. Figure 15.9: Choosing parameters for the reverse translation.
• Use random codon. This will randomly back-translate an amino acid to a codon without using the translation tables. Every time you perform the analysis you will get a different result.
• Use only the most frequent codon. On the basis of the selected translation table, this option will assign the codon that occurs most often. When choosing this option, the results of performing several reverse translations will always be the same, contrary to the other two options.
CHAPTER 15 PROTEIN ANALYSES 245
• Use codon based on frequency distribution. This option is a mix of the other two options. The selected translation table is used to attach weights to each codon based on its frequency. The codons are assigned randomly with a probability given by the weights. A more frequent codon has a higher probability of being selected. Every time you perform the analysis, you will get a different result. This option yields a result that is closer to the translation behavior of the organism, assuming y
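The three codon-selection options can be compared in a small sketch. It is only an illustration: the codon usage numbers below are made up, the table covers three amino acids, and the function and mode names are hypothetical rather than anything exposed by the Workbench.

```python
import random

# Hypothetical codon usage fractions (illustrative numbers, not a real table).
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.43, "AAG": 0.57},
    "F": {"TTT": 0.45, "TTC": 0.55},
}


def reverse_translate(protein, mode, rng=None):
    """Back-translate a protein using one of three codon-selection modes."""
    rng = rng or random.Random()
    codons = []
    for amino_acid in protein:
        usage = CODON_USAGE[amino_acid]
        names = sorted(usage)
        if mode == "random":
            # Every synonymous codon is equally likely; frequencies are ignored.
            codons.append(rng.choice(names))
        elif mode == "most_frequent":
            # Deterministic: always the codon with the highest frequency.
            codons.append(max(names, key=usage.get))
        elif mode == "frequency":
            # Random, but weighted by the codon frequencies.
            weights = [usage[name] for name in names]
            codons.append(rng.choices(names, weights=weights)[0])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return "".join(codons)
```

Only the `most_frequent` mode returns the same DNA sequence on every run, which matches the description of the second option above.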
486. rameters for distance based methods
Figure 20.2 shows the parameters that can be set for the distance-based methods.
• Algorithms
  - The UPGMA method assumes that evolution has occurred at a constant rate in the different lineages. This means that a root of the tree is also estimated.
  - The neighbor joining method builds a tree where the evolutionary rates are free to differ in different lineages. CLC DNA Workbench always draws trees with roots for practical reasons, but with the neighbor joining method, no particular biological hypothesis is postulated by the placement of the root. Figure 20.3 shows the difference between the two methods.
CHAPTER 20 PHYLOGENETIC TREES 368
• To evaluate the reliability of the inferred trees, CLC DNA Workbench allows the option of doing a bootstrap analysis. A bootstrap value will be attached to each branch, and this value is a measure of the confidence in this branch. The number of replicates in the bootstrap analysis can be adjusted in the wizard. The default value is 100.
For a more detailed explanation, see "Bioinformatics explained" in section 20.2.
Figure 20.3: Method choices for phyloge
487. ransform, nor build upon this work.
CHAPTER 13 GENERAL SEQUENCE ANALYSES 208
Figure 13.11: The dot plot showing an inversion in a sequence. See also figure 13.8.
SOME RIGHTS RESERVED
See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents.
13.2.4 Bioinformatics explained: Scoring matrices
Biological sequences have evolved over time, and evolution has shown that not all changes to a biological sequence are equally likely to happen. Certain amino acid substitutions (change of one amino acid to another) happen often, whereas other substitutions are very rare. For instance, tryptophan (W), which is a relatively rare amino acid, will only on very rare occasions mutate into a leucine (L).
Based on the evolution of proteins, it became apparent that these changes or substitutions of amino acids can be modeled by a scoring matrix, also referred to as a substitution matrix. See an example of a scoring matrix in table 13.1. This matrix lists the substitution scores of every single amino acid. A score for an aligned amino acid pair is found at the intersection of the corresponding column and row. For example, the substitution score from an arginine (R) to a lysine (K) is 2. The diagonal shows scores for amino acids which have not changed. Most substitutions have a negative score. Only rounded numbers are found in this matrix.
The two most used matrices are the BLOSUM (Henikoff and Henikoff, 19
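Looking up a score at the intersection of a row and a column can be sketched as a dictionary lookup. The excerpt below assumes BLOSUM62 values; only the R to K score of 2 is confirmed by the text above, and the remaining entries are the values as commonly published for BLOSUM62. The function names are hypothetical.

```python
# A small excerpt of a substitution matrix (assumed BLOSUM62 values).
# Only the pairs needed for the example are listed.
SCORES = {
    ("R", "R"): 5,
    ("K", "K"): 5,
    ("R", "K"): 2,
    ("W", "W"): 11,
    ("L", "L"): 4,
    ("W", "L"): -2,
}


def substitution_score(a, b):
    """Look up the score for a residue pair; the matrix is symmetric."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    return SCORES[(b, a)]


def alignment_score(seq_a, seq_b):
    """Sum the substitution scores over an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(substitution_score(a, b) for a, b in zip(seq_a, seq_b))
```

The unchanged residues on the diagonal (for example W to W) score highest, the frequent R to K substitution scores positive, and the rare W to L substitution scores negative.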
488. rch report etc. is saved where it is dropped. If the element already exists, you are asked whether you want to save a copy. You drag from the View Area by dragging the tab of the desired element.
CHAPTER 3 USER INTERFACE 82
Use of drag and drop is supported throughout the program, also to open and re-arrange views (see section 3.2.6).
Note that if you move data between locations, the original data is kept. This means that you are essentially doing a copy instead of a move operation.
Copy using drag and drop
To copy instead of move using drag and drop, hold the Ctrl (⌘ on Mac) key while dragging:
click the element | click on the element again, and hold left mouse button | drag the element to the desired location | press Ctrl (⌘ on Mac) while you let go of mouse button | release the Ctrl/⌘ button
3.1.6 Change element names
This section describes two ways of changing the names of sequences in the Navigation Area. In the first part, the sequences themselves are not changed; it's their representation that changes. The second part describes how to change the name of the element.
Change how sequences are displayed
Sequence elements can be displayed in the Navigation Area with different types of information:
• Name (this is the default information to be shown).
• Accession (sequences downloaded from databases like GenBank have an accession number).
• Latin name.
• Latin name (accession).
• Common name.
• Common name (accession).
489. re 18.31).
Figure 18.31: Buttons to sort restriction enzymes.
• Sort enzymes alphabetically. Clicking this button will sort the list of enzymes alphabetically.
• Sort enzymes by number of restriction sites. This will divide the enzymes into four groups:
  - Non-cutters.
  - Single cutters.
  - Double cutters.
  - Multiple cutters.
  There is a checkbox for each group which can be used to hide or show all the enzymes in a group.
• Sort enzymes by overhang. This will divide the enzymes into three groups:
  - Blunt. Enzymes cutting both strands at the same position.
  - 3'. Enzymes producing an overhang at the 3' end.
  - 5'. Enzymes producing an overhang at the 5' end.
  There is a checkbox for each group which can be used to hide or show all the enzymes in a group.
CHAPTER 18 CLONING AND CUTTING 330
Manage enzymes
The list of restriction enzymes contains by default 20 of the most popular enzymes, but you can easily modify this list and add more enzymes by clicking the Manage enzymes button. This will display the dialog shown in figure 18.32.
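The grouping by number of restriction sites can be sketched as follows. This is an illustration only: it scans one strand of a linear sequence (sufficient for the palindromic sites used here), treats three or more sites as "Multiple cutters" (an assumption, since the excerpt does not define the threshold), and uses hypothetical function names.

```python
# Recognition sequences of three common enzymes (all palindromic).
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def count_sites(sequence, site):
    """Count (possibly overlapping) occurrences of a site in a linear sequence."""
    sequence = sequence.upper()
    width = len(site)
    return sum(
        1
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == site
    )


def group_by_cut_count(sequence, enzymes=RECOGNITION_SITES):
    """Sort enzymes into the four groups used when sorting by site count."""
    labels = {0: "Non-cutters", 1: "Single cutters", 2: "Double cutters"}
    groups = {
        "Non-cutters": [],
        "Single cutters": [],
        "Double cutters": [],
        "Multiple cutters": [],
    }
    for name, site in sorted(enzymes.items()):
        hits = count_sites(sequence, site)
        groups[labels.get(hits, "Multiple cutters")].append(name)
    return groups
```

Hiding a group with its checkbox then corresponds to skipping one of the four lists when the enzymes are drawn on the sequence.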
490. re adjusted along the left and right edges of the view.
10.2.1 Using split views to see details of the circular molecule
In order to see the nucleotides of a circular molecule you can open a second view displaying a linear view of the molecule:
Press and hold the Ctrl button (⌘ on Mac) | click Show Sequence at the bottom of the view
This will open a linear view of the sequence below the circular view. When you zoom in on the linear view you can see the residues as shown in figure 10.5.
Note: If you make a selection in one of the views, the other view will also make the corresponding selection, providing an easy way for you to focus on the same region in both views.
10.2.2 Mark molecule as circular and specify starting point
You can mark a DNA molecule as circular by right-clicking its name in either the sequence view or the circular view. In the right-click menu you can also make a circular molecule linear. A circular molecule displayed in the normal sequence view will have the sequence ends marked with a
CHAPTER 10 VIEWING AND EDITING SEQUENCES 152
Figure 10.5: Two views showing the same sequence. The bottom view is zoomed in.
The starting point of a circular sequence can be changed by: make a selection st
491. re to export to | select "File of type" | enter name of file | Save
When exporting to CSV and tab-delimited files, decimal numbers are formatted according to the Locale setting of the Workbench (see section 5.1). If you open the CSV or tab-delimited file with spreadsheet software like Excel, you should make sure that both the Workbench and the spreadsheet software are using the same Locale.
Note: The Export dialog decides which types of files you are allowed to export into, depending on what type of data you want to export. E.g. protein sequences can be exported into GenBank, Fasta, Swiss-Prot and CLC formats.
Export of folders and multiple elements
The .zip file type can be used to export all kinds of files and is therefore especially useful in these situations:
• Export of one or more folders including all underlying elements and folders.
• If you want to export two or more elements into one file.
Export of folders is similar to export of single files. Exporting multiple files (of different formats) is done in .zip format. This is how you export a folder:
select the folder to export | Export | choose where to export to | enter name | Save
CHAPTER 7 IMPORT EXPORT OF DATA AND GRAPHICS 123
You can export multiple files of the same type into formats other than ZIP (.zip). E.g. two DNA sequences can be exported in GenBank format:
select the two sequences by <Ctrl>-click (⌘-click on Mac) or <Shift>-click | Export
492. resent (see figure 18.11).
Figure 18.11: Select a sequence for insertion.
The sequence that you have chosen to insert into will be marked with bold and the text "vector" is appended to the sequence name. Note that this is completely unrelated to the vector concept in the cloning work flow described in section 18.1.2.
The list furthermore includes the length of the fragment, an indication of the overhangs, and a list of enzymes that are compatible with this overhang (for the left and right ends, respectively). If not all the enzymes can be shown, place your mouse cursor on the enzymes, and a full list will be shown in the tool tip.
Select the sequence you wish to insert and click Next.
This will show the dialog in figure 18.12.
493. rimer design (product columns: Viewer, Protein, DNA, RNA, Main, Genomics)
• Advanced primer design tools
• Detailed primer and probe parameters
• Graphical display of primers
• Generation of primer design output
• Support for Standard PCR
• Support for Nested PCR
• Support for TaqMan PCR
• Support for Sequencing primers
• Alignment based primer design
• Alignment based TaqMan probe design
• Match primer with sequence
• Ordering of primers
• Advanced analysis of primer properties
Molecular cloning (product columns: Viewer, Protein, DNA, RNA, Main, Genomics)
• Advanced molecular cloning
• Graphical display of in silico cloning
• Advanced sequence manipulation
Virtual gel view (product columns: Viewer, Protein, DNA, RNA, Main, Genomics)
• Fully integrated virtual 1D DNA gel simulator
For a more detailed comparison, we refer to http://www.clcbio.com/compare.
Appendix B
Graph preferences
This section explains the view settings of graphs. The Graph preferences at the top of the Side Panel include the following settings:
• Lock axes. This will always show the axes even though the plot is zoomed to a detailed level.
• Frame. Shows a frame around the graph.
• Show legends. Shows the data legends.
• Tick type. Determines whether tick lines should be shown outside or inside the frame.
  - Outside
  - Inside
• Tick lines at. Choosing Major ticks will show a grid behind the graph.
  - None
  - Major ticks
• Horizontal axis range. Sets the range of the
494. rinciple, that is, a search is performed for the values of the free parameters in the assumed model that result in the highest likelihood of the observed alignment (Felsenstein, 1981). By ticking the Estimate substitution rate parameters box, maximum likelihood values of the free parameters in the rate matrix describing the assumed substitution model are found. If the Estimate topology box is selected, a search in the space of tree topologies for that which best explains the alignment is performed. If left un-ticked, the starting topology is kept fixed at that of the starting tree. The Estimate Gamma distribution parameter is active if rate variation has been included in the model and in this case allows estimation of the Gamma distribution parameter to be switched on or off. If the box is left un-ticked, the value is fixed at that given in the Rate variation part. In the absence of rate variation, estimation of substitution parameters and branch lengths is carried out according to the expectation maximization algorithm (Dempster et al., 1977). With rate variation the maximization algorithm is performed. The topology space is searched according to the PHYML method (Guindon and Gascuel, 2003), allowing efficient search and estimation of large phylogenies. Branch lengths are given in terms of expected numbers of substitutions per nucleotide site.
20.1.2 Tree View Preferences
The Tree View preferences are these:
• Text format. Changes the text format fo
495. rocess is to amplify the target sequence with primers including so-called attB sites. In the CLC DNA Workbench, you can add attB sites to a sequence fragment in this way:
Toolbox in the Menu Bar | Cloning and Restriction Sites | Gateway Cloning | Add attB Sites
This will open a dialog where you can select one or more sequences. Note that if your fragment is part of a longer sequence, you need to extract it first. This can be done in two ways:
• If the fragment is covered by an annotation (if you want to use e.g. a CDS), simply right-click the annotation and Open Annotation in New View.
• Otherwise you can simply make a selection on the sequence, right-click and Open Selection in New View.
CHAPTER 18 CLONING AND CUTTING 320
In both cases, the selected part of the sequence will be copied and opened as a new sequence which can be saved.
When you have selected your fragment(s), click Next.
This will allow you to choose which attB sites you wish to add to each end of the fragment as shown in figure 18.15.
Figure 18.15: Selecting which attB sites to add.
The default option is to use the attB1 and attB2 sites. If you have selected several fragments and wish to add different combinations of sites, you will have to run this tool once for each combination.
Clicking Next will give you options to extend the fra
496. rocesses, a dialog will ask if you are sure that you want to close the program. Closing the program will stop the process, and it cannot be restarted when you open the program again.
CHAPTER 3 USER INTERFACE 94
3.4.2 Toolbox
The content of the Toolbox tab in the Toolbox corresponds to Toolbox in the Menu Bar.
The Toolbox can be hidden, so that the Navigation Area is enlarged and thereby displays more elements:
View | Show/Hide Toolbox
The tools in the toolbox can be accessed by double-clicking or by dragging elements from the Navigation Area to an item in the Toolbox.
3.4.3 Status Bar
As can be seen from figure 3.1, the Status Bar is located at the bottom of the window. In the left side of the bar is an indication of whether the computer is making calculations or whether it is idle. The right side of the Status Bar indicates the range of the selection of a sequence. (See chapter 3.3.0 for more about the Selection mode button.)
3.5 Workspace
If you are working on a project and have arranged the views for this project, you can save this arrangement using Workspaces. A Workspace remembers the way you have arranged the views, and you can switch between different workspaces.
The Navigation Area always contains the same data across Workspaces. It is, however, possible to open different folders in the different Workspaces. Consequently, the program allows you to display different clusters of the data in separate Workspaces.
All Workspace
497. rs  3 9  Network configuration  33  Network drive  shared BLAST database  186  Never show this dialog again  105  New   feature request  28   folder  80   folder  tutorial  37   sequence  162  New sequence    INDEX    create from a selection  149  Newick  file format  394  Next Generation Sequencing  376   nexus  file format  395  Nexus  file format  393  394  NGS  3 6   nhr  file format  395  NHR  file format  395  Non standard residues  144  Nucleotide   info  144   sequence databases  386  Nucleotides   UIPAC codes  398  Numbers on sequence  142    nwk  file format  395   nxs  file format  395     oa4  file format  395  Open  consensus sequence  353  from clipboard  119  Open reading frame determination  234  Open ended sequence  234  Order primers  2 5  379  ORF  234  Organism  160  Origins from  132  Overhang  of fragments from restriction digest  339  Overhang  find restriction enzymes based on   330  332  336  344     pa4  file format  395  Page heading  116  Page number  116  Page setup  115  Pairwise comparison  361  PAM  scoring matrices  208  Parameters   search  168  Partition function  379  Paste   text to create a new sequence  119  Paste copy  130  Pattern Discovery  219  Pattern discovery  379  Pattern Search  221  PCR primers  379    411    PCR  perform virtually  271   pdb  file format  395   seq  file format  395  PDB  file format  395   pdf format  export  126  Peak  call secondary  305  Peptide sequence databases  386  Percent identity  pairwise compariso
498. Figure 2.48: Verification of the result: at the top, a view of the whole BLAST result. At the bottom, the same view is zoomed in on exon 3 to show the amino acids.
either do not know the name of the gene, or the genomic sequence is poorly annotated. In these cases, the approach described in this tutorial can be very productive.
2.9.2 BLAST for primer binding sites
You can adjust the BLAST parameters so it becomes possible to match short primer sequences against a larger sequence. Then it is easy to examine whether already existing lab primers can be reused for other purposes, or if the primers you designed are specific.
[Table: Standard BLAST versus Primer search settings; word size 11 versus 7, expect value 1000 for the primer search]
These settings are shown in figure 2.49.
2.9.3 Finding remote protein homologues
If you look for short identical peptide sequences in a database, the standard BLAST parameters will have to be reconfigured. Using the parameters described below, you are likely to be able to identify whether antigenic determinants will cross-react to other proteins.
CHAPTER 2 TUTORIALS 68
499. ry long molecule and mispriming is a concern, consider extracting part of the sequence prior to designing primers.
When both forward and reverse regions are defined
If both a forward and a reverse region are defined, primer pairs will be suggested by the algorithm.
After pressing the Calculate button a dialog will appear (see figure 16.8).
[Dialog contents. Chosen parameters: maximum and minimum primer length, maximum and minimum G/C content, maximum and minimum melting temperature, maximum self annealing, maximum self end annealing, maximum secondary structure, 3' end must meet G/C requirements, 5' end must meet G/C requirements. Primer combination parameters: max percentage point difference in G/C content, max difference in melting temperatures within a primer pair, max hydrogen bonds between pairs, max hydrogen bonds between pair ends, maximum length of amplicon. Mispriming parameters: use mispriming as exclusion criteria, exact match, minimum number of base pairs required for a match, number of consecutive base pairs required in 3' end.]
Figure 16.8: Calculation dialog for PCR primers when two primer regions have been defined.
Again, the top part of this dialog shows the parameter settings chosen in the Primer parameters preference group which will be used by the design algorithm. The lower part again contains a menu where the user can choose to include mispriming of both
500. s (Eisenberg et al., 1984).
Hopp-Woods scale. Hopp and Woods developed their hydrophobicity scale for identification of potentially antigenic sites in proteins. This scale is basically a hydrophilic index where apolar residues have been assigned negative values. Antigenic sites are likely to be predicted when using a window size of 7 (Hopp and Woods, 1983).
Cornette scale. Cornette et al. computed an optimal hydrophobicity scale based on 28 published scales (Cornette et al., 1987). This optimized scale is also suitable for prediction of alpha-helices in proteins.
Rose scale. The hydrophobicity scale by Rose et al. is correlated to the average area of buried amino acids in globular proteins (Rose et al., 1985). This results in a scale which is not showing the helices of a protein, but rather the surface accessibility.
Janin scale. This scale also provides information about the accessible and buried amino acid residues of globular proteins (Janin, 1979).
Welling scale. Welling et al. used information on the relative occurrence of amino acids in antigenic regions to make a scale which is useful for prediction of antigenic regions. This method is better than the Hopp-Woods scale of hydrophobicity which is also used to identify antigenic regions.
Kolaskar-Tongaonkar. A semi-empirical method for prediction of antigenic regions has been developed (Kolaskar and Tongaonkar, 1990). This method also includes information of surface accessibility and flexibi
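How a scale and a window size combine into a plot can be sketched with a sliding-window average. This is an assumption about the calculation (the excerpt does not spell it out), the Hopp-Woods values below are as commonly tabulated and should be checked against the original publication before use, and the function name is hypothetical.

```python
# Hopp-Woods hydrophilicity values as commonly tabulated (assumed, not taken
# from the manual). Apolar residues have negative values.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(protein, scale=HOPP_WOODS, window_size=7):
    """Average the scale over a centered window; returns (position, value) pairs.

    Positions are 1-based and refer to the residue at the window center.
    Residues too close to either end to fill a window are skipped.
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    values = [scale[residue] for residue in protein.upper()]
    half = window_size // 2
    profile = []
    for center in range(half, len(values) - half):
        window = values[center - half:center + half + 1]
        profile.append((center + 1, sum(window) / window_size))
    return profile
```

With the Hopp-Woods scale, peaks in such a profile would mark hydrophilic stretches, which is where antigenic sites are expected. Passing a different dictionary as `scale` would give the corresponding plot for one of the other scales.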
501. s a dialog. In Step 1 you can change, remove and add DNA and protein sequences.
When the relevant sequences are selected, clicking Next takes you to Step 2. This step allows you to adjust the window size from which the complexity plot is calculated. Default is set to 11 amino acids and the number should always be odd. The higher the number, the less volatile the graph.
Figure 13.14 shows an example of a local complexity plot.
CHAPTER 13 GENERAL SEQUENCE ANALYSES 212
[Plot: Complexity plot of CAA24102; horizontal axis: Position, vertical axis: Local complexity]
Figure 13.14: An example of a local complexity plot.
Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.
The values of the complexity plot approach 1.0 as the distribution of amino acids becomes more complex.
See section B in the appendix for information about the graph view.
13.4 Sequence statistics
CLC DNA Workbench can produce an output with many relevant statistics for protein sequences. Some of the statistics are also relevant to produce for DNA sequences. Therefore, this section deals with both types of statistics. The required steps for producing the statistics are the same.
To create a statistic for the sequence, do the following:
select sequence(s) | Toolbox in the Menu Bar | General Sequence Analyses | Create Sequence Statistics
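The role of the window size in a local complexity plot can be sketched as follows. The excerpt does not state which complexity measure the Workbench uses, so this sketch substitutes normalized Shannon entropy as a stand-in that also approaches 1.0 for diverse windows; the function names are hypothetical.

```python
import math
from collections import Counter


def window_complexity(window, alphabet_size=20):
    """Normalized Shannon entropy of one window: 0.0 (uniform) up to 1.0."""
    total = len(window)
    counts = Counter(window)
    entropy = sum(
        (count / total) * math.log2(total / count) for count in counts.values()
    )
    # The most diverse window has min(window length, alphabet size) symbols.
    max_entropy = math.log2(min(total, alphabet_size))
    return entropy / max_entropy if max_entropy > 0 else 0.0


def complexity_profile(sequence, window_size=11):
    """Return (1-based center position, complexity) for every full window."""
    if window_size % 2 == 0:
        raise ValueError("window size should be odd so the window has a center")
    half = window_size // 2
    return [
        (center + 1, window_complexity(sequence[center - half:center + half + 1]))
        for center in range(half, len(sequence) - half)
    ]
```

An odd window size gives every window a well-defined center residue, and a larger window averages over more residues, which is why the graph becomes less volatile.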
502. s are automatically saved when closing down CLC DNA Workbench. The next time you run the program, the Workspaces are reopened exactly as you left them.
Note: It is not possible to run more than one version of CLC DNA Workbench at a time. Use two or more Workspaces instead.
3.5.1 Create Workspace
When working with large amounts of data, it might be a good idea to split the work into two or more Workspaces. By default, the CLC DNA Workbench opens one Workspace. Additional Workspaces are created in the following way:
Workspace in the Menu Bar | Create Workspace | enter name of Workspace | OK
When the new Workspace is created, the heading of the program frame displays the name of the new Workspace. Initially, the selected elements in the Navigation Area are collapsed and the View Area is empty and ready to work with. (See figure 3.18.)
3.5.2 Select Workspace
When there is more than one Workspace in the CLC DNA Workbench, there are two ways to switch between them:
CHAPTER 3 USER INTERFACE 95
503. s directly to the dialog described in the next section.
CHAPTER 19 SEQUENCE ALIGNMENT 362
Figure 19.13: Creating a pairwise comparison table.
19.5.2 Pairwise comparison parameters
There are four kinds of comparison that can be made between the sequences in the alignment, as shown in figure 19.14.
Figure 19.14: Adjusting parameters for pairwise comparison.
• Gaps. Calculates the number of alignment positions where one sequence has a gap and the other does not.
• Identities. Calculates the number of identical alignment positions to overlapping alignment positions between the two sequences.
• Differences. Calculates the number of alignment positions where one sequence is different from the other. This includes gap differences as in the Gaps compari
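The Gaps, Identities and Differences comparisons described above can be sketched for a single pair of aligned sequences. This follows the wording of the excerpt, not the Workbench's code; treating columns that are gaps in both sequences as ignored is an assumption, and the function name is hypothetical.

```python
def pairwise_counts(seq_a, seq_b, gap="-"):
    """Count gaps, identities and differences between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    gaps = identities = differences = overlapping = 0
    for a, b in zip(seq_a, seq_b):
        a_is_gap, b_is_gap = a == gap, b == gap
        if a_is_gap and b_is_gap:
            # Assumption: a column that is empty in both sequences is ignored.
            continue
        if a_is_gap != b_is_gap:
            # One sequence has a gap and the other does not.
            gaps += 1
            differences += 1
            continue
        overlapping += 1  # both sequences have a residue in this column
        if a == b:
            identities += 1
        else:
            differences += 1
    return {
        "gaps": gaps,
        "identities": identities,
        "differences": differences,
        "overlapping": overlapping,
    }
```

The identities count is returned together with the number of overlapping positions, so it can be read either as a count or as a ratio of identical to overlapping positions.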
504. s from 230 to 240 inclusive and 250 to 260 inclusive.
10.2 Circular DNA
A sequence can be shown as a circular molecule:
select a sequence in the Navigation Area | Show in the Toolbar | As Circular
or, if the sequence is already open: click Show As Circular at the lower left part of the view
This will open a view of the molecule similar to the one in figure 10.4.
CHAPTER 10 VIEWING AND EDITING SEQUENCES 151
Figure 10.4: A molecule shown in a circular view.
This view of the sequence shares some of the properties of the linear view of sequences as described in section 10.1, but there are some differences. The similarities and differences are listed below.
• Similarities:
  - The editing options.
  - Options for adding, editing and removing annotations.
  - Restriction Sites, Annotation Types, Find and Text Format preferences groups.
• Differences:
  - In the Sequence Layout preferences, only the following options are available in the circular view: Numbers on plus strand, Numbers on sequence and Sequence label.
  - You cannot zoom in to see the residues in the circular molecule. If you wish to see these details, split the view with a linear view of the sequence.
  - In the Annotation Layout, you also have the option of showing the labels as Stacked. This means that there are no overlapping labels and that all labels of both annotations and restriction sites a
505. s in this solution are highlighted on the sequence.
16.4.1 Saving primers
Primer solutions in a table row can be saved by selecting the row and using the right-click mouse menu. This opens a dialog that allows the user to save the primers to the desired location. Primers and probes are saved as DNA sequences in the program. This means that all available DNA analyses can be performed on the saved primers, including BLAST. Furthermore, the primers can be edited using the standard sequence view to introduce e.g. mutations and restriction sites.
16.4.2 Saving PCR fragments
The PCR fragment generated from the primer pair in a given table row can also be saved by selecting the row and using the right-click mouse menu. This opens a dialog that allows the user to save the fragment to the desired location. The fragment is saved as a DNA sequence and the position of the primers is added as annotation on the sequence. The fragment can then be used for further analysis and included in e.g. an in silico cloning experiment using the cloning editor.
16.4.3 Adding primer binding annotation
You can add an annotation to the template sequence specifying the binding site of the primer: Right-click the primer in the table and select Mark primer annotation on sequence.
16.5 Standard PCR
This mode is used to design primers for a PCR amplification of a single DNA fragment.
CHAPTER 16 PRIMERS 257
16.5.1 User input
In this mode the user must define either a
506. s is done in the Toolbox in the Processes tab.
11.1.3 Save GenBank search parameters
The search view can be saved either by dragging the search tab and dropping it in the Navigation Area or by clicking Save. When saving the search, only the parameters are saved, not the results of the search. This is useful if you have a special search that you perform from time to time.
Even if you don't save the search, the next time you open the search view, it will remember the parameters from the last time you did a search.
11.2 Sequence web info
CLC DNA Workbench provides direct access to web-based search in various databases and on the Internet using your computer's default browser. You can look up a sequence in the databases of NCBI and UniProt, search for a sequence on the Internet using Google and search for PubMed references at NCBI. This is useful for quickly obtaining updated and additional information about a sequence.
CHAPTER 11 ONLINE DATABASE SEARCH 1 1
The functionality of these search functions depends on the information that the sequence contains. You can see this information by viewing the sequence as text (see section 10.5). In the following sections, we will explain this in further detail.
The procedure for searching is identical for all four search options (see also figure 11.3):
Open a sequence or a sequence list | Right-click the name of the sequence | Web Info | select the desired search function
507. se the other option instead.

Direct download

Selecting the first option takes you to the dialog shown in figure 1.8.

A progress indicator for getting the license is shown, and when the license is downloaded, you will be able to click Next.

Go to license download web page

Selecting the second option, Go to license download web page, opens the license web page as shown in figure 1.9.

Click the Request Evaluation License button, and you will be able to save the license on your computer, e.g. on the Desktop.

Back in the Workbench window, you will now see the dialog shown in figure 1.10.

CHAPTER 1. INTRODUCTION TO CLC DNA WORKBENCH 20

Figure 1.8: A license has been downloaded.

Figure 1.9: The license web page where you can download a license.
508. search results

The search result is presented as a list of links to the files in the NCBI database. The View displays 50 hits at a time. This can be changed in the Preferences (see chapter 5). More hits can be displayed by clicking the More... button at the bottom right of the View.

Each sequence hit is represented by text in the following columns:

• Accession
• Description
• Modification date
• Length

It is possible to exclude one or more of these columns by adjusting the View preferences for the database search view. Furthermore, your changes in the View preferences can be saved. See section 5.6.

Several sequences can be selected, and by clicking the buttons at the bottom of the search view, you can do the following:

• Download and open (doesn't save the sequence).
• Download and save (lets you choose a location for saving the sequence).
• Open at NCBI (searches for the sequence on NCBI's web page).

Double-clicking a hit will download and open the sequence. The hits can also be copied into the View Area or the Navigation Area from the search results by drag and drop, copy/paste, or by using the right-click menu as described below.

Drag and drop from GenBank search results

The sequences from the search results can be opened by dragging them into a position in the View Area.

Note: A sequence is not saved until the View displaying the sequence is closed. When that happens, a dialog opens: Save changes of sequence x? (Yes or No).

The sequence c
510. Figure 2.14: The "ATP8a1 mRNA (reference)" sequence selected as reference sequence for the assembly.

Click Next and choose to use the trim information (that you have just added).

CHAPTER 2. TUTORIALS

Click Next and choose to Save your results. The next step will ask you for a location to save the results to. You can just accept the default location, or you could use the left-hand icon under the "Save in folder" heading to create a new folder to save your assembly into.

Click Finish and the assembly process will begin.

2.5.3 Getting an overview of the contig

The result of the assembly is a Contig, which is an alignment of the nine reads to the reference sequence. Click Fit width to see an overview of the contig. To help you determine the coverage, display a coverage graph (see figure 2.15):

Alignment info in Side Panel | Coverage | Graph
511. sequence. The annotations are placed above the sequence.

  - Separate layer. The annotations are placed above the sequence and above restriction sites (only applicable for nucleotide sequences).

CHAPTER 10. VIEWING AND EDITING SEQUENCES 154

Figure 10.7: Changing the layout of annotations in the Side Panel.

• Offset. If several annotations cover the same part of a sequence, they can be spread out.

  - Piled. The annotations are piled on top of each other. Only the one at front is visible.
  - Little offset. The annotations are piled on top of each other, but they have been offset a little.
  - More offset. Same as above, but with more spreading.
  - Most offset. The annotations are placed above each other with a little space between. This can take up a lot of space on the screen.

• Label. The name of the annotation can be shown as a label. Additional information about the sequence is shown if you place the mouse cursor on the annotation and keep it still.

  - No labels. No labels are displayed.
  - On annotation. The labels ar
512. sequence(s) | Toolbox in the Menu Bar | General Sequence Analyses | Pattern Discovery

or right-click DNA or protein sequence(s) | Toolbox | General Sequence Analyses | Pattern Discovery

If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.

You can perform the analysis on several DNA or several protein sequences at a time. If the analysis is performed on several sequences at a time, the method will search for patterns which are common to all the sequences. Annotations will be added to all the sequences, and a view is opened for each sequence.

Click Next to adjust parameters (see figure 13.20).

Figure 13.20: Setting parameters for the pattern discovery. See text for details.

In order to search unknown sequences with an already existing model:

Select to use an already existing model, as seen in figure 13.20. Models are represented with a dedicated icon in the Navigation Area.
513. signing process. Then follow instructions on how to adjust parameters for primers, how to inspect and interpret primer properties graphically, and how to interpret, save and analyze the output of the primer design analysis. After a description of the different reaction types for which primers can be designed, the chapter closes with sections on how to match primers with other sequences and how to create a primer order.

16.1 Primer design - an introduction

Primer design can be accessed in two ways:

select sequence | Toolbox in the Menu Bar | Primers and Probes | Design Primers | OK

or right-click sequence | Show | Primer

In the primer view (see figure 16.1), the basic options for viewing the template sequence are the same as for the standard sequence view. See section 10.1 for an explanation of these options.

Note: This means that annotations such as known SNPs or exons can be displayed on the template sequence to guide the choice of primer regions. Also, traces in sequencing reads can be shown along with the structure to guide e.g. the re-sequencing of poorly resolved regions.
514. so generate tables.

In addition to the Open and Save options, you can also choose whether the result of the analysis should be added as annotations on the sequence or shown in a table. If both options are selected, you will be able to click the results in the table, and the corresponding region on the sequence will be selected.

If you choose to add annotations to the sequence, they can be removed afterwards by clicking Undo in the Toolbar.

9.2.2 Batch log

For some analyses, there is an extra option in the final step to create a log of the batch process (see e.g. figure 9.7). This log will be created at the beginning of the process and continually updated with information about the results. See an example of a log in figure 9.8. In this example, the log displays information about how many open reading frames were found.

CHAPTER 9. BATCHING AND RESULT HANDLING 139

Figure 9.8: An example of a ba
515. son.

• Distance. Calculates the Jukes-Cantor distance between the two sequences. This number is given as the Jukes-Cantor correction of the proportion between identical and overlapping alignment positions between the two sequences.

• Percent identity. Calculates the percentage of identical residues in alignment positions to overlapping alignment positions between the two sequences.

CHAPTER 19. SEQUENCE ALIGNMENT

Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.

19.5.3 The pairwise comparison table

The table shows the results of selected comparisons (see an example in figure 19.15). Since comparisons are often symmetric, the table can show the results of two comparisons at the same time, one in the upper-right and one in the lower-left triangle.

Figure 19.15: A pairwise comparison table.

The following settings are present in the side panel:

• Contents

  - Upper comparison. Selects the comparison to show in the upper triangle of the table.
  - Upper comparison gradient. Selects the color gradient to use for the upper trian
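The Distance and Percent identity comparisons described in this part of the manual can be illustrated with a small Python sketch. It is not the Workbench's implementation: it assumes nucleotide sequences, takes "overlapping" positions to mean alignment columns where neither sequence has a gap, and uses the standard Jukes-Cantor formula d = -3/4 ln(1 - 4p/3), where p is the proportion of overlapping positions that differ. The function names are hypothetical.

```python
import math


def overlap_stats(seq_a, seq_b, gap="-"):
    """Count identical and overlapping columns of two aligned sequences.

    Assumption: a column "overlaps" when neither sequence has a gap there.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = overlapping = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        overlapping += 1
        if a.upper() == b.upper():
            identical += 1
    if overlapping == 0:
        raise ValueError("the sequences do not overlap")
    return identical, overlapping


def percent_identity(seq_a, seq_b):
    """Percentage of identical residues among overlapping positions."""
    identical, overlapping = overlap_stats(seq_a, seq_b)
    return 100.0 * identical / overlapping


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance for nucleotide sequences."""
    identical, overlapping = overlap_stats(seq_a, seq_b)
    p = 1.0 - identical / overlapping  # proportion of differing positions
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For two aligned sequences with 8 identical residues in 9 overlapping positions, percent_identity gives about 88.9 and jukes_cantor_distance gives about 0.12. The corrected distance is always at least as large as the raw proportion of differences, because it accounts for multiple substitutions at the same site.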
516. sources available are PFAM databases (for use with CLC Protein Workbench and CLC Main Workbench).

Because procedures for downloading, installation, uninstallation and updating are the same as for plug-ins, see section 1.7.1 and section 1.7.2 for more information.

Figure 1.26: Plug-in updates.

Figure 1.27: Resources available for download.

1.8 Network configuration

If you use a proxy server to access the Internet, you must configure CLC DNA Workbench to use this. Otherwise you will not be able to perform any online activities (e.g. searching GenBank). CLC DNA Workbench supports the use of an HTTP proxy and an anonymous SOCKS proxy.

To configure your proxy setti
519. Figure 16.16: Properties of a primer from the Example Data.

In the Side Panel you can specify the information to display about the primer. The information parameters of the primer properties table are explained in section 16.5.2.

16.11 Find binding sites and create fragments

In CLC DNA Workbench you have the possibility of matching known primers against one or more DNA sequences or a list of DNA sequences. This can be applied to test whether a primer used in a previous experiment is applicable to amplify e.g. a homologous region in another species, or to test for potential mispriming. This functionality can also be used to extract the resulting PCR product when two primers are matched. This is particularly useful if your primers have extensions at the 5' end.

To search for primer binding sites:

Toolbox | Primers and Probes | Find Binding Sites and Create Fragments

If a sequence was already selected, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.

Click Next when all the sequences have been added.

Note: You should not add the primer sequences at this step.

16.11.1 Bi
520. t can be edited by:

right-click the selection | Edit Selection

A dialog appears displaying the sequence. You can add, remove or change the text and click OK. The original selected part of the sequence is now replaced by the sequence entered in the dialog. This dialog also allows you to paste text into the sequence using Ctrl + V (⌘ + V on Mac).

If you delete the text in the dialog and press OK, the selected text on the sequence will also be deleted. Another way to delete a part of the sequence is to:

right-click the selection | Delete Selection

If you wish to correct only one residue, this is possible by simply making the selection cover only one residue and then typing the new residue. Another way to edit the sequence is by inserting a restriction site. See section 18.1.4.

10.1.5 Sequence region types

The various annotations on sequences cover parts of the sequence. Some cover an interval, some cover intervals with unknown endpoints, some cover more than one interval, etc. In the following, all of these will be referred to as regions. Regions are generally illustrated by markings (often arrows) on the sequences. An arrow pointing to the right indicates that the corresponding region is located on the positive strand of the sequence. Figure 10.2 is an example of three regions with separate colors.

Figure 10.2: Three regions on a human beta globin DNA sequence (HUMHBB).
521. t click in the empty white area of the contig | Reassemble contig

This opens a dialog as shown in figure 17.25. In this dialog, you can choose:

• De novo assembly. This will perform a normal assembly in the same way as if you had selected the reads as individual sequences. When you click Next, you will follow the same steps as described in section 17.4. The consensus sequence of the contig will be ignored.

• Reference assembly. This will use the consensus sequence of the contig as reference. When you click Next, you will follow the same steps as described in section 17.5.

Figure 17.25: Re-assembling a contig.

When you click Finish, a new contig is created, so you do not lose the information in the old contig.

17.9 Secondary peak calling

CLC DNA Workbench is able to detect secondary peaks (a peak within a peak) to help discover heterozygous mutations. Looking at the height of the peak below the top peak, CLC DNA Workbench considers all positions in a sequence, and if a peak is higher than the threshold set by the user, it will be "called"
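The idea behind secondary peak calling in section 17.9 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Workbench's algorithm: it assumes the trace at one position is available as four peak heights (one per base) and that the user threshold is expressed as a fraction of the top peak's height. The function name and the input layout are hypothetical.

```python
def call_secondary_peak(heights, threshold_fraction):
    """Return (top_base, secondary_base_or_None) for one trace position.

    heights: dict mapping each base to its peak height at this position.
    threshold_fraction: assumed user threshold, as a fraction of the
    top peak's height that the second-highest peak must reach.
    """
    if len(heights) < 2:
        raise ValueError("need peak heights for at least two bases")
    # Rank the bases from the highest peak to the lowest.
    ranked = sorted(heights.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_height), (second_base, second_height) = ranked[0], ranked[1]
    if top_height > 0 and second_height >= threshold_fraction * top_height:
        return top_base, second_base
    return top_base, None
```

With heights of A=1000, G=450, C=20 and T=10, a threshold fraction of 0.3 calls G as a secondary peak under A, while a threshold of 0.5 calls nothing. A lower threshold finds more candidate heterozygous positions at the cost of more false calls from background noise.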
522. t features of CLC Protein Workbench, and it has additional advanced features. CLC Main Workbench holds all basic and advanced features of the CLC Workbenches.

In June 2007, CLC RNA Workbench was released as a sister product of CLC Protein Workbench and CLC DNA Workbench. CLC Main Workbench now also includes all the features of CLC RNA Workbench.

In March 2008, the CLC Free Workbench changed name to CLC Sequence Viewer.

In June 2008, the first version of the CLC Genomics Workbench was released due to an extraordinary demand for software capable of handling sequencing data from the new high-throughput sequencing systems like 454, Illumina Genome Analyzer and SOLiD.

For an overview of which features all the applications include, see http://www.clcbio.com/features.

In December 2006, CLC bio released a Software Developer Kit which makes it possible for anybody with a knowledge of programming in Java to develop plug-ins. The plug-ins are fully integrated with the CLC Workbenches and the Viewer and provide an easy way to customize and extend their functionalities.

All our software will be improved continuously. If you are interested in receiving news about updates, you should register your e-mail and contact data on http://www.clcbio.com, if you haven't already registered when you downloaded the program.

1.5.1 New program feature request

The CLC team is continuously improving the CLC DNA Workbench with our users' interests in mind. Therefor
523. t how to handle the results (see section 9.2). If not, click Finish. An example of protein sequence statistics is shown in figure 13.16.

Figure 13.16: Comparative sequence statistics.

Nucleotide sequence statistics are generated using the same dialog as used for protein sequence statistics. However, the output of nucleotide sequence statistics is less extensive than that of the protein sequence statistics.

Note: The headings of the tables change depending on whether you calculate "individual" or "comparative" sequence statistics.

CHAPTER 13. GENERAL SEQUENCE ANALYSES 214

The output of comparative protein sequence statistics includes:

• Sequence information:

  - Sequence type
  - Length
  - Organism
  - Name
  - Description
  - Modification Date
  - Weight. This is calculated like this: $\sum_{\text{units in sequence}} \text{weight}(\text{unit}) - \text{links} \times \text{weight}(\mathrm{H_2O})$, where links is the sequence length minus one and units are amino acids. The atomic composition is defined the same way.
  - Isoelectric point
  - Aliphatic index

• Half-life
• Extinction coefficient
• Counts of Atoms
• Frequency of Atoms
• Count of hydrophobic and hydrophilic residues
• Frequencies of hydrophobic and hydrophilic residues
• Count of charged residues
• Frequencies of charged residues
• Amino
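The Weight calculation listed under Sequence information (sum of the unit weights minus one water per link) can be sketched in Python as below. This is a minimal illustration, not the Workbench's code: the weight table holds approximate average molecular weights of the free amino acids in g/mol, which are assumed values for the example and may differ slightly from those the program uses. The names AMINO_ACID_WEIGHT, WATER_WEIGHT and protein_weight are hypothetical.

```python
# Approximate average molecular weights (g/mol) of the free amino acids.
# Assumed values for illustration only.
AMINO_ACID_WEIGHT = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}
WATER_WEIGHT = 18.015  # approximate weight of H2O in g/mol


def protein_weight(sequence):
    """Sum of unit weights minus one water per link (length - 1 links)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    unknown = sorted(set(sequence) - set(AMINO_ACID_WEIGHT))
    if unknown:
        raise ValueError("unknown residue(s): " + ", ".join(unknown))
    links = len(sequence) - 1  # one peptide bond between adjacent residues
    total = sum(AMINO_ACID_WEIGHT[residue] for residue in sequence)
    return total - links * WATER_WEIGHT
```

For the dipeptide "GA" this gives 75.07 + 89.09 - 18.015, roughly 146.1 g/mol. One water is subtracted per link because each peptide bond is formed by a condensation reaction that releases a water molecule.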
524. t of all qualifiers of all the selected annotations is shown. Note that if one of the annotations does not have the qualifier you have chosen, it will not be retyped. If an annotation has multiple qualifiers of the same type, the first is used for the new type.

• New type. You can select from a list of all the pre-defined types as well as enter your own annotation type. All the selected annotations will then get this type.

• Use annotation name as type. The annotation's name will be used as type (e.g. if you have an annotation named "Promoter", it will get "Promoter" as its type by using this option).

Figure 10.12: The Advanced Retype dialog.

10.3.4 Removing annotations

Annotations can be hidden using the Annotation Types preferences in the Side Panel to the right of the view (see section 10.3.1). In order to completely remove the annotation:

right-click the annotation | Delete | Delete Annotation

If you want to remove all annotations of one type:

right-click an annotation of the type you want to remove | Delete | Delete Annotations of Type "type"

If you want to remove all annotations from a sequence:

right-click an annotation | Delete | Delete All Annotations

The removal of annotations can be undone using Ctrl + Z or Undo in the Toolbar.

If you have more sequences (e.g. in a sequence l
525. ta is possible if you add a location on a network drive. The procedure is similar to the one described above. When you add a location on a network drive or a removable drive, the location will appear inactive when you are not connected. Once you connect to the drive again, click Update All and it will become active (note that there will be a few seconds' delay from when you connect).

CHAPTER 3. USER INTERFACE

Figure 3.5: The new location has been added.

Opening data

The elements in the Navigation Area are opened by:

Double-click the element

or Click the element | Show in the Toolbar | Select the desired way to view the element

This will open a view in the View Area, which is described in section 3.2.

Adding data

Data can be added to the Navigation Area in a number of ways. Files can be imported from the file system (see chapter 7). Furthermore, an element can be added by dragging it into the Navigation Area. This could be views that are open, elements on lists (e.g. search hits or sequence lists), and files located on your computer. Finally, you ca
526. tart local alignments from these initial matches.

If you are interested in the bioinformatics behind BLAST, there is an easy-to-read explanation of this in section 12.5.

With CLC DNA Workbench there are two ways of performing BLAST searches: you can either have the BLAST process run on NCBI's BLAST servers (http://www.ncbi.nlm.nih.gov/) or perform the BLAST search on your own computer. The advantage of running the BLAST search on NCBI servers is that you have ready access to the most popular BLAST databases without having to download them to your own computer. The advantage of running BLAST on your own computer is that you can use your own sequence data, and that this can sometimes be faster and more reliable for big batch BLAST jobs.

Figure 12.8 shows an example of a BLAST result in the CLC DNA Workbench.

Figure 12.1: Display of the output of a BLAST search
527. tch log when finding open reading frames.

The log will either be saved with the results of the analysis or opened in a view with the results, depending on how you chose to handle the results.

Part III

Bioinformatics

Chapter 10

Viewing and editing sequences

Contents

10.1 View sequence
  10.1.1 Sequence settings in Side Panel
  10.1.2 Restriction sites in the Side Panel
  10.1.3 Selecting parts of the sequence
  10.1.4 Editing the sequence
  10.1.5 Sequence region types
10.2 Circular DNA
  10.2.1 Using split views to see details of the circular molecule
  10.2.2 Mark molecule as circular and specify starting point
10.3 Working with annotations
  10.3.1 Viewing annotations
  10.3.2 Adding annotations
  10.3.3 Edit annotations
  10.3.4 Removing annotations
10.4 Element information
10.5 View as text
10.6 Creating a new sequence
10.7 Sequence Lists
  10.7.1 Graphical view of sequence lists
  10.7.2 Sequence list table
528. tched equally well another place in the mapping, it is considered a non-specific match. This color will "overrule" the other colors. Note that if you are mapping with several reference sequences, a read is considered a double match when it matches more than once across all the contigs/references. A non-specific match is yellow per default.

Besides these preferences, all the functionalities of the alignment view are available. This means that you can e.g. add annotations (such as SNP annotations) to regions of interest.

However, some of the parameters from alignment views are set at a different default value in the view of contigs. Trace data of the sequencing reads are shown if present (can be enabled and disabled under the Nucleotide info preference group), and the Color different residues option is also enabled in order to provide a better overview of conflicts (can be changed in the Alignment info preference group).
529. te that for contigs with more than 1000 reads, you can only do single-residue replacements (you can't delete or edit a selection). When the compactness is Packed, you cannot edit any of the reads.

17.7.3 Sorting reads

If you wish to change the order of the sequence reads, simply drag the label of the sequence up and down. Note that this is not possible if you have chosen Gather sequences at top or set the compactness to Packed in the Side Panel.

You can also sort the reads by right-clicking a sequence label and choosing from the following options:

• Sort Reads by Alignment Start Position. This will list the first read in the alignment at the top etc.
• Sort Reads by Name. Sort the reads alphabetically.
• Sort Reads by Length. The shortest reads will be listed at the top.

17.7.4 Read conflicts

When the contig is created, conflicts between the reads are annotated on the consensus sequence. The definition of a conflict is a position where at least one of the reads has a different residue.

A conflict can be in two states:

• Conflict. Both the annotation and the corresponding row in the Table are colored red.
• Resolved. Both the annotation and the corresponding row in the Table are colored green.

The conflict can be resolved by correcting the deviating residues in the reads as described above.

A fast way of making all the reads reflect the consensus sequence is to select the position in the consensus, right-click the selection, and choose Transfer Se
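The conflict definition in section 17.7.4 can be illustrated with a minimal sketch. It assumes the reads are already aligned, stored as plain strings of the same length as the consensus, with a space marking positions a read does not cover. The function name `find_conflicts` and this data layout are illustrative assumptions, not the Workbench's internal representation, and comparing each read against the consensus is one possible reading of "a different residue".

```python
from typing import List


def find_conflicts(consensus: str, reads: List[str]) -> List[int]:
    """Return 0-based positions where at least one read differs from the consensus.

    Assumption: every read is padded with spaces to the consensus length,
    and a space means "read does not cover this position".
    """
    conflicts = []
    for pos, cons_base in enumerate(consensus):
        for read in reads:
            base = read[pos]
            # Uncovered positions cannot disagree with the consensus.
            if base != " " and base.upper() != cons_base.upper():
                conflicts.append(pos)
                break  # one deviating read is enough to flag the position
    return conflicts


consensus = "ACGTACGT"
reads = [
    "ACGTAC  ",
    "  GTTCGT",
    "ACGAACGT",
]
print(find_conflicts(consensus, reads))
```

Correcting the deviating residue in a read (or transferring the consensus residue to all reads) removes that position from the returned list, which corresponds to the Resolved state.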
530. Figure 3.7: Changing the common name of five sequences.

Figure 3.8: A View Area can enclose several views; each view is indicated with a tab (see right view, which shows protein P68225). Furthermore, several views can be shown at the same time (in this example, four views are displayed).

This chapter deals with the handling of views inside a View Area. Furthermore, it deals with rearranging the views.

Section 3.3 deals with the zooming and selecting functions.

3.2.1 Open view

Opening a view can be done in a number of ways:

double-click an element in the Navigation Area

or select an element in the Navigation Area | File | Show | Select the desired way to view the element

or select an element in the Navigation Area | Ctrl + O (⌘ + B on Mac)

Opening a view while another view is already open will show the new view in front of the other view. The view that was already open ca
531. ted to the name of the exported file in order for the exported file to work.

Before exporting, you are asked about which of the different settings you want to include in the exported file. One of the items in the list is "User Defined View Settings". If you export this, only the information about which of the settings is the default setting for each view is exported. If you wish to export the Side Panel Settings themselves, see section 5.2.2.

The process of importing preferences is similar to exporting:

Press Ctrl + K (⌘ + ; on Mac) to open Preferences | Import | Browse to and select the .cpf file | Import and apply preferences

5.5.1 The different options for export and importing

To avoid confusion of the different import and export options, here is an overview:

• Import and export of bioinformatics data such as sequences, alignments etc. (described in section 7.1.1).
• Graphics export of the views which creates image files in various formats (described in section   3).
• Import and export of Side Panel Settings as described in the next section.
• Import and export of all the Preferences except the Side Panel settings. This is described above.

5.6 View settings for the Side Panel

The Side Panel is shown to the right of all views that are opened in CLC DNA Workbench. By using the settings in the Side Panel you can specify the layout and contents of the view. Figure 5.8 is an exampl
532. ter dye at the 5' end and a quenching dye at the 3' end. Fluorescent molecules become excited when they are irradiated and usually emit light. However, in a TaqMan probe the energy from the fluorescent dye is transferred to the quencher dye by fluorescence resonance energy transfer as long as the quencher and the dye are located in close proximity, i.e. when the probe is intact. TaqMan probes are designed to anneal within a PCR product amplified by a standard PCR primer pair. If a TaqMan probe is bound to a product template, the replication of this will cause the Taq polymerase to encounter the probe. Upon doing so, the 5' exonuclease activity of the polymerase will cleave the probe. This cleavage separates the quencher and the dye, and as a result the reporter dye starts to emit fluorescence.

The TaqMan technology is used in Real-Time quantitative PCR. Since the accumulation of fluorescence mirrors the accumulation of PCR products, it can be monitored in real time and used to quantify the amount of template initially present in the buffer.

The technology is also used to detect genetic variation such as SNPs. By designing a TaqMan probe which will specifically bind to one of two or more genetic variants, it is possible to detect genetic variants by the presence or absence of fluorescence in the reaction.

A specific requirement of TaqMan probes is that a G nucleotide cannot be present at the 5' end, since this will quench the fluorescence of th
533. the data.

To start the assembly:

select sequences to assemble | Toolbox in the Menu Bar | Sequencing Data Analyses | Assemble Sequences to Reference

This opens a dialog where you can alter your choice of sequences which you want to assemble. You can also add sequence lists.

Note: You can assemble a maximum of 2000 sequences at a time.

To assemble more sequences, you need the CLC Genomics Workbench (see http://www.clcbio.com/genomics).

When the sequences are selected, click Next, and you will see the dialog shown in figure 17.17.

Figure 17.17: Setting assembly parameters when assembling to a reference sequence.

This dialog gives you the following options for assembling:

• Reference sequence. Click the Browse and select element icon in order to select a sequence to use as reference.
• Include reference sequence in contig(s). This will display a contig data object with the reference sequence at the top and the reads aligned below. Th
534. the overview table, the following information is shown:

• Query: Since this table displays information about several query sequences, the first column is the name of the query sequence.
• Number of hits: The number of hits for this query sequence.
• For the following list, the value of the best hit is displayed together with accession number and description of this hit:
  - Lowest E-value
  - Greatest identity
  - Greatest positive
  - Greatest hit length
  - Greatest bit score

If you wish to save some of the BLAST results as individual elements in the Navigation Area, open them and click Save As in the File menu.

12.2.3 BLAST graphics

The BLAST editor shows the sequence hits which were found in the BLAST search. The hit sequences are represented by colored horizontal lines, and when hovering the mouse pointer over a BLAST hit sequence, a tooltip appears, listing the characteristics of the sequence. As default, the query sequence is fitted to the window width, but it is possible to zoom in the windows and see the actual sequence alignments returned from the BLAST server.

There are several settings available in the BLAST Graphics view:

• BLAST Layout. You can choose to Gather sequences at top. Enabling this option affects the view that is shown when scrolling horizontally along a BLAST result. If selected, the sequence hits which did not contribute to the visible part of the BLAST graphics will be
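The columns of the overview table listed earlier in this excerpt (number of hits plus the best hit per criterion) can be sketched as follows. The `Hit` record, its field names and the function `overview_row` are hypothetical and only serve the illustration; they are not the Workbench's data model or export format.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    # Hypothetical record for one BLAST hit; field names are assumptions.
    accession: str
    e_value: float
    identity: int
    positive: int
    hit_length: int
    bit_score: float


def overview_row(query: str, hits: List[Hit]) -> dict:
    """Build one overview row: hit count plus the best hit for each criterion."""
    row = {"query": query, "number_of_hits": len(hits)}
    if hits:
        # Each column may be won by a different hit, so each is computed separately.
        row["lowest_e_value"] = min(hits, key=lambda h: h.e_value).accession
        row["greatest_identity"] = max(hits, key=lambda h: h.identity).accession
        row["greatest_positive"] = max(hits, key=lambda h: h.positive).accession
        row["greatest_hit_length"] = max(hits, key=lambda h: h.hit_length).accession
        row["greatest_bit_score"] = max(hits, key=lambda h: h.bit_score).accession
    return row


hits = [
    Hit("A1", 1e-30, 90, 95, 100, 200.0),
    Hit("B2", 1e-10, 95, 97, 80, 150.0),
]
print(overview_row("query_1", hits))
```

The sketch stores only the accession of the winning hit per column; the table described above also shows the value itself and the description of that hit.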
535. tion numbering in the status bar). Such reads are not considered perfectly aligned reads because they don't align in their entire length.

Include reads with less than perfect alignment. Reads with mismatches, insertions or deletions, or with unaligned nucleotides at the ends (the faded part of a read).

Note that only reads that are completely covered by the selection will be part of the new contig.

One of the benefits of this is that you can actually use this tool to extract a subset of reads from a contig. An example work flow could look like this:

1. Select the whole reference sequence
2. Right-click and Extract from Selection
3. Choose to include only paired matches
4. Extract the reads from the new file (see section 10.7.3)

You will now have all paired reads from the original mapping in a list.

17.7.7 Variance table

In addition to the standard graphical display of a contig as described above, you can also see a tabular overview of the conflicts between the reads by clicking the Table icon at the bottom of the view.

This will display a new view of the conflicts as shown in figure 17.24.

Figure 17.24 (table columns shown: Position, Consensus residue, Other residues, IUPAC, Status, Notes).
536. tion of the newly sequenced and maybe unknown sequence. If the researchers have no prior information of the sequence and biological content, valuable information can often be obtained using BLAST. The BLAST algorithm will search for homologous sequences in predefined and annotated databases of the user's choice.

In an easy and fast way the researcher can gain knowledge of gene or protein function and find evolutionary relations between the newly sequenced DNA and well established data.

After the BLAST search the user will receive a report specifying found homologous sequences and their local alignments to the query sequence.

12.5.3 How does BLAST work?

BLAST identifies homologous sequences using a heuristic method which initially finds short matches between two sequences; thus, the method does not take the entire sequence space into account. After the initial match, BLAST attempts to start local alignments from these initial matches. This also means that BLAST does not guarantee the optimal alignment, thus some sequence hits may be missed. In order to find optimal alignments, the Smith-Waterman algorithm should be used (see below). In the following, the BLAST algorithm is described in more detail.

Seeding

When finding a match between a query sequence and a hit sequence, the starting point is the words that the two sequences have in common. A word is simply defined as a number of letters. For blastp the default word si
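The seeding idea (finding the words two sequences have in common) can be sketched in a few lines. This is a simplified illustration that only finds exact word matches; it is not the BLAST implementation, which also considers similar (non-exact) words for protein searches. The function name `shared_words` and the word size of 3 used in the example are assumptions for illustration.

```python
from collections import defaultdict
from typing import List, Tuple


def shared_words(query: str, subject: str, word_size: int) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where both sequences share an exact word."""
    # Index every word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    # Scan the subject and look each of its words up in the index.
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds


print(shared_words("MVHLTPEEK", "AAHLTPQ", word_size=3))
```

Each returned pair is a candidate starting point from which a local alignment could be extended in both directions.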
537. to produce the best available alignment. BLAST is a heuristic method which does not guarantee the best results, and therefore you cannot rely on BLAST if you wish to find all the hits in the database.

Instead, use the Smith-Waterman algorithm for obtaining the best possible local alignments [Smith and Waterman, 1981].

BLAST only makes local alignments. This means that a great but short hit in another sequence may not at all be related to the query sequence even though the sequences align well in a small region. It may be a domain or similar.

It is always a good idea to be cautious of the material in the database. For instance, the sequences may be wrongly annotated; hypothetical proteins are often simple translations of a found ORF on a sequenced nucleotide sequence and may not represent a true protein.

Don't expect to see the best result using the default settings. As described above, the settings should be adjusted according to what kind of query sequence is used and what kind of results you want. It is a good idea to perform the same BLAST search with different settings to get an idea of how they work. There is no final answer on how to adjust the settings for your particular sequence.

12.5.9 Other useful resources

The BLAST web page hosted at NCBI: http://www.ncbi.nlm.nih.gov/BLAST

Download pages for the BLAST programs: http://www.ncbi.nlm.nih.gov/BLAST/download.shtml

Download pages for pre-formatted BLAST databases: tept 7 te
538. to specify how the labels should be shown:

• No labels. This will just display the cut site with no information about the name of the enzyme. Placing the mouse pointer on the cut site will reveal this information as a tool tip.
• Flag. This will place a flag just above the sequence with the enzyme name (see an example in figure 18.28). Note that this option will make it hard to see when several cut sites are located close to each other. In the circular view, this option is replaced by the Radial option.
• Radial. This option is only available in the circular view. It will place the restriction site labels as close to the cut site as possible (see an example in figure 18.30).
• Stacked. This is similar to the flag option for linear sequence views, but it will stack the labels so that all enzymes are shown. For circular views, it will align all the labels on each side of the circle. This can be useful for clearly seeing the order of the cut sites when they are located closely together (see an example in figure 18.29).

Figure 18.28: Restriction site labels shown as flags.

Note that in a circular view, the Stacked and Radial options also affect the layout of annotations.

Figure 18.30: Restriction site labels in radial layout.

Sort enzymes

Just above the list of enzymes there are three buttons to be used for sorting the list (see figu
539. tral part of the dialog contains parameters to define the specificity of TaqMan probes. Two parameters can be set:

• Minimum number of mismatches: the minimum total number of mismatches that must exist between a specific TaqMan probe and all sequences which belong to the group not recognized by the probe.
• Minimum number of mismatches in central part: the minimum number of mismatches in the central part of the oligo that must exist between a specific TaqMan probe and all sequences which belong to the group not recognized by the probe.

The lower part of the dialog contains parameters pertaining to primer pairs and the comparison between the outer oligos (primers) and the inner oligos (TaqMan probes). Here, five options can be set:

• Maximum percentage point difference in G/C content (described above under Standard PCR).
• Maximal difference in melting temperature of primers in a pair: the number of degrees Celsius that primers in the primer pair are all allowed to differ.
• Maximum pair annealing score: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in an oligo pair. This criterion is applied to all possible combinations of primers and probes.
• Minimum difference in the melting temperature of primer (outer) and TaqMan probe (inner) oligos: all comparisons between the melting temperature of primers and probes must be at least this different, otherwise the solution set is excluded
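The two specificity parameters for TaqMan probes can be illustrated with a small sketch. It assumes the probe and each excluded binding site are ungapped strings of equal length, and it models the "central part" as everything except a fixed number of flanking bases on each side. The `flank` parameter and all function names are hypothetical; how the Workbench delimits the central part is not stated in this excerpt.

```python
from typing import List


def count_mismatches(probe: str, site: str) -> int:
    """Count positions where the probe and an equally long site differ."""
    return sum(1 for a, b in zip(probe, site) if a != b)


def central_mismatches(probe: str, site: str, flank: int) -> int:
    """Count mismatches after dropping `flank` bases from both ends (assumed definition)."""
    end = len(probe) - flank
    return count_mismatches(probe[flank:end], site[flank:end])


def probe_is_specific(probe: str, excluded_sites: List[str],
                      min_total: int, min_central: int, flank: int = 3) -> bool:
    """True if the probe mismatches every excluded site often enough, overall and centrally."""
    return all(
        count_mismatches(probe, site) >= min_total
        and central_mismatches(probe, site, flank) >= min_central
        for site in excluded_sites
    )


probe = "ACGTACGTAC"
excluded = ["ACGAACCTAC"]  # differs from the probe at positions 3 and 6
print(probe_is_specific(probe, excluded, min_total=2, min_central=1))
```

Raising either threshold makes the probe set more selective, at the cost of fewer candidate probes passing the filter.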
540. tructing a multiple alignment is much harder.

The first major challenge in the multiple alignment procedure is how to rank different alignments, i.e. which scoring function to use. Since the sequences have a shared history, they are correlated through their phylogeny and the scoring function should ideally take this into account. Doing so is, however, not straightforward as it increases the number of model parameters considerably.

(Figure: multiple alignment of the protein sequences Q6WN27, Q6WN20, Q6WN29, Q6WN25, Q6WN22, P68225, P68053, P68046, P68231, P68228, NP_058652, NP_032246 and Q6H1U7.)
541. ts (see section 10.7), existing alignments and from any combination of the three.

To create an alignment in CLC DNA Workbench:

select sequences to align | Toolbox in the Menu Bar | Alignments and Trees | Create Alignment

or select sequences to align | right-click any selected sequence | Toolbox | Alignments and Trees | Create Alignment

This opens the dialog shown in figure 19.1.

Figure 19.1: Creating an alignment.

If you have selected some elements before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences, sequence lists or alignments from the selected elements. Click Next to adjust alignment algorithm parameters. Clicking Next opens the dialog shown in figure 19.2.
542. ts can be visualized in a dot plot as seen in figure 13.10. In this figure, three frame shifts for the sequence on the y-axis are found:

1. Deletion of nucleotides
2. Insertion of nucleotides
3. Mutation (out of frame)

Figure 13.10: This dot plot shows various frame shifts in the sequence. See text for details.

Sequence inversions

In dot plots you can see an inversion of sequence as a contrary diagonal to the diagonal showing similarity. In figure 13.11 you can see a dot plot (window length is 3) with an inversion.

Low-complexity regions

Low-complexity regions in sequences can be found as regions around the diagonal all obtaining a high score. Low-complexity regions are calculated from the redundancy of amino acids within a limited region [Wootton and Federhen, 1993]. These are most often seen as short regions of only a few different amino acids. In the middle of figure 13.12 is a square showing the low-complexity region of this sequence.

Creative Commons License

All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, t
543. 16.4.2 Saving PCR fragments
16.4.3 Adding primer binding annotation
16.5 Standard PCR
16.5.1
16.5.2 Standard PCR output table
16.6 Nested PCR
16.6.1 Nested PCR output table
16.7 TaqMan
16.7.1 TaqMan output table
16.8 Sequencing primers
16.8.1 Sequencing primers output table
16.9 Alignment-based primer and probe design
16.9.1 Specific options for alignment-based primer and probe design
16.9.2 Alignment-based design of PCR primers
16.9.3 Alignment-based TaqMan probe design
16.10 Analyze primer properties
16.11 Find binding sites and create fragments
16.11.1 Binding parameters
16.11.2 Results: binding sites and fragments
16.12 Order primers

CLC DNA Workbench offers graphically and algorithmically advanced design of primers and probes for various purposes. This chapter begins with a brief introduction to the general concepts of the primer de
544. tween standard and topology layout. The topology layout can help to give an overview of the tree if some of the branches are very short.

When the sequences include the appropriate annotation, it is possible to choose between the accession number and the species names at the leaves of the tree. Sequences downloaded from GenBank, for example, have this information. The Labels preferences allow these different node annotations as well as different annotation on the branches.

The branch annotation includes the bootstrap value, if this was selected when the tree was calculated. It is also possible to annotate the branches with their lengths.

2.12 Tutorial: Find restriction sites

This tutorial will show you how to find restriction sites and annotate them on a sequence.

There are two ways of finding and showing restriction sites. In many cases, the dynamic restriction sites found in the Side Panel of sequence views will be useful, since it is a quick and easy way of showing restriction sites. In the Toolbox you will find the other way of doing restriction site analyses. This way provides more control of the analysis and gives you more output options, e.g. a table of restriction sites and a list of restriction enzymes that can be saved for later use. In this tutorial, the first section describes how to use the Side Panel to show restriction sites, whereas the second section describes the restriction map analysis performed from the T
545. two options:

• Save the Cloning Experiment. This is saved as a sequence list, including the specified cut sites. This is useful if you need to perform the same process again or double-check details.
• Save the construct shown in the circular view. This will only save the information on the particular sequence including details about how it was created (this can be shown in the History view).

You can, of course, save both. In that case, the history of the construct will point to the sequence list in its own history.

The construct is shown in figure 2.33.

Figure 2.33: The Atp8a1 gene inserted after the CMV promoter.

2.1 Tutorial: Primer design

In this tutorial, you will see how to use the CLC DNA Workbench to find primers for PCR amplification of a specific region.

We use the pcDNA3-atp8a1 sequence from the "Primers" folder in the Example data. This sequence is the pcDNA3 vector with the atp8a1 gene inserted. In this tutorial, we wish to design primers that would allow us to generate a PCR product covering the insertion point of the gene. This would let us use PCR to check that the gene is inserted where we think it is.

First, open the sequence in the Primer Designer:

Select the pcDNA3-atp8a1 sequence | Show | Primer Designer

Now the sequence is opened and we are ready to begin d
546. uct will also be added to the input sequence list and the original fragment and vector sequences will be deleted.

When you click Finish the final construct will be shown (see figure 18.7).

You can now Save this sequence for later use. The cloning experiment used to design the construct can be saved as well. If you check the History of the construct, you can see the details about restriction sites and fragments used for the cloning.

Figure 18.7: The final construct.

18.1.3 Manual cloning

If you wish to use the manual way of cloning (as opposed to using the cloning work flow explained above in section 18.1.2), you can disregard the panel at the bottom. The manual cloning approach is based on a number of ways that you can manipulate the sequences. All manipulations of sequences are done manually, giving you full control over how the final construct is made. Manipulations are performed through right-click menus which have three different appearances depending on where you click, as visualized in figure 18.8.

Figure 18.8: The red circles mark the two places you can use for manipulating t
547. Figure 17.9: Specifying the barcodes as shown in the example of figure 17.6 (dialog settings shown: barcode length 6, linker length 4 nucleotides, search both strands).

Click Next to specify the output options. First, you can choose to create a list of the reads that could not be grouped. Second, you can create a Summary report showing how many reads were found for each barcode (see figure 17.10).

There is also an option to create subfolders for each sequence list. This can be handy when the results need to be processed in batch mode (see section 9.1).

Figure 17.10: An example of a report showing the number of reads in each group.

A new sequence list will be generated for each barcode containing all the sequences where this barcode is identified. Both the linker and barcode sequences are removed from each of the sequences in the list, so that only the target sequence remains. This means that you can continue the analysis by doing trimming or mapping. Note that y
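The grouping step described in this excerpt (one list per barcode, linker and barcode removed, ungrouped reads collected separately) can be sketched as follows. The sketch assumes each read starts with a fixed-length linker followed directly by the barcode and then the target sequence. That layout, the function name `demultiplex` and the sample names are illustrative assumptions, not the tool's actual implementation, and only exact barcode matches on the given strand are considered.

```python
from typing import Dict, List, Tuple


def demultiplex(reads: List[str], barcodes: Dict[str, str],
                linker_length: int) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group reads by barcode and strip linker + barcode from each grouped read.

    `barcodes` maps a barcode sequence to a sample name; all barcodes are
    assumed to have the same length.
    """
    barcode_length = len(next(iter(barcodes)))
    groups: Dict[str, List[str]] = {name: [] for name in barcodes.values()}
    ungrouped: List[str] = []
    for read in reads:
        start = linker_length
        end = linker_length + barcode_length
        sample = barcodes.get(read[start:end])
        if sample is None:
            ungrouped.append(read)             # kept intact for inspection
        else:
            groups[sample].append(read[end:])  # only the target sequence remains
    return groups, ungrouped


barcodes = {"AAAAAA": "Sample 1", "CCCCCC": "Sample 2"}
reads = [
    "GGCC" + "AAAAAA" + "TTTGACC",
    "GGCC" + "CCCCCC" + "GATTACA",
    "GGCC" + "TTTTTT" + "ACGT",
]
groups, ungrouped = demultiplex(reads, barcodes, linker_length=4)
print({name: len(seqs) for name, seqs in groups.items()}, len(ungrouped))
```

The per-sample counts printed at the end correspond to the kind of numbers a summary report of reads per barcode would contain.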
548. uence selection in the cloning view.

• Replace Selection with sequence. This will replace the selected region with a sequence. The sequence to be inserted can be selected from a list containing all sequences in the cloning editor.
• Insert Sequence before Selection. Insert a sequence before the selected region. The sequence to be inserted can be selected from a list containing all sequences in the cloning editor.
• Insert Sequence after Selection. Insert a sequence after the selected region. The sequence to be inserted can be selected from a list containing all sequences in the cloning editor.
• Cut Sequence before Selection. This will cleave the sequence before the selection and will result in two smaller fragments.
• Cut Sequence after Selection. This will cleave the sequence after the selection and will result in two smaller fragments.
• Make Positive Strand Single Stranded. This will make the positive strand of the selected region single stranded.
• Make Negative Strand Single Stranded. This will make the negative strand of the selected region single stranded.
• Make Double Stranded. This will make the selected region double stranded.
• Move Starting Point to Selection Start. This is only active for circular sequences. It will move the starting point of the sequence to the beginning of the selection.
• Copy Selection. This will copy the selected region t
549. uences. It is evident that free end gaps are ideal in this situation as the start codons are aligned correctly in the top alignment. Treating end gaps as any other gaps in the case of aligning distant homologs where one sequence is partial leads to a spreading out of the short sequence as in the bottom alignment.

Both algorithms use progressive alignment. The faster algorithm builds the initial tree by doing more approximate pairwise alignments than the slower option.

19.1.3 Aligning alignments

If you have selected an existing alignment in the first step (19.1), you have to decide how this alignment should be treated.

• Redo alignment. The original alignment will be realigned if this checkbox is checked. Otherwise, the original alignment is kept in its original form except for possible extra equally sized gaps in all sequences of the original alignment. This is visualized in figure 19.5.
550. use mouse (Mus musculus). To obtain more information about this molecule you wish to query the peptides held in the Swiss-Prot database to find homologous proteins in humans (Homo sapiens) using the Basic Local Alignment Search Tool (BLAST) algorithm.

This tutorial involves running BLAST remotely using databases housed at the NCBI. Your computer must be connected to the internet to complete this tutorial.

2.8.1 Performing the BLAST search

Start out by:

select protein ATP8a1 | Toolbox | BLAST Search | NCBI BLAST

In Step 1 you can choose which sequence to use as query sequence. Since you have already chosen the sequence it is displayed in the Selected Elements list.

Click Next.

In Step 2 (figure 2.42), choose the default BLAST program, blastp: Protein sequence and database, and select the Swiss-Prot database in the Database drop down menu.

Figure 2.42: Choosing BLAST program and database.

Click Next.

In the Limit by Entrez query in Step 3, choose Homo sapiens[ORGN] from the drop down menu to arrive at the search configuration seen in figure 2.43. Including th
551. ut many may just be matching by chance, not due to any biological similarity. Values of E less than one can be entered as decimals or in scientific notation. For example, 0.001, 1e-3 and 10e-4 would be equivalent and acceptable values.

• Word Size: BLAST is a heuristic that works by finding word matches between the query and database sequences. You may think of this process as finding "hot spots" that BLAST can then use to initiate extensions that might lead to full-blown alignments. For nucleotide-nucleotide searches (i.e., BLASTn), an exact match of the entire word is required before an extension is initiated, so you normally regulate the sensitivity and speed of the search by increasing or decreasing the word size. For other BLAST searches, non-exact word matches are taken into account based upon the similarity between words. The amount of similarity can be varied, so you normally use just the word sizes 2 and 3 for these searches.

• Matrix: A key element in evaluating the quality of a pairwise sequence alignment is the "substitution matrix", which assigns a score for aligning any possible pair of residues. The matrix used in a BLAST search can be changed depending on the type of sequences you are searching with (see the BLAST Frequently Asked Questions). Only applicable for protein sequences or translated DNA sequences.

• Gap Cost: The pull-down menu shows the Gap Costs (penalty to open a gap and penalty to extend a gap). Increasing the Gap C
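
The statement that 0.001, 1e-3 and 10e-4 are equivalent E-value entries can be checked with a small Python sketch. This is an illustration only: the helper name `parse_expect` is hypothetical and is not a CLC DNA Workbench or BLAST function; it relies solely on Python's built-in float parsing.

```python
def parse_expect(text: str) -> float:
    """Parse an E-value typed as a decimal or in scientific notation.

    Hypothetical helper for illustration; not part of CLC or BLAST.
    """
    return float(text)


# The three spellings mentioned in the text.
values = [parse_expect(s) for s in ("0.001", "1e-3", "10e-4")]

# All three strings denote the same number.
all_equal = values[0] == values[1] == values[2]
print(all_equal)  # True
```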
552. values  in the Primer Parameters preference group  the Calculate button will activate the primer design  algorithm     After pressing the Calculate button a dialog will appear  see figure 16 10  which is similar to the  Nested PCR dialog described above  see section 16 6      go Calculation parameters    Chosen parameters  Maximum primer length  Minimum primer length  Maximum G C content  Minimum GIC content  Maximum melting temperature  Minimum melting temperature  Maximum self annealing  Maximum self end annealing  Maximum secondary structure  3 end must meet G C requirements  5 end must meet G C requirements  Primer combination parameters  Max percentage point difference in G C content  Max difference in melting temperatures within a primer pair  Max hydrogen bonds between pairs  Max hydrogen bonds between pair ends  Minimum difference in melting temperature Inner Outer    Maximum length of amplicon  Mispriming parameters  Use mispriming as exclusion criteria     Exact match  Minimum number of base pairs required for a match    Number of consecutive base pairs required in 3 end    Kea  rm     Figure 16 10  Calculation dialog       In this dialog the options to set a minimum and a desired melting temperature difference between  outer and inner refers to primer pair and probe respectively     Furthermore  the central part of the dialog contains an additional parameter    e Maximum length of amplicon   determines the maximum length of the PCR fragment  generated in the TaqMan
553. very similar on these operating systems. In areas where differences exist, these will be described separately. The term "right-click" is used throughout the manual, but some Mac users may have to use Ctrl+click in order to perform a "right-click" (if they have a single-button mouse).

The most recent version of the user manuals can be downloaded from http://www.clcbio.com/usermanuals.

The user manual consists of four parts:

• The first part includes the introduction and some tutorials showing how to apply the most significant functionalities of CLC DNA Workbench.
• The second part describes in detail how to operate all the program's basic functionalities.
• The third part digs deeper into some of the bioinformatic features of the program. In this part, you will also find our "Bioinformatics explained" sections. These sections elaborate on the algorithms and analyses of CLC DNA Workbench and provide more general knowledge of bioinformatic concepts.
• The fourth part is the Appendix and Index.

Each chapter includes a short table of contents.

CHAPTER 1. INTRODUCTION TO CLC DNA WORKBENCH 35

1.9.1 Text formats

In order to produce a clearly laid-out content in this manual, different formats are applied:

• A feature in the program is in bold starting with capital letters. (Example: Navigation Area)
• An explanation of how a particular function is activated is illustrated by "|" and bold. E.g. "select the element | Edit | Rename 
554. w

To delete a motif, select it and press the Delete key on the keyboard. Alternatively, click Delete in the Tool bar.

Save the motif list in the Navigation Area, and you will be able to use it for Motif Search (see section 13.7).

Chapter 14

Nucleotide analyses

Contents
14.1 Convert DNA to RNA
14.2 Convert RNA to DNA
14.3 Reverse complements of sequences
14.4 Reverse sequence
14.5 Translation of DNA or RNA to protein
14.5.1 Translate part of a nucleotide sequence
14.6 Find open reading frames
14.6.1 Open reading frame parameters

CLC DNA Workbench offers different kinds of sequence analyses which only apply to DNA and RNA.

14.1 Convert DNA to RNA

CLC DNA Workbench lets you convert a DNA sequence into RNA, replacing the T residues (Thymine) with U residues (Uracil):

select a DNA sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Convert DNA to RNA

or right-click a sequence in Navigation Area | Toolbox | Nucleotide Analyses | Convert DNA to RNA

This opens the dialog displayed in figure 14.1.

If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add 
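
The DNA-to-RNA conversion described in section 14.1 amounts to replacing every T with U. A minimal Python sketch of that substitution follows; the function name `dna_to_rna` is an assumption made for illustration and says nothing about how the Workbench implements the tool internally.

```python
def dna_to_rna(dna: str) -> str:
    """Replace thymine (T/t) with uracil (U/u), leaving other letters as-is."""
    return dna.translate(str.maketrans("Tt", "Uu"))


print(dna_to_rna("ATGGTCTTAA"))  # AUGGUCUUAA
```

Letter case is preserved, so lower-case residues (sometimes used to mark edited bases) stay lower-case after conversion.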
555. w ee wee we eee ee ee ee 218  13 6 Pattern Discovery        2  2 ee eee ee 219  13 6 1 Pattern discovery search parameters          2 0 0 0 ee ee ees 220  13 6 2 Pattern search output   nwo  amp  eG ee Ee we Ge ew ee EE ee oS ee 221  13 7 Motif Search  is ok Se ek ee ee ee ee ee ee ee ee ee a we ee 221  LS DOMO MOS ic we ee ee A a deh DE E TE a 222  13 7 2 Motif search from the Toolbox   2 4  4 Ss Sw ew dA ER E AA 224  13 7 3 Java regular expressions   a ase ac dew ew ds Ea a ad 225  13 7 4 Create motif list accra ew we oe a ee ww ee A 22     CLC DNA Workbench offers different kinds of sequence analyses  which apply to both protein and  DNA  The analyses are described in this chapter     13 1 Shuffle sequence    In some cases  it is beneficial to shuffle a sequence  This is an option in the Toolbox menu under  General Sequence Analyses  It is normally used for statistical analyses  e g  when comparing an  alignment score with the distribution of scores of shuffled sequences     Shuffling a sequence removes all annotations that relate to the residues   select sequence   Toolbox in the Menu Bar   General Sequence Analyses  nk       Shuffle Sequence  2     199    CHAPTER 13  GENERAL SEQUENCE ANALYSES 200    or right click a sequence   Toolbox   General Sequence Analyses  A    Shuffle  Sequence    x     This opens the dialog displayed in figure 13 1                          a  BB Shuffle Sequence  ES   E Select one or more    Selled one or more sequences of same type SCS  sequenc
556. w the print dialog  which lets  you choose e g  which pages to print     The Print preview window is for preview only   the layout of the pages must be adjusted in the  Page setup     Chapter 7    Import export of data and graphics    Contents  7 1   Bioinformatic data formats       2    ee te snn snsss   117  7 1 1 Import of bioinformatic data      2s 6 sau eee Raw eee ee we we eS 118  feo import VYect  r NTI data 26 sa dau ce nesana bbw Gwe oe Sw 119  7 1 3 Export of bioinformatics data          a a 0  eee ee ee a 122  7 2 External files      1    ee a 124  7 3 Export graphics to files        0    ee ete et 124  7 3 1 Which part of the view to export       2    a a a a ee ee a 125  1 3 2 Save location and file formats           2 002 eee een nee 125  7 3 3 Graphics export parameters    no ao aoao oa a a a le 127  Es Exporting protein repots cassar ME E woe 128  7 4 Export graph data points to a file        2  ee ee nnnnnsnnnnanna 128  7 5 Copy paste view output       0 2  ee et 130    CLC DNA Workbench handles a large number of different data formats  All data stored in the  Workbench are available in the Navigation Area  The data of the Navigation Area can be divided  into two groups  The data is either one of the different bioinformatic data formats  or it can be  an    external file     Bioinformatic data formats are those formats which the program can work  with  e g  sequences  alignments and phylogenetic trees  External files are files or links which  are stored in CL
557. we ee eo 304  17 9 Secondary peak calling   aw ee he eked eae ne dat bu Sane eee td ae i 305  18 Cloning and cutting 307  Poet   Ole CUE Che s e sane ee ee ee oe eee ee Eee oe 308  Toe Wey CIE ne ee Ree ee eee a a TEA E 318  18 3 Restriction site analysis      s ao se rar Ed be ek Ge  Ww Sw a we be ook S 321  18 4 Gel electrophoresis   skew eee ceeucae Rw    Gee Hee RD ROBO E 6 340  18 5 Restriction enzyme listS    lt eccacideesewedie ak idda RES ER E 343  19 Sequence alignment 347  19 1 Create an alignment     ick aneaaee eke bORER SH ELE EAD ERED EO 348  19 2 View alignments     nonoo ee 353  19 3 EditalgnmeniS os ia ee eeu ee oe ia nania ow ba ALR RE AA Sof  19 4 MM GC o ise ereda meee he eo eee eh ee Eo we ee Rd SS 359  13 5 Pairwise COM AMON   sian 2 cneece kit neii nuntik LOE RS HO 361  19 6 Bioinformatics explained  Multiple alignments             2 2 08 8 eae 364  20 Phylogenetic trees 366  20 1 Inferring phylogenetic trees     1    2  a 366  20 2 Bioinformatics explained  phylogenetics             00 00  ee euee 311  IV Appendix 315  A Comparison of workbenches and the viewer 376  B Graph preferences 381    C Working with tables 383    CONTENTS    fas PRM DOS 2 a se ee ee ERR ee A ee eS    D BLAST databases  D 1 Peptide sequence databases           00 eee ee ee ee ee ee  D 2 Nucleotide sequence databases          0 000  eee ee a    D 3 Adding more databases          00 0 a ee ee ee    E Restriction enzymes database configuration    F Technical information about modif
558. ween the cut sites selected.

If the entire sequence should be selected as fragment, click Add Current Sequence as Fragment.

At any time, the selection of cut sites can be cleared by clicking the Remove icon to the right of the fragment selections. If you just wish to remove the selection of one of the sites, right-click the site on the sequence and choose De-select This ... Site.

Defining target vector

When selecting among the sequences in the panel at the top, the vector sequence has "vector" appended to its name. If you wish to use one of the other sequences as vector, select this sequence in the list and click Change to Current.

The next step is to define where the vector should be cut. If the vector sequence should just be opened, click the restriction site you want to use for opening. If you want to cut off part of the vector, click two restriction sites while pressing the Ctrl key (⌘ on Mac). You can also right-click the cut sites and use Select This ... Site to select a site.

This will display two options for what the target vector should be (for linear vectors there would have been three options), as shown in figure 18.5.

Just as when cutting out the fragment, there are a lot of choices regarding which sequence should be used as the vector.

At any time, the selection of cut sites can be cleared by clicking the Remove icon to the right of the target vector selections. If you just wish to remove the selection of one 
559. will be able  to click Next     Go to license download web page    Selecting the second option  Go to license download web page  opens the license web page as  shown in 1 4     Click the Request Evaluation License button  and you will be able to save the license on your  computer  e g  on the Desktop     CHAPTER 1  INTRODUCTION TO CLC DNA WORKBENCH 18    Request an Evaluation License    icense  please dick the button below     Request Evaluation License  ul a file containing the license will b d to    To begin ur license you  Choose License File button and locate the file on your c       Figure 1 4  The license web page where you can download a license     Back in the Workbench window  you will now see the dialog shown in 1 5        License Wizard    d CLC DNA Workbench       Import a license from a file       Please click the button below and locate the file containing your license     No file selected          Choose License File         If you experience any problems  please contact The CLC Support Team            Proxy Settings     Previous   Next   Quit Workbench         Figure 1 5  Importing the license downloaded from the web site     Click the Choose License File button and browse to find the license file you saved before  e g   on your Desktop   When you have selected the file  click Next     Accepting the license agreement    Regardless of which option you chose above  you will now see the dialog shown in figure 1 6     License Wizard       d CLC DNA Workbench    Li
560. wn in figure 18 45  See section 10 3 for more information about viewing    Sacil  T Ei       ATPsal MRNA GGTGGGAGGCGCGGCCCCGCGGCAGCTGAGCCC    Figure 18 45  The result of the restriction analysis shown as annotations     annotations     Table of restriction sites  The restriction map can be shown as a table of restriction sites  see figure 18 46      Each row in the table represents a restriction enzyme  The following information is available for  each enzyme     CHAPTER 18  CLONING AND CUTTING 339    Restriction m    3     Rows  5 Restriction sites table Fiter  O    Segu    Mame Pattern Cyverhang Number    Cut position s   PERHSBC CjePI ccannnnnnntc  5  151  184     PERH o no Sh       PERHGEC No gaga o o A  PERHOEC  Tso rca Po o qa  PERHGEC Jrhili arca Bo o foi  TT  T  EEE DDD      Figure 18 46  The result of the restriction analysis shown as annotations        Sequence  The name of the sequence which is relevant if you have performed restriction  map analysis on more than one sequence     Name  The name of the enzyme     Pattern  The recognition sequence of the enzyme     Overhang  The overhang produced by cutting with the enzyme  3     5    or Blunt      e Number of cut sites     Cut position s   The position of each cut           Ifthe enzyme cuts more than once  the positions are separated by commas            If the enzyme   s recognition sequence is on the negative strand  the cut position is  put in brackets  as the enzyme Tsol in figure 18 46 whose cut position is  13
561. wn nucleotide  N     Ambiguity nucleotides  R  Y  etc          Create Full contigs  including trace data    E Show tabular view of contigs  Create only consensus sequences                A        Previous      gt  Next   Finish   XX Cancel               Figure 17 16  Setting assembly parameters     This dialog gives you the following options for assembling     e Trim sequence ends before assembly  If you have not previously trimmed the sequences   this can be done by checking this box  If selected  the next step in the dialog will allow you  to specify settings for trimming  see section 17 3 2      e Minimum aligned read length  The minimum number of nucleotides in a read which must    be successfully aligned to the contig   excluded from the assembly     If this criteria is not met by a read  the read is    e Alignment stringency  Specifies the stringency of the scoring function used by the alignment  step in the contig assembly algorithm  A higher stringency level will tend to produce contigs    CHAPTER 17  SEQUENCING DATA ANALYSES AND ASSEMBLY 292    with less ambiguities but will also tend to omit more sequencing reads and to generate  more and shorter contigs  Three stringency levels can be set         Low       Medium       High     e Conflicts  If there is a conflict  i e  a position where there is disagreement about the  residue  A  C  T or G   you can specify how the contig sequence should reflect the conflict         Vote  A  C  G  T   The conflict will be solved by
562. wrote a paper reviewing the BLOSUM62 substitution matrix and how to calculate the scores [Eddy, 2004].

Use of scoring matrices

Deciding which scoring matrix you should use in order to obtain the best alignment results is a difficult task. If you have no prior knowledge of the sequence, BLOSUM62 is probably the best choice. This matrix has become the de facto standard for scoring matrices and is also used as the default matrix in BLAST searches. The selection of a "wrong" scoring matrix will most probably have a strong influence on the outcome of the analysis. In general, a few rules apply to the selection of scoring matrices:

• For closely related sequences, choose BLOSUM matrices created for highly similar alignments, like BLOSUM80. You can also select low PAM matrices such as PAM1.
• For distantly related sequences, select low BLOSUM matrices (for example BLOSUM45) or high PAM matrices such as PAM250.

The BLOSUM matrices with low numbers correspond to PAM matrices with high numbers (see figure 13.13 for correlations between the PAM and BLOSUM matrices). To summarize, if you want to find distantly related proteins to a sequence of interest using BLAST, you could benefit from using BLOSUM45 or similar matrices.

Other useful resources

[Figure 13.13 places the matrices on a scale from less divergent to more divergent: PAM1 / BLOSUM80, PAM120 / BLOSUM62, PAM250 / BLOSUM45.]

Figure 13.13: Relationship between scoring matrices. The BLOSUM62 has become a de
563. x

- Table: The translation table to use in the translation. For more about translation tables, see section 14.5.
- Only AUG start codons: For most genetic codes, a number of codons can be start codons. Selecting this option only colors the AUG codons green.
- Single letter codes: Choose to represent the amino acids with a single letter instead of three letters.

• Trace data: See section 17.1.

• Quality scores: For sequencing data containing quality scores, the quality score information can be displayed along the sequence.

- Show as probabilities: Converts quality scores to error probabilities on a 0-1 scale, i.e. not log-transformed.
- Foreground color: Colors the letter using a gradient, where the left side color is used for low quality and the right side color is used for high quality. The sliders just above the gradient color box can be dragged to highlight relevant levels. The colors can be changed by clicking the box. This will show a list of gradients to choose from.
- Background color: Sets a background color of the residues using a gradient in the same way as described above.
- Graph: The quality score is displayed on a graph. Learn how to export the data behind the graph in section 7.4.
  * Height: Specifies the height of the graph.
  * Type: The graph can be displayed as Line plot, Bar plot or as a Color bar.
  * Color box: For Line and Bar plots, the color of the plot can be set by clicking the color box. For Colors
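
The "Show as probabilities" option converts log-scaled quality scores to error probabilities between 0 and 1. The sketch below assumes the widely used Phred convention, Q = -10 * log10(p); the excerpt does not state which formula the Workbench applies, so treat both the formula choice and the function name `phred_to_error_probability` as assumptions made for illustration.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-style quality score to an error probability.

    Assumes Q = -10 * log10(p), i.e. p = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


# Higher quality scores map to smaller error probabilities.
for q in (10, 20, 30):
    print(q, phred_to_error_probability(q))
```

Under this convention a score of 20 corresponds to roughly a 1 in 100 chance that the base call is wrong.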
564. xample data  import  30  Excel  export file format  395  Expand selection  148  Expect  BLAST search  183  Export  bioinformatic data  122  dependent objects  123  folder  122  graph in csv format  128  graphics  124  history  123  list of formats  392  multiple files  122  preferences  109  Side Panel Settings  107  tables  395  Export visible area  125  Export whole view  125  Expression analysis  3    Expression clone  creating  326  Extensions  30  External files  import and export  124  Extinction coefficient  216  Extract  part of a contig  301  Extract sequences  165    FASTA  file format  393   Feature request  28   Feature table  218   Features  see Annotations   File name  sort sequences based on  2 9  File system  local BLAST database  187    408    Filtering restriction enzymes  330  332  336   344   Find   in GenBank file  162   in sequence  14 7   results from a finished process  93  Find open reading frames  234  Fit to pages  print  115  Fit Width  92  Fixpoints  for alignments  351  Floating license  24  Floating license  use offline  25  Floating Side Panel  112  Folder  create new  tutorial  37  Follow selection  142  Footer  116  Format  of the manual  35  FormatDB  187  Fragment table  339  Fragment  select  149  Fragments  separate on gel  341  Free end gaps  349  fsa  file format  395    G C content  145  378  G C restrictions  3    end of primer  253  5    end of primer  253  End length  253  Max G C  253  Gap  compare number of  363  delete  35 7  ext
565. y of finding restriction sites                 12  2 12 2 The Toolbox way of finding restriction sites                  13    This chapter contains tutorials representing some of the features of CLC DNA Workbench  The  first tutorials are meant as a short introduction to operating the program  The last tutorials give    examples of how to use some of the main features of CLC DNA Workbench   Watch video  tutorials at http    www clcbio com tutorials     2 1 Tutorial  Getting started    This brief tutorial will take you through the most basic steps of working with CLC DNA Workbench   The tutorial introduces the user interface  shows how to create a folder  and demonstrates how  to import your own existing data into the program     When you open CLC DNA Workbench for the first time  the user interface looks like figure 2 1   At this stage  the important issues are the Navigation Area and the View Area     The Navigation Area to the left is where you keep all your data for use in the program  Most  analyses of CLC DNA Workbench require that the data is saved in the Navigation Area  There  are several ways to get data into the Navigation Area  and this tutorial describes how to import  existing data     The View Area is the main area to the right  This is where the data can be    viewed     In general   a View is a display of a piece of data  and the View Area can include several Views  The Views  are represented by tabs  and can be organized e g  by using    drag and drop        
566. ying Gateway cloning sites    G Formats for import and export  G 1 List of bioinformatic data formats       ean ck banc ed a E      G 2 List of graphics data formats      noaoo a a    H IUPAC codes for amino acids      IUPAC codes for nucleotides    J Custom codon frequency tables    Bibliography    V Index    386  386  386  387    389    390    392  392  395    396    398    399    400    404    Part      Introduction    Chapter 1    Introduction to CLC DNA Workbench    Contents  1 1 Contactinformation       0    2 eee 12  1 2 Download and installation        0  eee ee 12  1 2 1  Program download   shee ek et edhe Peake kb we aastasi 12  1 2 2 Installation on Microsoft Windows            050 28528 2 eee 12  1 2 3 Installation on Mac OSX   a0 cask hw Re E EE ew ew we 13  1 2 4   Installation on Linux with an installer              850 582506 14  1 2 5 Installation on Linux with an RPM package                4   15  1 3 System requirements       0    eee eee 15  LA LOBOS e224 cu eae wee eee eee rs CA CERCADA 15  1 4 1 Request an evaluation license           a ee ee ee ee es 16  1 4 2 Download a license      1    eee ee 19  1 4 3 Import a license from a file            0 0 00 0 ee ee ee es 21  LAA MC MC Gb ea ee he Re EEE Se MEDE  21  1 4 5 Configure license server connection            2 058 552888   24  1 4 6 Limited mode isa se cde owe a Pa ed Ge we SO A  amp  2   1 5 About CLC Workbenches          0  08 ee ee eee eee es 27  1 5 1 New program feature request         0  0  2 
567. you can see that we are close to the end of the end of   Rev3   and the quality of the chromatogram traces is often low near the ends     CHAPTER 2  TUTORIALS 48       TCCATCCGGGAAGTT  ACGGCTCTA ci Eiu  ho    FH        Assembly layout  Conflict      Gather sequences at top         Show sequence ends    TCCATCCGGGAAGTTTACGGCTCTAC       Find Conflict      Low coverage threshold 8             TCCATCCGGGAAGTT IACGGCTCTAC   Annotation types   gt  Residue coloring    KO A       amp      AL Qf att   gt  Nucleotide info   gt  Find    b Text Format          TCCATCCGGGAAGTT  ACGGCTCTACTGCAAAGGAGCTGACACAGTAA       AAA ARA AAA AAA ARIANA  Figure 2 16  Using the Find Conflict button highlights conflicts        Based on this  we decide not to trust  Rev3   To correct the read  select the  T  in the  Rev3   sequence by placing the cursor to the left of it and dragging the cursor across the T  Press Delete            This will resolve the conflict     2 5 5 Including regions that have been trimmed off  Clicking the Find Conflict button again will find the next conflict     This is the beginning of a stretch of gaps in the consensus sequence  This is because the reads  have been trimmed at this position  However  if you look at the read at the bottom  Fwd2  you  can see that a lot of the peaks actually seem to be fine  so we could just as well include this  information in the contig     If you scroll a little to the right  you can see where the trimmed region begins  To include this  region i
568. ys apply these settings       vsave    XM Cancel      Figure 5 11  The save settings dialog           The settings are specific to the type of view  Hence  when you save settings of a circular view   they will not be available if you open the sequence in a linear view     CHAPTER 5  USER PREFERENCES AND SETTINGS 112    Save Settings      k Sequence layout    Delete Settings      Annotation layout Apply Saved Settings P Compact    k Annotation types Non compact  no wrap     Ts Non cormpact with translations  k Restriction sites   Rasmal colors   k Residue coloring  Show translation      Nucleotide info CLC Standard Settings    k Find    k Text Format       Figure 5 12  Applying saved settings     If you wish to export the settings that you have saved  this can be done in the Preferences dialog  under the View tab  see section 5 2 2      The remaining icons of figure 5 10 are used to  Expand all groups  Collapse all groups  and  Dock Undock Side Panel  Dock Undock Side Panel is to make the Side Panel  floating   see  below      5 6 1 Floating Side Panel    The Side Panel of the views can be placed in the right side of a view  or it can be floating  see  figure 5 13      a    HH sequence list x     Sequence list  sequence list Number of rows  5    Accession Definition Modificati    Length  M15292 i i vo fer  APR 1993  110    f          l   Eme x       k Show column       Figure 5 13  The floating Side Panel can be moved out of the way  e g  to allow for a wider view of  a table 
569. ze is 3 (W=3). If the query sequence contains QWRTG, the searched words are QWR, WRT, RTG. See figure 12.15 for an illustration of words in a protein sequence.

[Figure 12.15 shows a protein query sequence with one query word of size W=3 marked.]

Figure 12.15: Generation of exact BLAST words with a word size of W=3.

During the initial BLAST seeding, the algorithm finds all common words between the query sequence and the hit sequence(s). Only regions with a word hit will be used to build on an alignment.

BLAST will start out by making words for the entire query sequence (see figure 12.15). For each word in the query sequence, a compilation of neighborhood words which exceed the threshold of T is also generated.

A neighborhood word is a word obtaining a score of at least T when compared with the query word using a selected scoring matrix (see figure 12.16). The default scoring matrix for blastp is BLOSUM62 (for explanation of scoring matrices, see www.clcbio.com/be). The compilation of exact words and neighborhood words is then used to match against the database sequences.

[Figure 12.16 lists the query word PQG (score 18) together with candidate words scored with the BLOSUM62 matrix, including PNG 13, PDG 13, PHG 13, PMG 13 and PQA 12, and marks the threshold for neighborhood words at T=13.]

Figure 12.16: Neighborhood BLAST words based on the BLOSUM62 matrix. Only words where the threshold T exceeds
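
The word generation and neighborhood scoring described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BLAST's actual implementation: the function names are hypothetical, only the handful of BLOSUM62 entries needed for the example are included, and the candidate list is a hand-picked subset of the words in figure 12.16.

```python
def blast_words(query: str, w: int = 3) -> list[str]:
    """Return every overlapping word of length w in the query."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


# Subset of BLOSUM62, limited to the residue pairs used in this example.
BLOSUM62_SUBSET = {
    ("P", "P"): 7, ("Q", "Q"): 5, ("G", "G"): 6,
    ("Q", "E"): 2, ("Q", "N"): 0, ("G", "A"): 0,
}


def pair_score(a: str, b: str) -> int:
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def word_score(word: str, candidate: str) -> int:
    """Sum the per-position substitution scores of two equal-length words."""
    return sum(pair_score(x, y) for x, y in zip(word, candidate))


print(blast_words("QWRTG"))  # ['QWR', 'WRT', 'RTG']

T = 13  # threshold for neighborhood words, as in figure 12.16
candidates = ["PQG", "PEG", "PNG", "PQA"]
neighborhood = [c for c in candidates if word_score("PQG", c) >= T]
print(neighborhood)  # ['PQG', 'PEG', 'PNG']
```

PQA scores 12 against PQG and therefore falls below the threshold, which matches the cut-off shown in the figure.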
570. zip  Extract the file included in the zip archive and save it in the settings folder of the Work   bench installation folder  The file you download contains the standard configuration  You should  thus update the file to match your specific needs  See the comments in the file for more  information     The name of the properties file you download is gatewaycloning 1 properties  You  can add several files with different configurations by giving them a different number  e g   gatewaycloning 2 properties and so forth  When using the Gateway tools in the Work   bench  you will be asked which configuration you want to use  see figure F 1      390    APPENDIX F  TECHNICAL INFORMATION ABOUT MODIFYING GATEWAY CLONING SITES 391      EE  Add atte Sites       Set parameters    Standard configuration for gateway doning with up to four fragments     Problems       Em sto   Me       Figure F 1  Selecting between different gateway cloning configurations     Appendix G    Formats for import and export    G 1 List of bioinformatic data formats    Below is a list of bioinformatic data formats  i e  formats for importing and exporting sequences   alignments and trees     392    APPENDIX G  FORMATS FOR IMPORT AND EXPORT    G 1 1 Sequence data formats    File type  FASTA   AB1   ABI   CLC   Clone Manager    CSV export  CSV import    DNAstrider   DS Gene   Embl   GCG sequence  GenBank  Gene Construction Kit  Lasergene  Nexus   Phred   PIR  NBRF   Raw sequence  SCF2   SCF3   Staden    Swiss Prot  Ta
    