        T-Coffee User Guide and Reference Manual
         Contents
  -evaluate_mode
  Generic Output
  Building a Server
  Common Problems when setting up Servers
  Output of the .dnd file
  Permissions
  Other Programs
  Parameter files
  Sequence Name Handling
  Automatic Format Recognition
  T-COFFEE_LIB_FORMAT_01
  T-COFFEE_LIB_FORMAT_02
  Library List
  Substitution matrices
  ClustalW Style [Deprecated]
  BLAST Format [Recommended]
  Sequences Weights
  Known Problems
  -seq_name_for_quadruplet
  Tree Computation
  -distance_matrix_mode
  -quicktree [CW]
  Pair-wise Alignment Computation
  -dp_mode
  -diag_threshold
  -nomatch
  -gapopen
  -fgapopen
  -cosmetic_penalty
  -tg_mode
  Multiple Alignment Computation
  -msa_mode
  -profile_comparison
  -profile_mode
  Alignment Post-Processing
  -clean_aln
  -clean_threshold
  -clean_iteration
  -clean_evaluation_mode
  CPU Control
  Multithreading
A identifier that tells the program to keep the sequences aligned.

Aligning Nucleic Acids

Nucleic acid sequences are difficult to align and T-Coffee is not especially well gifted. However, if you want to give it a try, you are advised to use the following command line:

PROMPT: t_coffee sample_dnaseq1.fasta -special_mode=dna

This special mode triggers the use of slow_pair4dna and lalign_id_pair4dna, which use lower gap extension penalties and the identity matrix. If you would rather use your own matrix, use:

PROMPT: t_coffee sample_dnaseq1.fasta -in=Mlalign_id_pair4dna@EP@MATRIX@idmat

where you should replace idmat with your own matrix, in BLAST format (see the format section).

Aligning Sequences and Structures

Assuming some structures are associated with your sequences, it is possible to align these sequences while using the associated structural information. The easiest way to do this is to use 3dcoffee.

Aligning Sequences and Profiles

T-Coffee can make multiple profile alignments. In this context, the alignments are treated as single sequences and aligned to one another in a progressive fashion. Currently, we only support profiles in the form of standard multiple sequence alignments. The profile must either be entered via the -profile flag:

PROMPT: t_coffee -profile=sample_aln1.aln,sample_aln2.aln -outfile=combined_profiles.aln

It is also possible to read the profile via the -in flag, as long as they are preceded w
Usage: -cosmetic_penalty=<negative value>
Default: -cosmetic_penalty=-50

Indicates the penalty applied for opening a gap. This penalty is set to a very low value. It will only have an influence on the portions of the alignment that are unalignable. It will not make them more correct, but only more pleasing to the eye (i.e. it avoids stretches of lonely residues).

The cosmetic penalty is automatically turned off if a substitution matrix is used rather than a library.

-tg_mode

Usage: -tg_mode=<0, 1, or 2>
Default: -tg_mode=1

0: terminal gaps penalized with -gapopen + -gapext*len
1: terminal gaps penalized with -gapext*len
2: terminal gaps unpenalized

Weighting Schemes

-seq_weight

Usage: -seq_weight=<t_coffee or <file_name>>
Default: -seq_weight=t_coffee

These are the individual weights assigned to each sequence. The t_coffee weights try to compensate for the bias in consistency caused by redundancy in the sequences.

sim(A,B) = similarity between A and B, between 0 and 1
weight(A) = 1/sum(sim(A,X)^3)

Weights are normalized so that their sum equals the number of sequences. They are applied onto the primary library in the following manner:

res_score(Ax,By) = Min(weight(A), weight(B)) * res_score(Ax,By)

These are very simple weights. Their main goal is to prevent a single sequence present in many copies from dominating the alignment.

Note: The library output by -out_lib is the un-weighted library.
Note: We
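The weighting formulas above can be sketched in a few lines of Python. This is only an illustration, not the T-Coffee source code: the function names are hypothetical, and we assume that the sum over X runs over all sequences including A itself (so that sim(A,A) = 1 keeps the denominator positive).

```python
def tcoffee_like_weights(sim):
    """Return normalized weights from a square similarity matrix.

    sim[a][x] is the similarity between sequences a and x, in [0, 1].
    Assumption: the sum over X includes A itself (sim[a][a] == 1).
    """
    n = len(sim)
    # weight(A) = 1 / sum(sim(A, X)^3)
    raw = [1.0 / sum(sim[a][x] ** 3 for x in range(n)) for a in range(n)]
    # Normalize so that the weights sum to the number of sequences.
    scale = n / sum(raw)
    return [w * scale for w in raw]


def weighted_res_score(score, weight_a, weight_b):
    """res_score(Ax,By) = Min(weight(A), weight(B)) * res_score(Ax,By)."""
    return min(weight_a, weight_b) * score


# Two identical sequences (0 and 1) and one distant sequence (2).
sim = [
    [1.0, 1.0, 0.2],
    [1.0, 1.0, 0.2],
    [0.2, 0.2, 1.0],
]
weights = tcoffee_like_weights(sim)
print(weights)
```

In this toy case the two redundant sequences receive the same weight, lower than the weight of the distant sequence, which is the behaviour the text describes: a sequence present in many copies does not dominate.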
Usage: -type=DNA | PROTEIN | DNA_PROTEIN
Default: -type=<automatically set>

This flag sets the type of the sequences. If omitted, the type is guessed automatically. This flag is compatible with ClustalW.

Note: In case of low complexity or short sequences, it is recommended to set the type manually.

-seq

Usage: -seq=[<P,S><name>,]
Default: none

-seq is now the recommended flag to provide your sequences. It behaves mostly like the -in flag.

-seq_source

Usage: -seq_source=<ANY or _LS or LS>
Default: ANY

You may not want to combine all the provided sequences into a single sequence list. You can do so by specifying that you do not want to treat all the -in files as potential sequence sources.

-seq_source=_LA indicates that neither sequences provided via the A (Alignment) flag nor via the L (Library) flag should be added to the sequence list.

-seq_source=S means that only sequences provided via the S tag will be considered. All the other sequences will be ignored.

Note: This flag is mostly designed for interactions between T-Coffee and T-CoffeeDPA (the large scale version of T-Coffee).

Structure Input

-pdb

Usage: -pdb=<pdbid1>,<pdbid2>... [Max 200]
Default: None

Reads or fetches a pdb file. It is possible to specify a chain or even a sub-chain:

PDBID(PDB_CHAIN)[opt] (FIRST,LAST)[opt]

Tree Input

-usetree

Usage: -usetree=<tree file>
Default: No file specified
Q: I want to output an html file and a regular file.

A: See the next question.

Q: I would like to output more than one alignment format at the same time.

A: The flag -output accepts more than one parameter. For instance:

PROMPT: t_coffee sample_seq1.fasta -output=clustalw,score_html,score_ps,msf

This will output four alignment files in the corresponding formats. Alignment names will have the format name as an extension.

Note: you need to have the converter ps2pdf installed on your system (standard under Linux and cygwin). The latest versions of Internet Explorer and Netscape now allow the user to print the HTML display. Do not forget to request Background printing.

Alignment Computation

Q: Can t_coffee align Nucleic Acids?

A: Yes it can, but you must use -special_mode=dna:

PROMPT: t_coffee sample_dnaseq1.fasta -special_mode=dna

Q: I do not want to compute the alignment.

A: Use the -convert flag:

PROMPT: t_coffee sample_aln1.aln -convert -output=gcg

This command will read the .aln file and turn it into an .msf alignment.

Q: I would like to force some residues to be aligned.

If you want to brutally force some residues to be aligned, you may use, as a post-processing step, the force_aln function of seq_reformat:

PROMPT: t_coffee -other_pg seq_reformat -in sample_aln4.aln -action +force_aln seq1 10 seq2 15

PROMPT: t_coffee -other_pg seq_reformat -in sample_aln4.aln -action +force_aln sample_lib4.tc_lib02

sample_lib4.tc_
-profile_mode

Usage: -profile_mode=<cw_profile_profile, muscle_profile_profile, multi_channel>
Default: -profile_mode=cw_profile_profile

When -profile_comparison=profile, this flag selects a profile scoring function.

Alignment Post-Processing

-clean_aln

Usage: -clean_aln
Default: -clean_aln

This flag causes T-Coffee to post-process the multiple alignment. Residues that have a reliability score smaller than or equal to -clean_threshold (as given by an evaluation that uses -clean_evaluate_mode) are realigned to the rest of the alignment. Residues with a score higher than the threshold constitute a rigid framework that cannot be altered.

The cleaning algorithm is greedy. It starts from the top left segment of low consistency residues and works its way left to right, top to bottom along the alignment. You can require this operation to be carried out for several cycles using the -clean_iteration flag.

The rationale behind this operation is mostly cosmetic. In order to ensure a decent looking alignment, the gop is set to -20 and the gep to -1. There is no penalty for terminal gaps, and the matrix is blosum62mt.

Note: Gaps are always considered to have a reliability score of 0.

Note: The use of the cleaning option can result in memory overflow when aligning large sequences.

-clean_threshold

Usage: -clean_threshold=<0-9>
Default: -clean_aln=1

See -clean_aln for details.

-clean_iteration

Usage: -clean_iteration=<value betw
whose order will be used in the final alignment.

-seqnos

Usage: -seqnos=<on or off>
Default: -seqnos=off

Causes the output alignment to contain residue numbers at the end of each line:

T-COFFEE
seq1 aaa aaaa aa 9
seq2 a aa a 4
seq1 a ...
seq2 aaaaaaaaaaaaaaaaaaa 19

Libraries

Although it does not necessarily do so explicitly, T-Coffee always ends up combining libraries. Libraries are collections of pairs of residues. Given a set of libraries, T-Coffee makes an attempt to assemble the alignment with the highest level of consistency. You can think of the alignment as a timetable. Each library pair would be a request from students or teachers, and the job of T-Coffee would be to assemble the timetable that makes as many people as possible happy.

-out_lib

Usage: -out_lib=<name of the library, default, no>
Default: -out_lib=default

Sets the name of the library output. Default implies <run_name>.tc_lib.

-lib_only

Usage: -lib_only
Default: unset

Causes the program to stop once the library has been computed. Must be used in conjunction with the flag -out_lib.

Trees

-newtree

Usage: -newtree=<tree file>
Default: No file specified

Indicates the name of the file into which the guide tree will be written. The default will be <sequence_name>.dnd, or <run_name>.dnd. The tree is written in the parenthesis format known as newick or New Hampshire and u
with -sim_matrix.

-ndiag

Usage: -ndiag=<value>
Default: -ndiag=0

Indicates the number of diagonals used by the fasta_pair_wise algorithm (cf. -dp_mode). When -ndiag=0, n_diag=Log(length of the smallest sequence)+1.

When -ndiag and -diag_threshold are set, diagonals are selected if and only if they fulfill both conditions.

-diag_mode

Usage: -diag_mode=<value>
Default: -diag_mode=0

Indicates the manner in which diagonals are scored during the fasta hashing.

0: indicates that the score of a diagonal is equal to the sum of the scores of the exact matches it contains.
1: indicates that this score is set equal to the score of the best uninterrupted segment (useful when dealing with fragments of sequences).

-diag_threshold

Usage: -diag_threshold=<value>
Default: -diag_threshold=0

Sets the value of the threshold when selecting diagonals.

0: indicates that -ndiag should be used to select the diagonals (cf. -ndiag section).

-sim_matrix

Usage: -sim_matrix=<string>
Default: -sim_matrix=vasiliky

Indicates the manner in which the amino acid alphabet is degenerated when hashing in the fasta_pairwise dynamic programming. Standard ClustalW matrices are all valid. They are used to define groups of amino acids having positive substitution values. In T-Coffee, the default is a 13 letter grouping named Vasiliky, with residues grouped as follows:

rk, de, qh, vilm, fy (other residues kept alone).

This
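To make the idea of a degenerated alphabet concrete, here is a minimal Python sketch that maps each residue to a representative of its group before hashing. It is an illustration only, not T-Coffee code: the function names are hypothetical, the 20-letter alphabet is assumed, and the second pair of the grouping is taken to be q and h, which is an assumption on our part.

```python
# Grouping described for the Vasiliky alphabet (the "qh" pair is an assumption).
VASILIKY_GROUPS = ["rk", "de", "qh", "vilm", "fy"]


def build_degenerate_map(groups, alphabet="acdefghiklmnpqrstvwy"):
    """Map every residue to the first letter of its group.

    Residues that belong to no group are kept alone (mapped to themselves).
    """
    mapping = {res: res for res in alphabet}
    for group in groups:
        for res in group:
            mapping[res] = group[0]
    return mapping


def degenerate(sequence, mapping):
    """Rewrite a sequence in the reduced alphabet (unknown symbols unchanged)."""
    return "".join(mapping.get(res, res) for res in sequence.lower())


mapping = build_degenerate_map(VASILIKY_GROUPS)
reduced_size = len(set(mapping.values()))  # number of letters left after grouping
print(reduced_size, degenerate("KRDEMILV", mapping))
```

With five groups covering twelve residues and eight residues kept alone, the reduced alphabet has 13 letters, which matches the "13 letter grouping" mentioned in the text.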
02
S1 SEQ1 [OPTIONAL]
S2 SEQ2 [OPTIONAL]
...
comment [OPTIONAL]
S1 R1 Ri1 S2 R2 Ri2 V1 (V2 V3)

S1, S2: name of sequence 1 and 2
SEQ1: sequence of S1
Ri1, Ri2: index of the residues in their respective sequence
R1, R2: Residue type
V1, V2, V3: integer Values (V2 and V3 are optional)

Value1, Value 2 and Value3 are optional.

Library List

These are lists of pairs of sequences that must be used to compute a library. The format is:

<nseq> <S1> <S2>
2 hamg2 globav
3 hamgw hemog singa

Substitution matrices

If the required substitution matrix is not available, write your own in a file using the following format:

ClustalW Style [Deprecated]

v1
v2 v3
v4 v5 v6
...

v1, v2... are integers, possibly negative.

The order of the amino acids is: ABCDEFGHIKLMNQRSTVWXYZ, which means that v1 is the substitution value for A vs A, v2 for A vs B, v3 for B vs B, v4 for A vs C and so on.

BLAST Format [Recommended]

BLAST_MATRIX FORMAT
ALPHABET=AGCT
  A G C T
  ...

The alphabet can be freely defined.

Sequences Weights

Create your own weight file, using the -seq_weight flag:

SINGLE_SEQ_WEIGHT_FORMAT_01
seq_name1 v1
seq_name2 v2
...

No duplicate allowed. Sequences not included in the set of sequences provided to t_coffee will be ignored. Order is free. V1 is a float. Un-weighted sequenc
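The value ordering of the ClustalW-style matrix (v1 for A vs A, v2 for A vs B, v3 for B vs B, v4 for A vs C, and so on) can be unpacked as in the Python sketch below. This is only an illustration under the ordering stated in the text: the function name is hypothetical, and the values are assumed to have already been read from the file into a flat list of integers.

```python
ALPHABET = "ABCDEFGHIKLMNQRSTVWXYZ"


def triangular_to_matrix(values, alphabet=ALPHABET):
    """Expand a flat lower-triangular value list into a symmetric lookup table.

    Order of the values: A/A, A/B, B/B, A/C, B/C, C/C, ...
    """
    expected = len(alphabet) * (len(alphabet) + 1) // 2
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    matrix = {}
    it = iter(values)
    for j, col in enumerate(alphabet):   # second residue of the pair
        for row in alphabet[: j + 1]:    # first residue, up to and including col
            value = next(it)
            matrix[(row, col)] = value
            matrix[(col, row)] = value   # substitution scores are symmetric
    return matrix


# Toy three-letter alphabet to show the ordering.
toy = triangular_to_matrix([1, 2, 3, 4, 5, 6], alphabet="ABC")
print(toy[("A", "C")], toy[("C", "B")])
```

For the full 22-letter alphabet given above, the file must therefore contain 22*23/2 = 253 values.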
*******************************************
*                IN_FLAG                  *
*******************************************
* flag indicating the name of the in-coming sequences
* IN_FLAG S no_name -> no flag
IN_FLAG -infile=

*******************************************
*                OUT_FLAG                 *
*******************************************
* flag indicating the name of the out-coming data
* same conventions as IN_FLAG
* OUT_FLAG S no_name -> no flag
OUT_FLAG -outfile=

*******************************************
*                SEQ_TYPE                 *
*******************************************
* G: Genome
* S: Sequences
* P: PDB
* R: Profile
* Examples:
* SEQ_TYPE S    sequences against sequences (default)
* SEQ_TYPE S_P  sequence against structure
* SEQ_TYPE P_P  structure against structure
* SEQ_TYPE PS   mix of sequences and structure
SEQ_TYPE S

*******************************************
*                PARAM                    *
*******************************************
* If there is more than 1 PARAM lin
a list of pairs of residues that could be aligned. It is like a Xmas list: you can ask anything you fancy, but it is down to Santa to assemble a collection of toys that won't get him stuck at the airport while going through the metal detector.

Given a standard library, it is not possible to have all the residues aligned at the same time because all the lines of the library may not agree. For instance, line 1 may say

Residue 1 of seq A with Residue 5 of seq B,

and line 100 may say

Residue 1 of seq A with Residue 29 of seq B.

Each of these constraints comes with a weight and, in the end, the T-Coffee algorithm tries to generate the multiple alignment that contains constraints whose sum of weights yields the highest score. In other words, it tries to make happy as many constraints as possible (replace the word constraint with friends, family members, collaborators... and you will know exactly what we mean).

You can generate this list of constraints however you like. You may even provide it yourself, forcing important residues to be aligned by giving them high weights (see the FAQ). For your convenience, T-Coffee can generate (this is the default) its own list by making all the possible global pairwise alignments, and the 10 best local alignments associated with each pair of sequences. Each pair of residues observed aligned in these pairwise alignments becomes a line in the library.

Yet be aware that nothing forces you to use this library 
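The notion of conflicting constraints and of "sum of weights" can be illustrated with a small Python sketch. It only scores two hand-made candidate alignments against a toy library; it is not the T-Coffee algorithm, and the data layout, the function name and the weights are all hypothetical.

```python
# Toy library: (seq, residue, seq, residue, weight). Weights are made up.
library = [
    ("A", 1, "B", 5, 30),    # "line 1"
    ("A", 1, "B", 29, 10),   # "line 100", conflicts with line 1
    ("A", 2, "B", 6, 25),
]


def satisfied_weight(columns, library):
    """Sum the weights of the constraints whose two residues share a column.

    columns maps (sequence name, residue index) to an alignment column.
    """
    total = 0
    for seq1, res1, seq2, res2, weight in library:
        col1 = columns.get((seq1, res1))
        col2 = columns.get((seq2, res2))
        if col1 is not None and col1 == col2:
            total += weight
    return total


# Candidate 1 honours line 1, candidate 2 honours line 100 instead.
candidate_1 = {("A", 1): 0, ("B", 5): 0, ("A", 2): 1, ("B", 6): 1, ("B", 29): 24}
candidate_2 = {("A", 1): 24, ("B", 29): 24, ("A", 2): 25, ("B", 5): 0, ("B", 6): 1}

score_1 = satisfied_weight(candidate_1, library)
score_2 = satisfied_weight(candidate_2, library)
print(score_1, score_2)
```

Residue 1 of seq A can sit in only one column, so no alignment satisfies both the first and the second constraint; the candidate that keeps the heavier constraints scores higher.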
alphabet is set with the flag -sim_matrix=vasiliky. In order to keep the alphabet non-degenerated, -sim_matrix=idmat can be used to retain the standard alphabet.

-matrix [CW]

Usage: -matrix=<blosum62mt>
Default: -matrix=blosum62mt

The usage of this flag has been modified from previous versions, due to frequent mistakes in its usage. This flag sets the matrix that will be used by alignment methods within t_coffee (slow_pair, lalign_id_pair). It does not affect external methods (like clustal_pair, clustal_aln...).

-nomatch

Usage: -nomatch=<positive value>
Default: -nomatch=0

Indicates the penalty to associate with a match. When using a library, all matches are positive or equal to 0. Matches equal to 0 are unsupported by the library but non-penalized. Setting nomatch to a non-negative value makes it possible to penalize these null matches and prevent unrelated sequences from being aligned (this can be useful when the alignments are meant to be used for structural modeling).

-gapopen

Usage: -gapopen=<negative value>
Default: -gapopen=0

Indicates the penalty applied for opening a gap. The penalty must be negative. If no value is provided when using a substitution matrix, a value will be automatically computed.

-gapext

Usage: -gapext=<negative value>
Default: -gapext=0

Indicates the penalty applied for extending a gap (cf. -gapopen).

-fgapopen

Unsupported

-fgapext

Unsupported

-cosmetic_penalty
and causes t_coffee to evaluate a pre-computed alignment provided via -infile=<alignment>. The flag -output must be set to an appropriate format (i.e. -output=score_ascii, score_html or score_pdf).

The main purpose of -evaluate is to let you control every aspect of the evaluation. Yet it is advisable to use the pre-defined parameterization: -special_mode=evaluate.

PROMPT: t_coffee -infile=sample_aln1.aln -special_mode=evaluate

PROMPT: t_coffee -infile=sample_seq1.aln -in=Lsample_lib1.tc_lib -special_mode=evaluate

-convert [cw]

Usage: -convert
Default: turned off

Toggles on the conversion mode and causes T-Coffee to convert the sequences, alignments, libraries or structures provided via the -infile and -in flags. The output format must be set via the -output flag. This flag can also be used if you simply want to compute a library (i.e. you have an alignment and you want to turn it into a library).

This flag is ClustalW compliant.

-do_align [cw]

Usage: -do_align
Default: turned on

Special Parameters

-version

Usage: -version
Default: not used

Returns the current version number.

-check_configuration

Usage: -check_configuration
Default: not used

Checks your system to determine whether all the programs T-Coffee can interact with are installed.

-cache

Usage: -cache=<use, update, ignore, <filename>>
Default: -cache=use

By default, t_coffee stores in a cache directory the results of computationally ex
  -multi_thread [NOT Supported]
  Limits
  -mem_mode
  -dpa_master_aln
  -dpa_maxnseq
  -dpa_tree [NOT IMPLEMENTED]
  Using Structures
  Generic
  -special_mode
  -check_pdb_status
  3D Coffee: Using SAP
  Using/finding PDB templates for the Sequences
  Multiple Local Alignments
  -domain_interactive [Examples]
  Generic
  Conventions Regarding Filenames
  Identifying the Output files automatically
  Alignments
  -outseqweight
  -seqnos
either modifying the METHODS_4_TCOFFEE in define_headers.h (and recompiling), or by modifying the environment variable METHODS_4_TCOFFEE.

Advanced Method Integration

It may sometimes be difficult to customize the program you want to use through a tc_method file. In that case, you may rather use an external perl script to run your external application. This can easily be achieved using the generic_method.tc_method file:

*TC_METHOD_FORMAT_01
***************generic_method.tc_method*************
EXECUTABLE tc_generic_method.pl
ALN_MODE pairwise
IN_FLAG -infile=
OUT_FLAG -outfile=
OUT_MODE aln
PARAM -method clustalw
PARAM -gapopen=-10
SEQ_TYPE S
****************************************************
* Note: &bsnp can be used for white spaces

When you run this method:

PROMPT: t_coffee sample_seq1.fasta -in=Mgeneric_method.tc_method

T-Coffee runs the script tc_generic_method.pl on your data. It also provides the script with parameters. In this case, -method clustalw indicates that the script should run clustalw on your data. The script tc_generic_method.pl is incorporated in t_coffee. Over time, this script will be the place where novel methods will be integrated.

will be used to run the script tc_generic_method.pl. The file tc_generic_method.pl is a perl file, automatically generated by t_coffee. Over time this
file will make it possible to run all available methods. You can dump the script using the following command:

PROMPT: t_coffee -other_pg=unpack_tc_generic_method.pl

Note: If there is a copy of that script in your local directory, that copy will be used in place of the internal copy of T-Coffee.

The Mother of All method files

*TC_METHOD_FORMAT_01
****************************************************
* Incorporating new methods in T-Coffee
* Cedric Notredame 17/04/05
****************************************************
* This file is a method file
* Copy it and adapt it to your need so that the method
* you want to use can be incorporated within T-Coffee
****************************************************
*                     USAGE
****************************************************
* This file is passed to t_coffee via -in:
*
*     t_coffee -in Mgeneric_method.method
*
* The method is passed to the shell using the following
* call:
* <EXECUTABLE><IN_FLAG><seq_file><OUT_FLAG><outname><PARAM>
*
* Conventions:
* <FLAG_NAME> <TYPE> <VALUE>
* <VALUE>: no_name <=> Replaced with a space
* <VALUE>: &nbsp <=> Replaced with a space
****************************************************
lib02 is a T-Coffee library using the tc_lib02 format:

TC_LIB_FORMAT_02
SeqX resY ResY_index SeqZ ResZ ResZ_index

The TC_LIB_FORMAT_02 is still experimental and unsupported. It can only be used in the context of the force_aln function described here.

Given more than one constraint, these will be applied one after the other, in the order they are provided. This greedy procedure means that the Nth constraint may disrupt the (N-1)th previously imposed constraint, hence the importance of forcing the constraints in the right order, with the most important coming last.

We do not recommend imposing hard constraints on an alignment, and it is much more advisable to use the soft constraints provided by standard t_coffee libraries (cf. building your own libraries section).

Q: I would like to use structural alignments.

A: See the section Using structures in Multiple Sequence Alignments, or see the question "I want to build my own libraries".

Q: I want to build my own libraries.

A: Turn your alignment into a library, forcing the residues to have a very good weight, using structure:

PROMPT: t_coffee -in=Asample_seq1.aln -weight=1000 -out_lib=sample_seq1.tc_lib -lib_only

The value 1000 is simply a high value that should make it more likely for the substitution found in your alignment to reoccur in the final alignment. This will produce the library sample_seq1.tc_lib that you can later use when aligning all the sequences:

PROMPT: t_co
matrix file> or <integer value>
Default: -weight=sim

Weight defines the way alignments are weighted when turned into a library.

winsimN indicates that the weight assigned to a given pair will be equal to the percent identity within a window of 2N+1 length centered on that pair. For instance, winsim10 defines a window of 10 residues on either side of the pair being considered (21 positions in total). This gives its own weight to each residue in the output library. In our hands, this type of weighting scheme has not provided any significant improvement over the standard sim value.

PROMPT: t_coffee sample_seq1.fasta -weight=winsim10 -out_lib=test.tc_lib

sim indicates that the weight equals the average identity within the sequences containing the matched residues.

sim_matrix_name indicates the average identity with two residues regarded as identical when their substitution value is positive. The valid matrix names are in matrices.h (pam250mt...). Matrices not found in this header are considered to be filenames. See the format section for matrices. For instance, -weight=sim_pam250mt indicates that the grouping used for similarity will be the set of classes with positive substitutions.

PROMPT: t_coffee sample_seq1.fasta -weight=winsim10 -out_lib=test.tc_lib

Other groups include:

sim_clustalw_col (categories of clustalw marked with :)
sim_clustalw_dot (categories of clustalw marked with .)

Value indicates that all the pairs found in the alignments must be given the
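As an illustration of the winsimN idea (percent identity in a window of 2N+1 columns centered on the pair), here is a minimal Python sketch. It is not the T-Coffee implementation: the function name is hypothetical, and two details are assumptions of ours, namely that columns containing a gap are skipped and that the window is simply truncated at the ends of the alignment.

```python
def winsim_weight(row_a, row_b, column, n):
    """Percent identity of two aligned rows in a window of 2n+1 columns.

    row_a and row_b are two rows of the same alignment (equal length,
    '-' for gaps); column is the 0-based column of the pair being weighted.
    Assumptions: gapped columns are ignored, the window is clipped at the ends.
    """
    start = max(0, column - n)
    end = min(len(row_a), column + n + 1)
    compared = 0
    identical = 0
    for a, b in zip(row_a[start:end], row_b[start:end]):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


row_a = "ACDEFGHIK"
row_b = "ACDQFGHLK"
# Window of 5 columns (n=2) around column 4: D/D, E/Q, F/F, G/G, H/H.
print(winsim_weight(row_a, row_b, column=4, n=2))
```

Because the window moves with the pair, each pair of residues from the same two sequences can receive a different weight, which is what distinguishes this scheme from the single per-pair-of-sequences value given by sim.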
other coordinates for the repeat, such as:

PROMPT: t_coffee -in=sample_lib1.mocca_lib -domain -start=100 -len=60

This run will use the fragment 100-160, and will be much faster because it does not need to re-compute the lalign library.

-start

Usage: -start=<int value>
Default: not set

This flag indicates the starting position of the portion of sequence that will be used as a template for the repeat extraction. The value assumes that all the sequences have been concatenated, and is given on the resulting sequence.

-len

Usage: -len=<int value>
Default: not set

This flag indicates the length of the portion of sequence that will be used as a template.

-scale

Usage: -scale=<int value>
Default: -scale=-100

This flag indicates the value of the threshold for extracting the repeats. The actual threshold is equal to:

motif_len*scale

Increase the scale <=> Increase sensitivity <=> More alignments (i.e. -50).

-domain_interactive [Examples]

Usage: -domain_interactive
Default: unset

Launches an interactive mocca session.

PROMPT: t_coffee -in=Lsample_lib3.tc_lib,Mlalign_rs_s_pair -domain -start=100 -len=60

TOLB_ECOLI_...         SKLAYVTFESGR--SALVIQTLANGAVRQV-ASFPRHNGAPAFSPDGSKLAFA
TOLB_ECOLI_165_218 164 TRIAYVVQTNGGQFPYELRVSDYDGYNQFVVHRSPQPLMSPAWSPDGSKLAYV
TOLB_ECOLI_256_306 255 SKLAFALSKTGS--LNLYVMDLASGQIRQV-TDGRSNNTEPTWFPDSQNLAFT
TOLB_ECOLI_307_350 306 ...DQAGR--PQVYKVNINGGAPQRI-TWE
same weight, equal to value. This is useful when the alignment one wishes to turn into a library must be given a pre-specified score (for instance if it comes from a structure super-imposition program). Value is an integer:

PROMPT: t_coffee sample_seq1.fasta -weight=1000 -out_lib=test.tc_lib

Tree Computation

-distance_matrix_mode

Usage: -distance_matrix_mode=<slow, fast, very_fast>
Default: very_fast

This flag indicates the method used for computing the distance matrix (distance between every pair of sequences) required for the computation of the dendrogram.

slow       The chosen dp_mode using the extended library
fast       The fasta dp_mode using the extended library
very_fast  The fasta dp_mode using blosum62mt
ktup       Ktup matching (Muscle kind)
aln        Read the distances on a precomputed MSA

-quicktree [CW]

Usage: -quicktree

Description: Causes T-Coffee to compute a fast approximate guide tree. This flag is kept for compatibility with ClustalW. It indicates that the two commands below behave the same way:

PROMPT: t_coffee sample_seq1.fasta -distance_matrix_mode=very_fast

PROMPT: t_coffee sample_seq1.fasta -quicktree

Pair-wise Alignment Computation

-dp_mode

Usage: -dp_mode=<string>
Default: -dp_mode=cfasta_pair_wise

This flag indicates the type of dynamic programming used by the program:

PROMPT: t_coffee sample_seq1.fasta -dp_mode=myers_miller_pair_wise

gotoh_pair_wise: implementation of the gotoh algorithm (quadratic in memory and time).

myers_miller_pai
will output the tree (in new hampshire format) and the alignment to stdout.

Q: Is it possible to pipe stuff INTO t_coffee?

A: If, as a file name, you specify stdin, the content of this file will be expected through a pipe:

PROMPT: cat sample_seq1.fasta | t_coffee -infile=stdin

will be equivalent to

PROMPT: t_coffee sample_seq1.fasta

If you do not give any argument to t_coffee, they will be expected to come from a pipe:

PROMPT: cat sample_param_file.param | t_coffee -parameters=stdin

For instance:

PROMPT: echo -in=Ssample_seq1.fasta,Mclustalw_pair | t_coffee -parameters=stdin

Q: Can I read my parameters from a file?

A: See the well behaved parameters section.

Q: I want to decide myself on the name of the output files.

A: Use the -run_name flag:

PROMPT: t_coffee sample_seq1.fasta -run_name=guacamole

Q: I want to use the sequences in an alignment file.

A: Simply feed your alignment, any way you like, but do not forget to append the prefix S for sequence:

PROMPT: t_coffee Ssample_aln1.aln

PROMPT: t_coffee -infile=Ssample_aln1.aln

PROMPT: t_coffee -in=Ssample_aln1.aln,Mslow_pair,Mlalign_id_pair -outfile=outaln

This means that the gaps will be reset and that the alignment you provide will not be considered as an alignment, but as a set of sequences.

Q: I only want to produce a library.

A: Use the -lib_only flag:

PROMPT: t_coffee sample_seq1.fasta -out_lib=sample_lib1.tc_lib -lib_only

Please note that the previous usage 
23. ANT: All the files mentioned here (sample_seq...) can be found in the example directory of the distribution.

(NOT) Fetching Sequences

T-Coffee will NOT fetch sequences for you: you must select the sequences you want to align beforehand. We suggest you use any BLAST server and format your sequences in FASTA so that T-Coffee can use them easily.

Aligning Sequences

Making accurate multiple alignments of DNA, RNA or Protein sequences.

Combining Alignments

T-Coffee allows you to combine results obtained with several alignment methods. For instance, if you have an alignment coming from ClustalW, another alignment coming from Dialign, and a structural alignment of some of your sequences, T-Coffee will combine all that information and produce a new multiple sequence alignment having the best agreement with all these methods (see the FAQ for more details).

EXCL  t_coffee -in=Asample_aln1.aln,Asample_aln2.aln,Asample_aln3.aln -outfile=combined_aln.aln

Evaluating Alignments

You can use T-Coffee to measure the reliability of your multiple sequence alignment. If you want to find out about that, read the FAQ or the documentation for the -output flag.

EXCL  t_coffee -infile=sample_aln1.aln -special_mode=evaluate

Combining Sequences and Structures

One of the latest improvements of T-Coffee is to let you combine sequences and structures, so that your alignments are of higher quality. You need to have the sap package installed to fully ben
24. Format: newick tree format (ClustalW Style).

This flag indicates that rather than computing a new dendrogram, t_coffee must use a pre-computed one. The tree files are in Phylip format and compatible with ClustalW. In most cases, using a pre-computed tree will halve the computation time required by t_coffee. It is also possible to use trees output by ClustalW, Phylip and any other program.

Methods and Library Input

-in

Usage: -in=<P,S,A,L,M,X><name>
Default: -in=Mlalign_id_pair,Mclustalw_pair

See the box for an explanation of the -in flag. Consider the following argument passed via -in:

EXCL  t_coffee -in=Ssample_seq1.fasta,Asample_aln1.aln,Asample_aln2.msf,Mlalign_id_pair,Lsample_lib1.tc_lib -outfile=outaln

This command will trigger the following chain of events:

1-Gather all the sequences

Sequences within all the provided files are pooled together. Format recognition is automatic. Duplicates are removed (if they have the same name). Duplicates in a single file are only tolerated in FASTA format files, although they will cause sequences to be renamed.

In the above case, the total set of sequences will be made of the sequences contained in sample_seq1.fasta, sample_aln1.aln, sample_aln2.msf and sample_lib1.tc_lib, plus the sequences initially gathered by -infile.

2-Turn alignments into libraries

sample_aln1.aln and sample_aln2.msf will be read and turned into libraries. Another library will be produced by applying the method lalig
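To make the role of the one-letter prefixes in the -in flag concrete, here is a small Python sketch that splits an -in value into typed entries. It is not T-Coffee code: the function name, the comma separator and the wording of each category are assumptions drawn from the examples in this manual (S for sequences, A for alignments, L for libraries, M for methods, P for structures; X is read here as a substitution matrix).

```python
# Hypothetical helper, not part of T-Coffee: classify the entries of an
# -in value by their one-letter prefix.
PREFIX_KINDS = {
    "P": "structure",
    "S": "sequences",
    "A": "alignment",
    "L": "library",
    "M": "method",
    "X": "matrix",  # assumed meaning of the X prefix
}


def parse_in_flag(value):
    """Split a comma-separated -in value into (kind, name) pairs."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        prefix, name = item[0], item[1:]
        if prefix not in PREFIX_KINDS:
            raise ValueError("unknown prefix in -in entry: " + item)
        entries.append((PREFIX_KINDS[prefix], name))
    return entries


parsed = parse_in_flag(
    "Ssample_seq1.fasta,Asample_aln1.aln,Mlalign_id_pair,Lsample_lib1.tc_lib"
)
for kind, name in parsed:
    print(kind, name)
```

Grouping the entries by kind mirrors the chain of events described above: sequences are pooled first, alignments are turned into libraries, and methods are applied to produce further libraries.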
25. (tail of the alignment displayed with START=211 and LEN=50)

MENU: Type Letter Flag[number] and Return (ex: |10)
  |x      -> Set the START to x
  >x      -> Set the LEN to x
  Cx      -> Set the sCale to x
  Sname   -> Save the Alignment
  Bx      -> Go back x iterations
  return  -> Compute the Alignment
  X       -> eXit
[ITERATION 1] [START=211] [LEN= 50] [SCALE=-100] YOUR CHOICE:

For instance, to set the length of the domain to 40, type:

[ITERATION 1] [START=211] [LEN= 50] [SCALE=-100] YOUR CHOICE: >40 [return]
[return]

Which will generate:

TOLB_ECOLI_212_252  211 SKLAYVTFESGRSALVIOTLANGAVROVASFPRHNGAPAF 251
TOLB_ECOLI_256_296  255 SKLAFALSKTGSLNLYVMDLASGQIROVIDGRSNNTEPTW 295
TOLB_ECOLI_300_340  299 ONLAFTSDOAGRPOVYKVNINGGAPORITWEGSQNODADV 339
TOLB_ECOLI_344_383  343 KFMVMVSSNGGOOHIAKODLATGGV OVLSSTFLDETPSL 382
TOLB_ECOLI_387_427  386 TMVIYSSSOGMGSVLNLVSTDGRFKARLPATDGOVKFPAW 426

MENU: Type Letter Flag[number] and Return (ex: |10)
  |x      -> Set the START to x
  >x      -> Set the LEN to x
  Cx      -> Set the sCale to x
  Sname   -> Save the Alignment
  Bx      -> Go back x iterations
  return  -> Compute the Alignment
  X       -> eXit
[ITERATION 3] [START=211] [LEN= 40] [SCALE=-100] YOUR CHOICE:

If you want to indicate the coordinates relative to a specific sequence, type:

|<seq_name>:start

Type S<your name> to save the current alignment, and extract a new motif.

Type X when
26. *******************************************************
*                     EXECUTABLE                      *
*******************************************************
* name of the executable
* passed to the shell: executable
*
EXECUTABLE tc_generic_method.pl
*
*******************************************************
*                      ALN_MODE                       *
*******************************************************
* pairwise    -> all vs all (no self)  [(n^2-n)/2 alignments]
* m_pairwise  -> all vs all (no self)  [n^2-n alignments]
* s_pairwise  -> all vs all (self)     [(n^2-n)/2 + n alignments]
* multiple    -> All the sequences in one go
*
ALN_MODE pairwise
*
*******************************************************
*                      OUT_MODE                       *
*******************************************************
* mode for the output:
* External methods:
*   aln -> alignment file (Fasta or ClustalW Format)
*   lib -> library file (TC_LIB_FORMAT_01)
* Internal methods:
*   fL  -> Internal Function returning a Lib (Library)
*   fA  -> Internal Function returning an Alignment
*
OUT_MODE aln
*
*******************************************************
27. MPLEMENTED]

Usage: -dpa_tree=<filename>
Default: unset

Guide tree used in DPA. This is a newick tree where the distance associated with each node is set to the minimum pairwise distance among all considered sequences.

Using Structures

Generic

-special_mode

Usage: -special_mode=3dcoffee
Default: turned off

Runs t_coffee with the 3dcoffee mode (cf. next section).

-check_pdb_status

Usage: -check_pdb_status
Default: turned off

Forces t_coffee to run extract_from_pdb to check the pdb status of each sequence. This can considerably slow down the program.

3D-Coffee: Using SAP

It is possible to use t_coffee to compute multiple structural alignments. To do so, ensure that you have the sap program installed.

EXCL  t_coffee -in=struc1.pdb,struc2.pdb,struc3.pdb,Msap_pair

This will combine the pairwise alignments produced by SAP. There are currently two methods that can be interfaced with t_coffee:

sap_pair:  uses the sap algorithm
align_pdb: uses a t_coffee implementation of sap (not as accurate)

By default, the computation will be made only on the first chain contained in the pdb file. If your structure is an NMR structure, you are advised to provide the program with one structure only.

If you wish to align only a portion of the structure, you should extract it yourself from the pdb file, using t_coffee -other_pg extract_from_pdb or any pdb handling program.

You can provide t_coffee with a mixture of sequen
28. Manual

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE
Cédric Notredame

T-Coffee: User Guide and Reference Manual

T-Coffee User Guide, Version 3.18 (September 2005)
© Cédric Notredame and Centre National de la Recherche Scientifique, France

T-COFFEE REFERENCE MANUAL

License and Terms of Use
  T-Coffee is distributed under the Gnu Public License
  T-Coffee code can be re-used freely
Addresses and Contacts
  Contributors
  Addresses
What is T-Coffee?
  What is T-Coffee?
  What can it align?
  How can I use it?
  Is T-Coffee different from ClustalW?
What T-Coffee Can and Cannot do for you
  (NOT) Fetching Sequences
  Aligning Sequences
  Combining Alignments
  Evaluating Alignments
  Combining Sequences and Structures
  Identifying Occurrences of a Motif: Mocca
How Does T-Coffee work?
Installation
  Standard Installation
  Extended Installation and other Packages
St
29. ameterization associated with -special_mode turns off every memory-expensive heuristic within T-Coffee. For version 2.11 this amounts to:

EXCL  t_coffee sample_seq1.fasta -in=Mslow_pair,Mlalign_id_pair -distance_matrix_mode=slow -dp_mode=myers_miller_pair_wise

If you keep running out of memory, you may also want to lower -maxnseq, to ensure that t_coffee_dpa will be used.

Input Output Control

Q: How many sequences can t_coffee handle?

A: T-Coffee is limited to a maximum of 50 sequences. Above this number, the program automatically switches to a heuristic mode named DPA, where DPA stands for Double Progressive Alignment.

DPA is still in development and the version currently shipped with T-Coffee is only a beta version.

Q: How many ways to pass parameters to t_coffee?

A: See the section well behaved parameters.

Q: How can I change the default output format?

A: See the -output option. Common output formats are:

EXCL  t_coffee sample_seq1.fasta -output=msf,fasta_aln

Q: My sequences are slightly different between all the alignments.

A: It does not matter. T-Coffee will reconstruct a set of sequences that incorporates all the residues potentially missing in some of the sequences (see flag -in).

Q: Is it possible to pipe stuff OUT of t_coffee?

A: Specify stderr or stdout as output filename, and the output will be redirected accordingly. For instance:

EXCL  t_coffee sample_seq1.fasta -outfile=stdout -out_lib=stdout

This instruction
30. and that you could build it using other methods (see the FAQ). In protein language, T-COFFEE is synonymous with freedom: the freedom of being aligned however you fancy. (It is with that sort of statement that I got elected Chief Tryptophan Officer in some previous life.)

Standard Installation

1-Decompress distribution.tar.gz:
    gunzip distribution.tar.gz

2-Untar distribution.tar:
    tar -xvf distribution.tar

3-This will create the distribution directory with the following structure:
    distribution/bin
    distribution/doc (t_coffee_doc.pdf, t_coffee_doc.html)
    distribution/t_coffee_source
    distribution/example
    distribution/html

4-Go into the main directory and type:
    ./install
  You will know the installation completed with the mention:
    Installation of t_coffee Successful

5-Add the bin folder to your path:
    set path=($path <address of the t_coffee bin folder>)

Note: The latest t_coffee distribution (2.15 and higher) is self-contained and only requires one executable. You may still require external modules (sap, blast, ClustalW) if you wish to use another mode than the default.

Note: When updating, make sure to remove the old distribution and any associated program from your path.

6-If you have PDB installed:
  Assuming you have a standard PDB installation in your file system:
    setenv PDB_DIR <pdb dir>/structures/all/
  Note: This must be added to your login file.

Extended Installation and other Packages

By de
31. andard Alignments
  Alignment Combination
  Aligning Sequences and Structures
  Aligning Sequences and Profiles
  Using Structures (or templates) Within Profiles
Using New and Existing Methods
  Using Methods Integrated in T-Coffee
  Integrating External Methods
  Managing a collection of method files
  Advanced Method Integration
  The Mother of All method files
Creating Your Own T-Coffee Libraries
  Using Pre-Computed Alignments
  Customizing the Weighting Scheme
  Generating Your Own Libraries
Frequently Asked Questions
Abnormal Terminations and Wrong Results
  Q: The program keeps crashing when I give my sequences
  Q: The default alignment is not good enough
  Q: The alignment contains obvious mistakes
  Q: The program is crashing
  Q: I am running out of memory
Input Output Control
  Q: How many Sequences can t_coffee handle
  Q: How many ways to pas
32. ans
Technical Notes
  Development
  Command Line List

T-Coffee is distributed under the Gnu Public License

Please make sure you have agreed with the terms of the license attached to the package before using the T-Coffee package or its documentation. T-Coffee is freeware, open source software distributed under the GPL license. This means that there is no restriction on its use, either in an academic or a non-academic environment.

T-Coffee code can be re-used freely

Our philosophy is that code is meant to be re-used, including ours. No permission is needed, although we are always happy to receive pieces of improved code.

Contributors

T-Coffee is developed by a dedicated team that includes:

Cédric Notredame
Olivier Poirot
Fabrice Armougom
Sebastien Moretti

Addresses

We are always very eager to get some user feedback. Please do not hesitate to drop us a line at: cedric.notredame@europe.com. The latest updates of T-Coffee are always available on: http://igs-server.cnrs-mrs.fr/~cnotred. At this address you will also find a link to some of the online T-Coffee servers, including Tcoffee@igs: http://igs-server.cnrs-mrs.fr/Tcoffee

T-Coffee can be used to automatically check if an updated version is available. However, the program will not update automatically, as this can cause endless reproducibility problems.

EXCL  t_coffee -update

It is import
33. ant that you cite T-Coffee when you use it. Citing us is (almost) like giving us money: it helps us convince our institutions that what we do is useful and that they should keep paying our salaries and delivering Donuts to our offices from time to time (not that they ever did it, but it would be nice anyway).

Cite the server if you used it; otherwise, cite the original paper from 2000 (no, it was never named "T-Coffee 2000").

Notredame C, Higgins DG, Heringa J.
T-Coffee: A novel method for fast and accurate multiple sequence alignment.
J Mol Biol. 2000 Sep 8;302(1):205-17.
PMID: 10964570 [PubMed - indexed for MEDLINE]

Other useful publications include:

T-Coffee

Claude JB, Suhre K, Notredame C, Claverie JM, Abergel C.
CaspR: a web server for automated molecular replacement using homology modelling.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W606-9.
PMID: 15215460 [PubMed - indexed for MEDLINE]

Poirot O, Suhre K, Abergel C, O'Toole E, Notredame C.
3DCoffee@igs: a web server for combining sequences and structures into a multiple sequence alignment.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W37-40.
PMID: 15215345 [PubMed - indexed for MEDLINE]

O'Sullivan O, Suhre K, Abergel C, Higgins DG, Notredame C.
3DCoffee: combining protein sequences and structures within multiple seque
34. (contents, continued; only the legible entries are kept)
  -type
  Structure Input
  Tree Input
Methods and Library Input
  -profile1 [cw]
  -profile2 [cw]
Library Computation: Methods
  -align_pdb_hasch_mode
Library Computation: Extension
  -lib_list [Unsupported]
  -extend
  -seq_name_for_quadruplet
35. ces and structure. In this case, you should use the special mode:

EXCL  t_coffee -special_mode=3dcoffee -seq=3d_sample3.fasta -template_file=template_file.template

Using/finding PDB templates for the Sequences

-template_file

Usage: -template_file=<filename, SCRIPT_scriptname, SELF_TAG, SEQFILE_TAG_filename, no>
Default: no

This flag instructs t_coffee on the templates that will be used when combining several types of information. For instance, when using structural information, this file will indicate the structural template that corresponds to your sequences. The identifier T indicates that the file should be a FASTA-like file, formatted as follows. There are several ways to pass the templates:

1-File name

This file contains the sequence/template association. It uses a FASTA-like format, as follows:

><sequence name> _P_ <pdb template>
><sequence name> _G_ <gene template>
><sequence name> _R_ <MSA template>

Each template will be used in place of the sequence with the appropriate method. For instance, structural templates will be aligned with sap_pair and the information thus generated will be transferred onto the alignment.

Note the following rules:

  Each sequence can have one template of each type (structural, genomics...)
  Each sequence can only have one template of a given type
  Several sequences can share the same template
  All the sequences do not need to have a temp
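The FASTA-like template file described under "1-File name" can be generated by a few lines of Python. The sketch below is an illustration only: the function name, the kind labels and the template identifiers (1abc, seq1_family.aln) are hypothetical, and the tags _P_, _G_ and _R_ follow this manual's description of the format. It enforces the stated rule that a sequence has at most one template of a given type, while letting several sequences share a template.

```python
# Hypothetical writer for a T-Coffee style template file; the tags follow
# the format description above, everything else is illustrative.
TEMPLATE_TAGS = {"structure": "_P_", "gene": "_G_", "profile": "_R_"}


def format_template_file(assignments):
    """Return the file text for (sequence_name, kind, template) triples."""
    seen = set()
    lines = []
    for seq_name, kind, template in assignments:
        if kind not in TEMPLATE_TAGS:
            raise ValueError("unknown template kind: " + kind)
        if (seq_name, kind) in seen:
            # one template of a given type per sequence
            raise ValueError(seq_name + " already has a " + kind + " template")
        seen.add((seq_name, kind))
        lines.append(">" + seq_name + " " + TEMPLATE_TAGS[kind] + " " + template)
    return "\n".join(lines) + "\n"


template_text = format_template_file(
    [
        ("seq1", "structure", "1abc"),
        ("seq2", "structure", "1abc"),  # sharing a template is allowed
        ("seq1", "profile", "seq1_family.aln"),
    ]
)
print(template_text, end="")
```

The resulting text could be written to a file and passed through -template_file; sequences that are absent from the file simply have no template.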
36. *
PARAM -method=clustalw
PARAM -OUTORDER=INPUT -NEWTREE=core -align -gapopen=-15
*
*******************************************************
*                        END                          *
*******************************************************

Creating Your Own T-Coffee Libraries

If the method you want to use is not integrated, or impossible to integrate, you can generate your own libraries, either directly or by turning existing alignments into libraries. You may also want to precompute your libraries, in order to combine them at your convenience.

Using Pre-Computed Alignments

If the method you wish to use is not supported, or if you simply have the alignments, the simplest thing to do is to generate the pairwise/multiple alignments yourself, in FASTA, ClustalW, msf or Pir format, and feed them into t_coffee using the -in flag:

EXCL  t_coffee -in=Asample_aln1_1.aln,Asample_aln1_2.aln -outfile=combined_aln.aln

Customizing the Weighting Scheme

The previous integration method forces you to use the same weighting scheme for each alignment and the rest of the libraries generated on the fly. This weighting scheme is based on global pairwise sequence identity. If you want to use a more specific weighting scheme with a given method, you should either:

generate your o
37. e Profiles
  Q: Can I align two profiles according to the structures they contain?
  Q: How good is my alignment?
  Q: What is that color index?
  Q: Can I evaluate alignments NOT produced with T-Coffee?
  Q: Can I Compare Two Alignments?
  Q: I am aligning sequences with long regions of very good overlap
Entering the right parameters

T-COFFEE REFERENCE MANUAL

  -special_mode
  -score [Deprecated]
  -do_align [cw]
Special Parameters
  -version
  -check_configuration
  -update
  -other_pg
Sequence Input
  -infile [cw]
  -in (Cf. -in from the Method and Library Input section)
38. e named according to the sequences. For instance, if your protein sequences have been recoded with Exon/Intron information, you should have the recoded sequences named according to the original:

SEQFILE_G_recodedprotein.fasta

-struc_to_use

Usage: -struc_to_use=<struc1, struc2...>
Default: -struc_to_use=NULL

Restricts the 3Dcoffee to a set of pre-defined structures.

Multiple Local Alignments

It is possible to compute multiple local alignments using the moca routine. MOCA is a routine that extracts all the local alignments that show some similarity with another predefined fragment.

'mocca' is a perl script that calls t_coffee and provides it with the appropriate parameters.

-domain/-mocca

Usage: -domain
Default: not set

This flag indicates that t_coffee will run using the domain mode. All the sequences will be concatenated, and the resulting sequence will be compared to itself using the lalign_rs_s_pair mode (lalign of the sequence against itself, keeping the lalign raw score). This step is the most computer-intensive, and it is advisable to save the resulting file.

EXCL  t_coffee -in=Ssample_seq1.fasta,Mlalign_rs_s_pair -out_lib=sample_lib1.mocca_lib -domain -start=10 -len=50

This instruction will use the fragment 100-150 on the concatenated sequences as a template for the extracted repeats. The extraction will only be made once. The library will be placed in the file <lib name>.

If you want, you can test
39. een 1 and >
Default: -clean_iteration=1

See -clean_aln for details.

-clean_evaluation_mode

Usage: -clean_evaluation_mode=<evaluation_mode>
Default: -clean_evaluation_mode=t_coffee_non_extended

Indicates the mode used for the evaluation that will indicate the segments that should be realigned. See -evaluation_mode for the list of accepted modes.

-iterate

Usage: -iterate=<integer>
Default: -iterate=0

Sequences are extracted in turn and realigned to the MSA. If iterate is set to -1, each sequence is realigned; otherwise the number of iterations is set by -iterate.

CPU Control

Multithreading

-multi_thread [NOT Supported]

Usage: -multi_thread=<N>
Default: 0

Specifies that the program should be used in multithreading mode. N specifies the number of processors available. For instance, if you are using a quadriprocessor:

EXCL  t_coffee sample_seq2.fasta -multi_thread=4

Limits

-mem_mode

Usage: deprecated

-ulimit

Usage: -ulimit=<value>
Default: -ulimit=0

Specifies the upper limit of memory usage (in Megabytes). Processes exceeding this limit will automatically exit. A value of 0 indicates that no limit applies.

-maxlen

Usage: -maxlen=<value, 0=nolimit>
Default: -maxlen=1000

Indicates the maximum length of the sequences.

Aligning more than 100 sequences with DPA

-maxnseq

Usage: -maxnseq=<value, 0=nolimit>
Default: -maxnseq=50

Indicates the maximum number of sequences before triggering
40. efit of this facility.

EXCL  t_coffee 3d.fasta -special_mode=3dcoffee

Using this mode will cause T-Coffee to automatically identify the target corresponding to your sequence, as indicated by an NCBI BLAST. T-Coffee then obtains the required PDB sequences from RCSB. However, if you are also using -template_file, the program will use the template you specified and the corresponding files on your disk.

All these network-based operations are carried out using wget. If wget is not installed on your system, you can get it for free from www.wget.org. To make sure wget is installed on your system, type:

EXCL  which wget

Identifying Occurrences of a Motif: Mocca

Mocca is a special mode of T-Coffee that allows you to extract a series of repeats from a single sequence or a set of sequences. In other words, if you know the coordinates of one copy of a repeat, you can extract all the other occurrences. If you want to use Mocca, simply type:

EXCL  t_coffee -other_pg mocca sample_seq1.fasta

The program needs some time to compute a library, and it will then prompt you with an interactive menu. Follow the instructions.

How Does T-Coffee work?

If you only want to make a standard multiple alignment, you may skip these explanations. But if you want to do more sophisticated things, these few indications may help before you start reading the doc and the papers.

When you run T-Coffee, the first thing it does is to compute a library. The library is
41. es will see their weight set to 1.

1-Sensitivity to sequence order: It is difficult to implement an MSA algorithm totally insensitive to the order of input of the sequences. In t_coffee, robustness is increased by sorting the sequences alphabetically before aligning them. Beware that this can result in confusing output where sequences with similar names are unexpectedly close to one another in the final alignment.

2-Nucleotide sequences with long stretches of Ns will cause problems to lalign, especially when using Mocca. To avoid any problem, filter out these nucleotides before running mocca.

3-Stop codons are sometimes coded with '*' in protein sequences. This will cause the program to crash or hang. Please replace the '*' signs with an X.

4-Results can differ from one architecture to another, due to rounding differences. This is caused by the tree estimation procedure. If you want to make sure an alignment is reproducible, you should keep the associated dendrogram.

These notes are only meant for internal development.

Development

The following examples are only meant for internal development, and are used to ensure stability from release to release.

PROFILE2LIST

prf1: profile containing one structure
prf2: profile containing one structure

EXCL  t_coffee Rsample_profile1.aln,Rsample_profile2.aln -special_mode=3dcoffee -outfile=aligned_prf.aln

Command Line List

These command lines have been checked before
42. every release, along with the other CL in this documentation.

-external methods:

EXCL  t_coffee sample_seq1.fasta -in=Mclustalw_pair,Mclustalw_msa,Mslow_pair -outfile=clustal_text

-fugue_client

EXCL  t_coffee -in=Ssample_seq5.fasta,Pstruc4.pdb,Mfugue_pair

-implement UPGMA tree computation
-implement seq2dpa_tree
-debug dpa
-Reconcile sequences and template when reading the template
-Add the server command lines to the checking procedure
43. fault, T-Coffee does not require any other package than those included in the distribution. However, depending on your needs, you may want to install some of the following:

Package    Function
ClustalW   can interact with t_coffee
wget       3DCoffee: automatic downloading of structures; remote use of the Fugue server
sap        structure/structure comparisons (obtain it from W. Taylor, NIMR-MRC)
Blast      www.ncbi.nih.nlm.gov

Once the package is installed, make sure that the executable is on your path, so that t_coffee can find it automatically.

IMPORTANT: All the files mentioned here (sample_seq...) can be found in the example directory of the distribution.

T-COFFEE

Write your sequences in the same file (Swiss-prot, Fasta or Pir) and type:

EXCL  t_coffee sample_seq1.fasta

This will output two files:

sample_seq1.aln: your Multiple Sequence Alignment
sample_seq1.dnd: the Guide tree (newick Format)

IMPORTANT: If you are trying to align Nucleic Acid, use -special_mode=dna:

EXCL  t_coffee -in=sample_dnaseq1.fasta -special_mode=dna

MOCCA

Write your sequences in the same file (Swiss-prot, Fasta or Pir) and type:

EXCL  t_coffee -other_pg mocca sample_seq1.fasta

This command outputs one file (<your sequences>.mocca_lib) and starts an interactive menu.

-Use of   as a separator when specifying methods parameters.

The most notable modifications have to do with the struct
44. ffee -in=Ssample_seq1.fasta,Lsample_seq1.tc_lib -outfile=sample_seq1.aln

If you only want some of these residues to be aligned, or want to give them individual weights, you will have to edit the library file yourself or use the -force_aln option (cf. FAQ: I would like to force some residues to be aligned). A value of N*N*1000 (N being the number of sequences) usually ensures that a constraint is respected.

Q: I want to use my own tree

A: Use the -usetree=<your own tree> flag.

EXCL  t_coffee sample_seq1.fasta -usetree=sample_tree.dnd

Q: I want to align coding DNA

A: Use the cdna_fast_pair method, which compares two cDNAs using the best reading frame and taking frameshifts into account.

EXCL  t_coffee sample_seq4.fasta -in=Mcdna_fast_pair

Notice that in the resulting alignments, all the gaps are of modulo 3, except one small gap in the first line of sequence hmgl_trybr. This is a frameshift, made on purpose. You can realign the same sequences while ignoring their coding potential and treating them like standard DNA:

EXCL  t_coffee sample_seq4.fasta

Note: This method has not yet been fully tested and is only provided "as is" with no warranty.

Q: I do not want to use all the possible pairs when computing the library

Q: I only want to use specific pairs to compute the library

A: Simply write in a file the list of sequence groups you want to use:

EXCL  t_coffee sample_seq1.fasta -in=Mclustalw_pair,Mclustalw_msa -lib_l
45. handling of names is consistent with ClustalW (cf. Sequence Name Handling in the Format section). If your dataset contains sequences with identical names, these will automatically be renamed:

    Input names:     >seq1   >seq1
    Renamed to:      >seq1   >seq1_1

Warning: The behaviour is undefined when this creates two sequences with similar names.

This reference manual gives a list of all the flags that can be used to modify the behavior of T-Coffee. For your convenience, we have grouped them according to their nature. To display a list of all the flags used in the version of T-Coffee you are using (along with their default value), type:

EXCL  t_coffee

Or

EXCL  t_coffee -help

Or

EXCL  t_coffee -help -in

Or any other parameter.

Well Behaved Parameters

Separation

You can use any kind of separator you want (e.g. <space>). The syntax used in this document is meant to be consistent with that of ClustalW. However, in order to take advantage of the automatic filename completion provided by many shells, you can replace "=" and "," with a space.

Posix

T-Coffee is not POSIX compliant.

Entering the right parameters

There are many ways to enter parameters in T-Coffee, see the -parameters flag in

Parameters Syntax

No Flag

If no flag is used, <your sequence> must be the first a
46. ights can be output using the -outseqweight flag.

Note: You can use your own weights (see the format section).

Multiple Alignment Computation

-msa_mode

Usage: -msa_mode <tree, graph, precomputed>
Default: -evaluate_mode tree

Unsupported.

-profile_comparison

Usage: -profile_mode <fullN, profile>
Default: -profile_mode full50

The -profile_mode flag controls the multiple profile alignments in T-Coffee. There are two instances where t_coffee can make multiple profile alignments:

1 When N, the number of sequences, is higher than -maxnseq, the program switches to its multiple profile alignment mode (t_coffee_dpa).

2 When MSAs are provided via the -profile flag or via -profile1 and -profile2.

In these situations, the -profile_mode value influences the alignment computation. These values are:

-profile_comparison profile: the MSAs provided via -profile are vectorized and the function specified by -profile_comparison is used to make profile-profile alignments. In that case, the complexity is NL^2.

-profile_comparison fullN: N is an integer value that can be omitted. Full indicates that, given two profiles, the alignment will be based on a library that includes every possible pair of sequences between the two profiles. If N is set, then the library will be restricted to the N most similar pairs of sequences between the two profiles, as judged from a measure made on a pairwise alignment of these two profiles.
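The fullN restriction described above can be illustrated with a minimal Python sketch. It is not T-Coffee's code: the function names are hypothetical, the two profiles are assumed to be already aligned to each other (rows of equal length, "-" for gaps), and percent identity is only an assumed stand-in for the similarity measure, which the manual does not specify.

```python
from itertools import product


def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity over the columns where both aligned rows hold a residue."""
    matched = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not matched:
        return 0.0
    return 100.0 * sum(x == y for x, y in matched) / len(matched)


def top_pairs(profile1: dict, profile2: dict, n=None):
    """Rank every cross-profile sequence pair, most similar first.

    n=None keeps every pair (the "full" behaviour); an integer keeps
    only the n best pairs (the "fullN" behaviour).
    """
    pairs = [
        (percent_identity(seq1, seq2), name1, name2)
        for (name1, seq1), (name2, seq2) in product(profile1.items(), profile2.items())
    ]
    pairs.sort(key=lambda item: -item[0])  # stable sort: ties keep input order
    return pairs if n is None else pairs[:n]


if __name__ == "__main__":
    prf1 = {"a": "ACDE", "b": "AC-E"}
    prf2 = {"c": "ACDE", "d": "TTTT"}
    for score, name1, name2 in top_pairs(prf1, prf2, n=2):
        print(name1, name2, score)
```

Only the selected pairs would then contribute to the library used to align the two profiles.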
47. ignments can be viewed as collections of constraints that must be fit within the final alignment. Of course, the constraints do not have to agree with one another.

This section shows which methods are available in T-Coffee, and how you can add your own methods, either through direct parameterization or via a perl script.

Using Methods Integrated in T-Coffee

Some packages already have an interface with t_coffee. These include:

Package        Environment variable
align_pdb      ALIGN_PDB_4_TCOFFEE
sap            SAP_4_TCOFFEE
lalign2list    LALIGN_4_TCOFFEE
clustalw       CLUSTALW_4_TCOFFEE

If these programs are installed on your system and you want t_coffee to use a specific version:

setenv CLUSTALW_4_TCOFFEE <path to your version>

LIST OF INTERNAL METHODS

Built-in methods can be requested using the following names:

fast_pair      Makes a global fasta style pairwise alignment. For proteins: matrix=blosum62mt, gep=-1, gop=-10, ktup=2. For DNA: matrix=idmat (id=10), gep=-1, gop=-20, ktup=5. Each pair of residues is given a score function of the weighting mode defined by -weight.

slow_pair      Identical to fast_pair, but does a full dynamic programming, using the Myers and Miller algorithm. This method is recommended if your sequences are distantly related.

ifast_pair
islow_pair     Makes a global fasta alignment using the previously computed pairs as a library. "i" stands for iterative. Each pair of residues is given a score function of the weighting mode defined b
48. ion should be provided. For instance:

EXCL  t_coffee sample_seq1.fasta -in Xpam250mt -gapopen=-10 -gapext=-1

This command results in a progressive alignment carried out on the sequences in seqfile. The procedure is very similar to Pileup. In this context, appropriate gap penalties should be provided. The matrices are in the file source/matrices.h. Ad hoc matrices can also be provided by the user (see the matrices format section at the end of this manual).

Profile Input

-profile

Usage: -profile <name> (maximum of 200 profiles)
Default: no default

This flag causes T-Coffee to treat multiple alignments as single sequences, thus making it possible to make multiple profile alignments. The profile-profile alignment is controlled by -profile_mode and -profile_comparison. When provided with the -in flag, profiles must be preceded with the letter R.

EXCL  t_coffee -profile sample_aln1.aln sample_aln2.aln -outfile profile_aln

EXCL  t_coffee -in Rsample_aln1.aln Rsample_aln2.aln Mslow_pair Mlalign_id_pair -outfile profile_aln

Note that when using -template_file, the program will also look for the templates associated with the profiles, even if the profiles have been provided as templates themselves (however, it will not look for the template of the profile templates of the profile templates...).

-profile1 [cw]

Usage: -profile1 <name> (one name only)
Default: no default

Similar to the previous one and was pro
49. ist sample_list1.lib_list -outfile test

    2 hmgl_trybr hmgt_mouse
    2 hmgl_trybr hmgb_chite
    2 hmgl_trybr hmgl_wheat
    3 hmgl_trybr hmgl_wheat hmgl_mouse

Note: Pairwise methods (slow_pair...) will only be applied to lists of pairs of sequences, while multiple methods (clustalw_aln) will be applied to any dataset having more than two sequences.

Q: There are duplicates or quasi-duplicates in my set

A: If you can remove them, this will make the program run faster. Otherwise, the t_coffee scoring scheme should be able to avoid over-weighting of over-represented sequences.

Using Structures and Profiles

Q: Can I align sequences to a profile with T-Coffee?

A: Yes, you simply need to indicate that your alignment is a profile with the R tag.

EXCL  t_coffee sample_seq1.fasta Rsample_aln2.aln -outfile tacos

Q: Can I align two or more profiles?

A: Yes, you simply tag your profiles with the letter R and the program will treat them like standard sequences.

EXCL  t_coffee Rsample_aln1.fasta Rsample_aln2.aln -outfile tacos

Q: Can I align two profiles according to the structures they contain?

A: Yes, as long as the structure sequences are named according to their PDB identifier.

EXCL  t_coffee Rsample_profile1.aln Rsample_profile2.aln -special_mode 3dcoffee -outfile aligned_prf.aln

Q: T-Coffee becomes very slow when combining sequences and structures

A: This is true. By defaul
50. ith the "R" identifier.

EXCL  t_coffee Ssample_seq1.fasta Rsample_aln2.aln -outfile seqprofile.aln

All the internal methods should support profiles. External methods do not support profiles (unless specified otherwise).

Using Structures (Or Templates) Within Profiles

If your profiles contain structures, you can make sure these will be used during the computation by specifying the 3dcoffee special mode:

EXCL  t_coffee Rsample_profile1.aln Rsample_profile2.aln -special_mode 3dcoffee -outfile aligned_prf.aln

Note that when providing a collection of templates, the program will use the -template_file flag to look for templates within the sequences AND within the profiles associated with some sequences.

Using New and Existing Methods

Although it does not necessarily do so explicitly, T-Coffee always ends up combining libraries. Libraries are collections of pairs of residues. Given a set of libraries, T-Coffee makes an attempt to assemble the alignment with the highest level of consistency. You can think of the alignment as a timetable. Each library pair would be a request from students or teachers, and the job of T-Coffee would be to assemble the timetable that makes as many people as possible happy.

In T-Coffee, methods replace the students/professors as constraint generators. These methods can be any standard or non-standard alignment methods that can be used to generate alignments (pairwise, most of the time). These al
51. late

The type of template on which a method works is declared with the SEQ_TYPE parameter in the method configuration file:

SEQ_TYPE S     a method that uses sequences
SEQ_TYPE PS    a pairwise method that aligns sequences and structures
SEQ_TYPE P     a method that aligns structures (sap for instance)

There are 3 tags identifying the template type:

_P_    Structural templates: a pdb identifier OR a pdb file
_G_    Genomic templates: a protein sequence where boundary amino acids have been recoded with (o:0, i:1, j:2)
_R_    Profile templates: a file containing a multiple sequence alignment

More than one template file can be provided. There is no need to have one template for every sequence in the dataset.

_P_, _G_ and _R_ are known as template TAGS.

2 SCRIPT_<scriptname>

Indicates that filename is a script that will be used to generate a valid template file. The script will run on a file containing all your sequences using the following syntax:

scriptname -infile=<your sequences> -outfile=<template_file>

It is also possible to pass some parameters, use   as a separator (i.e. the   will be turned into a space).

SCRIPT_myscript1 val1 val2

3 SELF_TAG

The original name of the sequence will be used to fetch the template.

EXCL  t_coffee 3d_sample2.fasta -template_file SELF_P_

The previous command will work because the sequences in 3d_sample3 are named

4 SEQFILE_TAG_filename

Use this flag if your templates are in filename, and ar
52. ly carried out on pairs having a weight value superior to the specified limit.

-extend_mode

Usage: -extend <string>
Default: -extend very_fast_triplet

Warning: Development Only

Controls the algorithm for matrix extension. Available modes include:

relative_triplet      Unsupported
g_coffee              Unsupported
g_coffee_quadruplets  Unsupported
fast_triplet          Fast triplet extension
very_fast_triplet     Slow triplet extension, limited to the -max_n_pair best sequence pairs when aligning two profiles
slow_triplet          Exhaustive use of all the triplets
mixt                  Unsupported
quadruplet            Unsupported
test                  Unsupported
matrix                Use of the matrix -matrix
fast_matrix           Use of the matrix -matrix. Profiles are turned into consensus.

-max_n_pair

Usage: -max_n_pair <integer>
Default: -extend 10

Development Only

Controls the number of pairs considered by -extend_mode very_fast_triplet. Setting it to 0 forces all the pairs to be considered (equivalent to -extend_mode slow_triplet).

-seq_name_for_quadruplet
Usage: Unsupported

-compact
Usage: Unsupported

-clean
Usage: Unsupported

-maximise
Usage: Unsupported

-do_self
Usage: Flag -do_self
Default: No

This flag causes the extension to be carried out within the sequences (as opposed to between sequences). This is necessary when looking for internal repeats with Mocca.

-seq_name_for_quadruplet
Usage: Unsupported

-weight
Usage: -weight <winsimN, sim or sim_<matrix_name or 
53. n_id_pair to the set of sequences previously obtained (1). The final library used for the alignment will be the combination of all this information.

Note as well the following rules:

1 Order: The order in which sequences, methods, alignments and libraries are fed in is irrelevant.

2 Heterogeneity: There is no need for each element (A, S, L) to contain the same sequences.

3 No Duplicate: Each file should contain only one copy of each sequence. Duplicates are only allowed in FASTA files but will cause the sequences to be renamed.

4 Reconciliation: If two files (for instance two alignments) contain different versions of the same sequence due to an indel, a new sequence will be reconstructed and used instead:

    aln 1: hgab1 AAAAABAAAAA
    aln 2: hgab1 AAAAAAAAAACCC

will cause the program to reconstruct and use the following sequence:

    hgab1 AAAAABAAAAACCC

This can be useful if you are trying to combine several runs of blast, or structural information where residues may have been deleted. However, substitutions are forbidden. If two sequences with the same name cannot be merged, they will cause the program to exit with an information message.

5 Methods: The method describer can either be built in (see ... for a list of all the available methods) or be a file describing the method to be used. The exact syntax is provided in part 4 of this manual.

6 Substitution Matrices: If the method is a substitution matrix (X), then no other type of informat
54. nce alignments. J Mol Biol. 2004 Jul 2;340(2):385-95. PMID: 15201059

Poirot O, O'Toole E, Notredame C. Tcoffee@igs: A web server for computing, evaluating and combining multiple sequence alignments. Nucleic Acids Res. 2003 Jul 1;31(13):3503-6. PMID: 12824354

Notredame C. Mocca: semi-automatic method for domain hunting. Bioinformatics. 2001 Apr;17(4):373-4. PMID: 11301309

Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000 Sep 8;302(1):205-17. PMID: 10964570

Notredame C, Holm L, Higgins DG. COFFEE: an objective function for multiple sequence alignments. Bioinformatics. 1998 Jun;14(5):407-22. PMID: 9682054

Mocca

Notredame C. Mocca: semi-automatic method for domain hunting. Bioinformatics. 2001 Apr;17(4):373-4. PMID: 11301309

CORE

http://igs-server.cnrs-mrs.fr/~cnotred/Publications/Pdf/core.pp.pdf

Other Contributions

Some pieces of code from other packages have been incorporated within the T-Coffee package. These include:

The Sim algorithm of Huang and Miller that given two sequences computes the N be
55. non character signs are ignored in the sequence field (such as numbers, annotation...).

Note: a different number of lines in the different blocks will cause the program to crash or hang.

Libraries

T_COFFEE_LIB_FORMAT_01

This is currently the only supported format.

    !<space> TC_LIB_FORMAT_01
    <nseq>
    <seq1 name> <seq1 length> <seq1>
    <seq2 name> <seq2 length> <seq2>
    <seq3 name> <seq3 length> <seq3>
    !Comment
    (!Comment)n
    #Si1 Si2
    Ri1 Ri2 V1 (V2, V3)
    #1 2
    12 13 99 (12/0 vs 13/1, weight 99)
    12 14 70
    15 16 56
    #1 3
    12 13 99
    12 14 70
    15 16 56
    !<space>SEQ_1_TO_N

Si1: index of sequence 1
Ri1: index of residue 1 in seq1
V1: integer value (weight)
V2, V3: optional values

Note 1: There is a space between the ! and SEQ_1_TO_N.

Note 2: The last line (! SEQ_1_TO_N) indicates that sequences and residues are numbered from 1 to N. If the token SEQ_1_TO_N is omitted, the sequences are numbered from 0 to N-1 and residues from 1 to N.

Residues do not need to be sorted, and neither do the sequences. The same pair can appear several times in the library. For instance, the following file would be legal:

    #0 1
    12 15 99
    #0 2
    15 16 99
    #0 1
    12 14 79

T_COFFEE_LIB_FORMAT_02

A simpler format is being developed. However, it is not yet fully supported and is only mentioned here for development purposes.
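The TC_LIB_FORMAT_01 layout described in this section can be read with a few lines of Python. The sketch below is illustrative only and is not taken from the T-Coffee sources: the function name is hypothetical, and it assumes that header and comment lines start with "!", that each sequence-pair block starts with "#", and that only the first three fields of a residue line (Ri1, Ri2, V1) are needed.

```python
def parse_tc_lib(text: str):
    """Parse a TC_LIB_FORMAT_01 string into (sequences, pairs).

    sequences: list of (name, length, sequence)
    pairs:     list of (seq_index_1, seq_index_2, residue_1, residue_2, weight),
               with sequence indices converted to 0-based numbering.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or "TC_LIB_FORMAT_01" not in lines[0]:
        raise ValueError("not a TC_LIB_FORMAT_01 library")

    nseq = int(lines[1])
    sequences = []
    for ln in lines[2:2 + nseq]:
        name, length, seq = ln.split()
        sequences.append((name, int(length), seq))

    body = lines[2 + nseq:]
    # With the closing "! SEQ_1_TO_N" token, sequences are numbered from 1;
    # without it they are numbered from 0.
    one_based = any(ln.startswith("!") and "SEQ_1_TO_N" in ln for ln in body)
    offset = 1 if one_based else 0

    pairs = []
    current = None
    for ln in body:
        if ln.startswith("!"):  # comment line or closing token
            continue
        if ln.startswith("#"):  # "#Si1 Si2" opens a block for one sequence pair
            s1, s2 = (int(x) - offset for x in ln[1:].split()[:2])
            current = (s1, s2)
            continue
        if current is None:
            raise ValueError("residue line found before any '#' header")
        fields = ln.split()
        pairs.append((current[0], current[1],
                      int(fields[0]), int(fields[1]), int(fields[2])))
    return sequences, pairs
```

Because the same pair may legally appear several times, the sketch keeps every residue line rather than de-duplicating them.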
56. o run the

EXCL  t_coffee -other_pg unpack_clustalw_method.tc_method

EXCL  t_coffee -other_pg unpack_generic_method.tc_method

The second file (generic_method.tc_method) contains many hints on how to customize your new method. The first file is a very straightforward example of how to have t_coffee run ClustalW with a set of parameters you may be interested in:

    *TC_METHOD_FORMAT_01
    ***************clustalw_method.tc_method***************
    EXECUTABLE clustalw
    ALN_MODE   pairwise
    IN_FLAG    -INFILE=
    OUT_FLAG   -OUTFILE=
    OUT_MODE   aln
    PARAM      -gapopen=-10
    SEQ_TYPE   S
    *******************************************************

This configuration file will cause T-Coffee to emit the following system call:

    clustalw -INFILE=tmpfile1 -OUTFILE=tmpfile2 -gapopen=-10

Note that ALN_MODE instructs t_coffee to run clustalw on every pair of sequences (cf. generic_method.tc_method for more details).

The tc_method files are treated like any standard established method in T-Coffee. For instance, if the file clustalw_method.tc_method is in your current directory, run:

EXCL  t_coffee sample_seq1.fasta -in Mclustalw_method.tc_method

Managing a collection of method files

It may be convenient to store all the method files in a single location on your system. By default, t_coffee will go looking into the directory ~/.t_coffee/methods/. You can change this by
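To make the link between a tc_method file and the resulting system call concrete, here is a minimal Python sketch. It is an illustration rather than T-Coffee's implementation: both function names are hypothetical, lines starting with "*" are assumed to be decoration, and only the EXECUTABLE, IN_FLAG, OUT_FLAG and PARAM keys are used.

```python
def parse_tc_method(text: str) -> dict:
    """Collect "KEY value" lines; lines starting with '*' are skipped."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        key, _, value = line.partition(" ")
        config[key] = value.strip()
    return config


def build_command(config: dict, infile: str, outfile: str) -> str:
    """Assemble the command line implied by a parsed method description."""
    parts = [
        config["EXECUTABLE"],
        config.get("IN_FLAG", "") + infile,
        config.get("OUT_FLAG", "") + outfile,
    ]
    if config.get("PARAM"):
        parts.append(config["PARAM"])
    return " ".join(parts)


if __name__ == "__main__":
    method = """*TC_METHOD_FORMAT_01
EXECUTABLE clustalw
ALN_MODE pairwise
IN_FLAG -INFILE=
OUT_FLAG -OUTFILE=
OUT_MODE aln
PARAM -gapopen=-10
SEQ_TYPE S
"""
    print(build_command(parse_tc_method(method), "tmpfile1", "tmpfile2"))
```

With the ClustalW method file shown in this section, the sketch reproduces the system call quoted in the text.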
57. pair (the current default):

EXCL  t_coffee -in Ssample_seq1.fasta Mfast_pair Mlalign_id_pair

MODIFYING THE PARAMETERS

It is possible to modify on the fly the parameters of hard-coded methods:

EXCL  t_coffee sample_seq1.fasta -in slow_pair@EP@MATRIX@pam250mt@GOP@-10@GEP@-1

EP stands for Extra Parameters. These parameters will supersede any other parameters.

Integrating External Methods

DIRECT ACCESS TO EXTERNAL METHODS

A special method exists in T-Coffee that can be used to invoke any existing program:

EXCL  t_coffee sample_seq1.fasta -in Mem@clustalw@pairwise

In this context, ClustalW is a method that can be run with the following command line:

method -infile=<infile> -outfile=<outfile>

ClustalW can be replaced with any method using a similar syntax. If the program you want to use cannot be run this way, you can either write a perl wrapper that fits the bill or write a tc_method file adapted to your program (cf. next section).

This special method (em, external method) uses the following syntax:

Em@<method>@<aln_mode: pairwise, s_pairwise, multiple>

CUSTOMIZING AN EXTERNAL METHOD (WITH PARAMETERS) FOR T-COFFEE

T-Coffee can run external methods using a tc_method file that can be used in place of an established method. Two such files are incorporated in T-Coffee. You can dump them and customize them according to your needs.

For instance, if you have ClustalW installed, you can use the following file t
58. pensive (structural alignment) or network intensive (BLAST search) operations.

-update

Usage: -update
Default: turned off

Causes a wget access that checks whether the t_coffee version you are using needs updating.

-full_log

Usage: -full_log <filename>
Default: turned off

Causes t_coffee to output a full log file that contains all the input/output files.

-other_pg

Usage: -other_pg <filename>
Default: turned off

Some rumours claim that Tetris is embedded within T-Coffee and could be run using some special set of commands. We wish to deny these rumours, although we may admit that several interesting reformatting programs are now embedded in t_coffee and can be run through the -other_pg flag.

EXCL  t_coffee -other_pg seq_reformat

EXCL  t_coffee -other_pg unpack_all

EXCL  t_coffee -other_pg unpack_extract_from_pdb

Input

Sequence Input

-infile [cw]

To remain compatible with ClustalW, it is possible to indicate the sequences with this flag:

EXCL  t_coffee -infile sample_seq1.fasta

Note: Common multiple sequence alignment formats constitute a valid input format.

Note: T-Coffee automatically removes the gaps before doing the alignment. This behaviour is different from that of ClustalW, where the gaps are kept.

-in

Cf. -in from the Method and Library Input section.

-get_type

Usage: -get_type
Default: turned off

Forces t_coffee to identify the sequence type (PROTEIN, DNA).

-type [cw
59. quences may be edited when coming out of the program. Five rules apply:

Naming Your Sequences the Right Way

1 No Space
Names that do contain spaces, for instance:
    >seq1 human_myc
will be turned into
    >seq1
It is your responsibility to make sure that the names you provide are not ambiguous after such an editing. This editing is consistent with ClustalW (Version 1.73).

2 No Strange Character
Some non-alphabetical characters are replaced with underscores. Other characters are legal and will be kept unchanged. This editing is meant to keep in line with ClustalW (Version 1.75).

3 > is NEVER legal (except as a header token in a FASTA file).

4 Name length must be below 100 characters, although 15 is recommended for compatibility with other programs.

5 Duplicated sequences will be renamed (i.e. sequences with the same name in the same dataset) are allowed but will be renamed according to their original order. When sequences come from multiple sources via the -in flag, consistency of the renaming is not guaranteed. You should avoid duplicated sequences, as they will cause your input to differ from your output, thus making it difficult to track data.

Automatic Format Recognition

Most common formats are automatically recognized by t_coffee. See -in and the next section for more details. If your format is not recognized, use readseq or clustalw to switch to another format. We recommend Fasta.

Structu
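Rules 1 and 5 under "Naming Your Sequences the Right Way" can be sketched in Python as follows. This is an assumption-laden illustration, not T-Coffee's code: the function name is hypothetical, and the "_1", "_2", ... suffix scheme is extrapolated from the single seq1 / seq1_1 example given elsewhere in the manual. The replacement of special characters (rule 2) is left out because the affected characters are not listed here.

```python
def normalize_names(names):
    """Apply rule 1 (cut at the first space) and rule 5 (rename duplicates)."""
    seen = {}
    result = []
    for raw in names:
        fields = raw.lstrip(">").split()
        name = fields[0] if fields else ""  # rule 1: keep text before the first space
        count = seen.get(name, 0)
        seen[name] = count + 1
        # rule 5: later copies receive a numeric suffix, in input order
        result.append(name if count == 0 else f"{name}_{count}")
    return result


if __name__ == "__main__":
    print(normalize_names([">seq1 human_myc", ">seq1", ">seq2"]))
```

The sketch does not detect the case where a generated name such as seq1_1 collides with a name already present in the input, which is the situation the manual describes as undefined.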
60. r_wise: implementation of the Myers and Miller dynamic programming algorithm (quadratic in time and linear in space). This algorithm is recommended for very long sequences. It is about 2 times slower than gotoh and only accepts -tg_mode 1 or 2 (i.e. gaps penalized for opening).

fasta_pair_wise: implementation of the fasta algorithm. The sequence is hashed, looking for ktuple words. Dynamic programming is only carried out on the ndiag best scoring diagonals. This is much faster but less accurate than the two previous modes. This mode is controlled by the parameters -ktuple, -diag_mode and -ndiag.

cfasta_pair_wise: c stands for checked. It is the same algorithm. The dynamic programming is made on the ndiag best diagonals, and then on the 2*ndiag, and so on until the scores converge. Complexity will depend on the level of divergence of the sequences, but will usually be L*log(L), with an accuracy comparable to the two first modes (this was checked on BaliBase). This mode is controlled by the parameters -ktuple, -diag_mode and -ndiag.

Note: Users may find by looking into the code that other modes with fancy names exist (viterby_pair_wise...). Unless mentioned in this documentation, these modes are not supported.

-ktuple

Usage: -ktuple <value>
Default: -ktuple 1 or 2

Indicates the ktuple size for the cfasta_pair_wise dp_mode and fasta_pair_wise. It is set to 1 for proteins, and 2 for DNA. The alphabet used for protein can be a degenerated version, set
61. rating colored version of the output, with the -output flag:

EXCL  t_coffee sample_seq1.fasta -evaluate_mode t_coffee_slow -output score_ascii score_html

EXCL  t_coffee sample_seq1.fasta -evaluate_mode t_coffee_fast -output score_ascii score_html

EXCL  t_coffee sample_seq1.fasta -evaluate_mode t_coffee_non_extended -output score_ascii score_html

Generic Output

-run_name

Usage: -run_name <your run name>
Default: no default set

This flag causes the prefix <your sequences> to be replaced by <your run name> when renaming the default output files.

-quiet

Usage: -quiet <stderr, stdout, file name OR nothing>
Default: -quiet stderr

Redirects the standard output to either a file or a standard stream. -quiet on its own redirects the output to /dev/null.

-align [cw]

This flag indicates that the program must produce the alignment. It is here for compatibility with ClustalW.

We maintain a T-Coffee server (igs-server.cnrs-mrs.fr/Tcoffee). We will be pleased to provide anyone who wants to set up a similar service with the sources.

Common Problems when setting up servers

CACHE Directory

T-Coffee needs a cache directory where it stores temporary files, caches alignments and all sorts of other messy things. For a normal user, this cache is usually built in $HOME/.t_coffee. Yet in the case of a server, such permissions may not be available. You can therefore redirect the cache by setting the environment va
62. re. We recently introduced a new mode that makes T-Coffee able to accurately align large datasets.

How can I use it?

T-Coffee is not an interactive program. It runs from your UNIX or Linux command line and you must provide it with the correct parameters. If you do not like typing commands, here is the simplest available mode, where T-Coffee only needs the name of the sequence file:

EXCL  t_coffee sample_seq1.fasta

Installing and using T-Coffee requires a minimum acquaintance with the Linux/Unix operating system. If you feel this is beyond your computer skills, we suggest you use one of the available online servers.

Is T-Coffee different from ClustalW?

According to several benchmarks, T-Coffee appears to be more accurate than ClustalW. Yet this increased accuracy comes at a price: T-Coffee is slower than Clustal (about N times).

If you are familiar with ClustalW, or if you run a ClustalW server, you will find that we have made some efforts to ensure as much compatibility as possible between ClustalW and T-Coffee. Whenever it was relevant, we have kept the flag name and the flag syntax of ClustalW. Yet you will find that T-Coffee also has many extra possibilities.

If you want to align closely related sequences, T-Coffee can also be used in a fast mode (much faster than ClustalW, and about as accurate): T-Coffee very fast. This mode is especially useful to align long sequences.

What T-Coffee Can and Cannot do for you

IMPORT
63. res

PDB format is recognized by T-Coffee. T-Coffee uses extract_from_pdb (cf. -other_pg flag). extract_from_pdb is a small embedded module that can be used on its own to extract information from pdb files.

Sequences

Sequences can come in the following formats: fasta, pir, swiss-prot, clustal aln, msf aln and t_coffee aln. These formats are the ones automatically recognized. Please replace the * sign sometimes used for stop codons with an X.

Alignments

Alignments can come in the following formats: msf, ClustalW, Fasta, Pir and t_coffee. The t_coffee format is very similar to the ClustalW format, but slightly more flexible. Any interleaved format with the sequence name on each line will be correctly parsed:

    <empty line>      [Facultative]n
    <line of text>    [Required]
    <line of text>    [Facultative]n
    <empty line>      [Required]
    <empty line>      [Facultative]n
    <seq1 name><space><seq1>
    <seq2 name><space><seq2>
    <seq3 name><space><seq3>
    <empty line>      [Required]
    <empty line>      [Facultative]n
    <seq1 name><space><seq1>
    <seq2 name><space><seq2>
    <seq3 name><space><seq3>
    <empty line>      [Required]
    <empty line>      [Facultative]n

An empty line is a line that does NOT contain amino acids. A line that contains the ClustalW annotation (.:*) is empty.

Spaces are forbidden in the name. When the alignment is being read, 
64. rgument. See format for further information.

EXCL  t_coffee sample_seq1.fasta

Which is equivalent to

EXCL  t_coffee Ssample_seq1.fasta

When you do so, sample_seq1 is used as a name prefix for every file the program outputs.

-parameters

Usage: -parameters <parameters_file>
Default: no parameters file

Indicates a file containing extra parameters. Parameters read this way behave as if they had been added on the right end of the command line: they either supersede (one-value parameter) or complete (list of values) what is already there. For instance, the following file (parameter file) could be used:

    -in Ssample_seq1.fasta Mfast_pair
    -output msf_aln

Note: This is one of the exceptions (with -infile) where the identifier tag (S, A, L, M...) can be omitted. Any dataset provided this way will be assumed to be a sequence (S). These exceptions have been designed to keep the program compatible with ClustalW.

Note: This parameter file can ONLY contain valid parameters. Comments are not allowed. Parameters passed this way will be checked like normal parameters.

Used with:

EXCL  t_coffee -parameters sample_param_file.param

Will cause t_coffee to apply the fast_pair method onto the sequences contained in sample_seq.fasta. If you wish, you can also pipe these arguments into t_coffee, by naming the parameter file "stdin" (as a rule, any file named stdin is expected to receive it
65. riable CACHE_4_TCOFFEE to some more suitable value in your scratch area.

Output of the .dnd file

A common source of error when running a server: T-Coffee MUST output the .dnd file because it re-reads it to carry out the progressive alignment. By default T-Coffee outputs this file in the directory where the process is running. If the T-Coffee process does not have permission to write in that directory, the computation will abort.

To avoid this, simply specify the name of the output tree:

-newtree <writable file (usually in /tmp/)>

Choose the name so that two processes cannot overwrite each other's dnd file.

Permissions

The t_coffee process MUST be allowed to write in some scratch area, even when it is run by Mr nobody. Make sure the /tmp/ partition is not protected.

Other Programs

T-Coffee may call various programs while it runs (lalign2list by default). Make sure your process knows where to find these executables.

Parameter files

Parameter files used with -parameters, -t_coffee_defaults, -dali_defaults... must contain a valid parameter string, where line breaks are allowed. These files cannot contain any comment. The recommended format is one parameter per line:

    <parameter name> <value1> <value2> ...
    <parameter name> ...

Sequence Name Handling

Sequence name handling is meant to be fully consistent with ClustalW (Version 1.75). This implies that in some cases the names of your se
66. rned into a library where matched nucleotides receive a score equal to the average level of identity at the amino acid level. This mode is intended to clean cDNA obtained from ESTs, or to align pseudo-genes.

LIST OF EXTERNAL METHODS

The following methods are external. They correspond to packages developed by other groups that you may want to run within T-Coffee. We are very open to extending these options and we welcome any request to add an extra interface.

clustalw_pair    Uses clustalw (default parameters) to align two sequences. Each pair of residues is given a score function of the weighting mode defined by -weight.

clustalw_msa     Makes a multiple alignment using ClustalW and adds it to the library. Each pair of residues is given a score function of the weighting mode defined by -weight.

probcons_pair    Probcons package: install the latest version from http://probcons.stanford.edu/

probcons_msa     idem.

muscle_pair      Muscle package: install the latest version from http://www.drive5.com/muscle/

muscle_msa       idem.

sap_pair         Uses sap to align two structures. Each pair of residues is given a score function defined by sap. You must have sap installed on your system to use this method.

fugue_pair       Uses fugue to align a structure and a sequence. Fugue does not need to be installed: the call is made through wget. Unsupported.

To request a method, see the -in flag. For instance, if you wish to request the use of fast_pair and lalign_id_ 
roduced with T-Coffee?

A: Yes. You may have an alignment produced from any source you like. To evaluate it, do:

  t_coffee -infile=sample_aln1.aln -in=Lsample_aln1.tc_lib -special_mode=evaluate

If you have no library available, the library will be computed on the fly using the following command. This can take some time, depending on your sample size:

  t_coffee -infile=sample_aln1.aln -special_mode=evaluate

Q: Can I Compare Two Alignments?

A: Yes. You can treat one of your alignments as a library and compare it with the second alignment:

  t_coffee -infile=sample_aln1_1.aln -in=Asample_aln1_2.aln -special_mode=evaluate

Q: I am aligning sequences with long regions of very good overlap

A: Increase the ktuple size (up to 4 or 5 for DNA, and up to 3 for proteins):

  t_coffee sample_seq1.fasta -ktuple=3

This will speed up the program. It can be very useful, especially when aligning ESTs.

Q: Why is T-Coffee changing the names of my sequences!!!

A: If there is no duplicated name in your sequence set, T-Coffee's
s content (via the stdin):

  cat sample_param_file.param | t_coffee -parameters=stdin

-t_coffee_defaults

Usage: -t_coffee_defaults=<file_name>
Default: not used

This flag tells the program to use some default parameter file for t_coffee. The format of that file is the same as the one used with -parameters. The file used is either:

1. <file name> if a name has been specified
2. ~/.t_coffee_defaults if no file was specified
3. The file indicated by the environment variable TCOFFEE_DEFAULTS

-special_mode

Usage: -special_mode=<hard coded mode>
Default: not used

It indicates that t_coffee will use some hard-coded parameters. These include:

quickaln: very fast approximate alignment
dali: a mode used to combine dali pairwise alignments
evaluate: defaults for evaluating an alignment
3dcoffee: runs t_coffee with the 3dcoffee parameterization
dna: runs t_coffee with appropriate parameters

Other modes exist that are not yet fully supported.

-score [Deprecated]

Usage: -score
Default: not used

Toggles on the evaluate mode and causes t_coffee to evaluate a precomputed alignment provided via -infile=<alignment>. The flag -output must be set to an appropriate format (i.e. -output=score_ascii, score_html or score_pdf). A better default parameterization is obtained when using the flag -special_mode=evaluate.

-evaluate

Usage: -evaluate
Default: not used

Replaces -score. This flag toggles on the evaluate mode
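The evaluation flags described in this section are normally combined on one command line. As a minimal sketch (in Python, which is not part of T-Coffee), the hypothetical helper below, build_evaluate_command, assembles the argument list for evaluating a precomputed alignment with -infile, -special_mode=evaluate and one or more score_* output formats. The helper name and the choice to reject non-score formats are assumptions made for illustration; the function only builds the list and never runs t_coffee.

```python
# Hypothetical helper, not part of T-Coffee: it builds (but does not run)
# a command line for evaluating a precomputed alignment.

# Reliability output formats named in this manual.
SCORE_FORMATS = ("score_ascii", "score_html", "score_pdf", "score_ps")


def build_evaluate_command(alignment, formats=("score_ascii",)):
    """Return the argument list for an evaluate run on `alignment`."""
    formats = list(formats)
    if not formats:
        raise ValueError("at least one output format is required")
    for fmt in formats:
        if fmt not in SCORE_FORMATS:
            raise ValueError("not a score output format: " + fmt)
    return [
        "t_coffee",
        "-infile=" + alignment,             # the alignment to evaluate
        "-special_mode=evaluate",           # evaluation defaults
        "-output=" + ",".join(formats),     # several formats may be listed
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(" ".join(build_evaluate_command("sample_aln1.aln", ["score_html"])))
```

Returning a list rather than a single string keeps the file name as one argument, which avoids quoting problems if the list is later handed to a process launcher.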
s parameters to t_coffee
Q: How can I change the default output format?
Q: My sequences are slightly different between all the alignment
Q: Is it possible to pipe stuff OUT of t_coffee?
Q: Is it possible to pipe stuff INTO t_coffee?
Q: Can I read my parameters from a file?
Q: I want to decide myself on the name of the output files!!!
Q: I want to use the sequences in an alignment file
Q: I only want to produce a library
Q: Can t_coffee align Nucleic Acids?
Q: I do not want to compute the alignment
Q: I would like to force some residues to be aligned
Q: I would like to use structural alignments
Q: I want to build my own libraries
Q: I want to align Coding DNA
Q: I do not want to use all the possible pairs when computing the library
Q: I only want to use specific pairs to compute the library
Q: There are duplicates or quasi-duplicates in my set

Using Structures and Profiles
Q: Can I align sequences to a profile with T-Coffee?
Q: Can I align sequences Two or Mor
sed by Phylips (see the format section).

Do NOT confuse this guide tree with a phylogenetic tree.

Reliability Estimation

CORE Computation

The CORE is an index that indicates the consistency between the library of pairwise alignments and the final multiple alignment. Our experiments indicate that the higher this consistency, the more reliable the alignment. A publication describing the CORE index can be found on:

http://igs-server.cnrs-mrs.fr/~cnotred/Publications/Pdf/core.pp.pdf

-evaluate_mode

Usage: -evaluate_mode=<t_coffee_fast,t_coffee_slow,t_coffee_non_extended>
Default: -evaluate_mode=t_coffee_fast

This flag indicates the mode used to normalize the t_coffee score when computing the reliability score.

t_coffee_fast: Normalization is made using the highest score in the MSA. This evaluation mode was validated and, in our hands, pairs of residues with a score of 5 or higher have a 90% chance of being correctly aligned to one another.

t_coffee_slow: Normalization is made using the library. This usually results in lower scores and a scoring scheme more sensitive to the number of sequences in the dataset. Note that this scoring scheme is no longer slower, thanks to the implementation of a faster heuristic algorithm.

t_coffee_non_extended: The score of each residue is the ratio between the sum of its non-extended scores with the column and the sum of all its possible non-extended scores.

These modes will be useful when gene
st scoring local alignments.

The tree reading/computing routines are taken from the ClustalW Package, courtesy of Julie Thompson, Des Higgins and Toby Gibson (Thompson, Higgins, Gibson, 1994, 4673-4680, vol. 22, Nucleic Acids Research).

The implementation of the algorithm for aligning two sequences in linear space was adapted from Myers and Miller, in CABIOS, 1988, 11-17, vol. 4.

Various techniques and algorithms have been implemented. Whenever relevant, the source of the code/algorithm/idea is indicated in the corresponding function.

64 Bits compliance was implemented by Benjamin Sohn, Performance Computing Center Stuttgart (HLRS), Germany.

Prof David Jones (UCL) reported and corrected the PDB1K bug (now t_coffee/sap can align PDB sequences longer than 1000 AA).

What is T-Coffee?

Before going deep into the core of the matter, here are a few words to quickly explain some of the things T-Coffee will do for you.

What does it do?

T-Coffee is a multiple sequence alignment program: given a set of sequences previously gathered using database search programs like BLAST, FASTA or Smith and Waterman, T-Coffee will produce a multiple sequence alignment. To use T-Coffee you must already have your sequences ready.

What can it align?

T-Coffee will align DNA and protein sequences alike, although it does better at aligning proteins than nucleic acids. It will be able to use structural information for protein sequences with a known structu
supersedes the use of the -convert flag. Its main advantage is to restrict computation time to the actual library computation.

Q: I want to turn an alignment into a library

A: Use the -lib_only flag:

  t_coffee -in=Asample_aln1.aln -out_lib=sample_lib1.tc_lib -lib_only

It is also possible to control the weight associated with this alignment (see the -weight section):

  t_coffee -in=Asample_aln1.aln -out_lib=sample_lib1.tc_lib -lib_only -weight=1000

Q: I want to concatenate two libraries

A: You cannot concatenate these files on their own. You will have to use t_coffee. Assume you want to combine tc_lib1.tc_lib and tc_lib2.tc_lib:

  t_coffee -in=Lsample_lib1.tc_lib,Lsample_lib2.tc_lib -lib_only -out_lib=sample_lib3.tc_lib

Q: What happens to the gaps when an alignment is fed to T-Coffee?

A: An alignment is ALWAYS considered as a library AND a set of sequences. If you want your alignment to be considered as a set of sequences only, use the S identifier:

  t_coffee Ssample_aln1.aln -outfile=outaln

It will be seen as a sequence file, even if it has an alignment format (gaps will be removed).

Q: I cannot print the html graphic display!!!

A: This is a problem that has to do with your browser. Instead of requesting the score_html output, request the score_ps output that can be read using ghostview:

  t_coffee sample_seq1.fasta -output=score_ps

or

  t_coffee sample_seq2.fasta -output=score_pdf

Q
  t_coffee sample_seq1.fasta -output=clustalw,gcg,score_html

A publication describing the CORE index is available on:

http://igs-server.cnrs-mrs.fr/~cnotred/Publications/Pdf/core.pp.pdf

-outseqweight

Usage: -outseqweight=<filename>
Default: not used

Indicates the name of the file in which the sequence weights should be saved.

-case

Usage: -case=<keep,upper,lower>
Default: -case=keep

Instructs the program on the case to be used in the output file (Clustalw uses upper case). The default keeps the case and makes it possible to maintain a mixture of upper and lower case residues.

If you need to change the case of your file, you can use seq_reformat:

  t_coffee -other_pg seq_reformat -in sample_aln1.aln -action +lower -output clustalw

-cpu

Usage: deprecated

-outseqweight

Usage: -outseqweight=<name of the file containing the weights applied>
Default: -outseqweight=no

Will cause the program to output the weights associated with every sequence in the dataset.

-outorder [cw]

Usage: -outorder=<input OR aligned OR filename>
Default: -outorder=input

Sets the order of the sequences in the output alignment:

-outorder=input means the sequences are kept in the original order.

-outorder=aligned means the sequences come in the order indicated by the tree. This order can be seen as a one-dimensional projection of the tree distances.

-outorder=<filename>: Filename is a legal fasta file
t the structures are fetched on the net, using RCSB. The problem arises when T-Coffee looks for the structure of sequences WITHOUT structures. One solution is to install PDB locally. In that case you will need to set two environment variables:

  setenv PDB_DIR <directory containing the pdb structures>
  setenv NO_REMOTE_PDB_DIR 1

Interestingly, the observation that sequences without structures are those that take the most time to be checked is a reminder of the strongest rational argument that I know of against torture: any innocent would require the maximum amount of torture to establish his/her innocence, which sounds...ahem...strange, and at least inefficient. Then again I was never struck by the efficiency of the Bush administration.

Alignment Evaluation

Q: How good is my alignment?

A: See what is the color index...

Q: What is that color index?

A: T-Coffee can provide you with a measure of consistency among all the methods used. You can produce such an output using:

  t_coffee sample_seq1.fasta -output=score_html

This will compute your_seq.score_html that you can view using netscape. An alternative is to use score_ps or score_pdf, which can be viewed using ghostview or acroread. score_ascii will give you an alignment that can be parsed as a text file.

A book chapter describing the CORE index is available on:

http://igs-server.cnrs-mrs.fr/~cnotred/Publications/Pdf/core.pp.pdf

Q: Can I evaluate alignments NOT p
the use of t_coffee_dpa.

-dpa_master_aln

Usage: -dpa_master_aln=<File, method>
Default: -dpa_master_aln=NO

When using dpa, t_coffee needs a seed alignment that can be computed using any appropriate method. By default, t_coffee computes a fast approximate alignment.

A pre-alignment can be provided through this flag, as well as any program using the following syntax:

  your_script -in <fasta_file> -out <file_name>

-dpa_maxnseq

Usage: -dpa_maxnseq=<integer value>
Default: -dpa_maxnseq=30

Maximum number of sequences aligned simultaneously when DPA is run. Given the tree computed from the master alignment, a node is sent to computation if it controls more than -dpa_maxnseq sequences OR if it controls a pair of sequences having less than -dpa_min_score2 percent ID.

-dpa_min_score1

Usage: -dpa_min_score1=<integer value>
Default: -dpa_min_score1=95

Threshold for not realigning the sequences within the master alignment. Given this alignment and the associated tree, sequences below a node are not realigned if none of them has less than -dpa_min_score1 % identity.

-dpa_min_score2

Usage: -dpa_min_score2
Default: -dpa_min_score2

Maximum number of sequences aligned simultaneously when DPA is run. Given the tree computed from the master alignment, a node is sent to computation if it controls more than -dpa_maxnseq sequences OR if it controls a pair of sequences having less than -dpa_min_score2 percent ID.

-dpa_tree [NOT I
ure of the input. From version 2.20, all files must be tagged to indicate their nature (A: alignment, S: Sequence, L: Library...). We are becoming stricter, but that's for your own good...

Another important modification has to do with the flag -matrix: it now controls the matrix being used for the computation.

This manual is at a very preliminary stage of redaction and will only show you how to do the very basics with T-Coffee. In order to solve a more specific problem, or answer a query, we suggest you first go through the FAQ to see if your problem has been addressed, read it, and then read carefully the documentation associated with the corresponding flags. Of course, we also welcome queries and do our best to provide answers and clues in a timely manner.

Using T-Coffee

Standard Alignments

T-Coffee can align sequences, structures and profiles. The default mode when using t_coffee is:

  t_coffee sample_seq1.fasta

It is also possible to combine sequences from various sources:

  t_coffee sample_seq1.fasta sample_seq2.fasta

Or even sequences coming from sequence and alignment files:

  t_coffee sample_seq1.fasta Ssample_aln2.aln

Note: the "S" identifier tells the program to use the alignment as a collection of unaligned sequences.

Alignment Combination

It is possible to combine several alignments into one final alignment:

  t_coffee -in=Asample_aln1_1.aln,Asample_aln1_2.aln -outfile=combined_aln.aln

Note the
vided for compatibility with ClustalW.

-profile2 [cw]

Usage: -profile2=[<name>], one name only
Default: no default

Similar to the previous one and was provided for compatibility with ClustalW.

Alignment Computation

Library Computation: Methods

-lalign_n_top

Usage: -lalign_n_top=<Integer>
Default: -lalign_n_top=10

Number of alignments reported by the local method (lalign).

-align_pdb_param_file [Unsupported]

-align_pdb_hasch_mode [Unsupported]

Library Computation: Extension

-lib_list [Unsupported]

Usage: -lib_list=<filename>
Default: unset

Use this flag if you do not want the library computation to take into account all the possible pairs in your dataset. For instance:

Format:

  2 Name1 name2
  2 Name1 name4
  3 Name1 Name2 Name3
  ...

(the line starting with 3 would be used by a multiple method).

-do_normalise

Usage: -do_normalise=<0 or a positive value>
Default: -do_normalise=1000

Development Only

When using a value different from 0, this flag sets the score of the highest scoring pair to 1000.

-extend

Usage: -extend=<0,1 or a positive value>
Default: -extend=1

Development Only

When turned on, this flag indicates that the library extension should be carried out when performing the multiple alignment. If -extend=0, the extension is not made. If it is set to 1, the extension is made on all the pairs in the library. If the extension is set to another positive value, the extension is on
wn library (cf. next section).

Convert your aln into a lib, using the -weight flag:

  t_coffee -in=Asample_aln1.aln -out_lib=test_lib.tc_lib -lib_only -weight=sim_pam250mt

  t_coffee -in=Asample_aln1.aln,Ltest_lib.tc_lib -outfile=outaln

  t_coffee -in=Asample_aln1_1.aln,Asample_aln1_2.aln,Mfast_pair,Mlalign_id_pair -outfile=out_aln

Generating Your Own Libraries

This is suitable if you have local alignments, or very detailed information about your potential residue pairs, or if you want to use a very specific weighting scheme. You will need to generate your own libraries, using the format described in the last section.

You may also want to pre-compute your libraries in order to save them for further use. For instance, in the following example, we generate the local and the global libraries and later re-use them for combination into a multiple alignment:

  t_coffee sample_seq1.fasta -in=Mslow_pair -out_lib=slow_pair_seq1.tc_lib -lib_only

  t_coffee sample_seq1.fasta -in=Mlalign_id_pair -out_lib=lalign_id_pair_seq1.tc_lib -lib_only

Once these libraries have been computed, you can then combine them at your convenience in a single MSA. Of course you can decide to only use the local or the global library:

  t_coffee sample_seq1.fasta -in=Llalign_id_pair_seq1.tc_lib,Lslow_pair_seq1.tc_lib

IMPORTANT: All the files mentioned here (sample_seq...) can be found in the example/ director
y -weight). The Library used for the computation is the one computed before the method is used. The result is therefore dependent on the order in which methods and library are set via the -in flag.

align_pdb_pair    Uses the align_pdb routine to align two structures. The pairwise scores are those returned by the align_pdb program. If a structure is missing, fast_pair is used instead. Each pair of residues is given a score function defined by align_pdb. [UNSUPPORTED]

lalign_id_pair    Same as lalign_rs_pair, but using the level of identity as a weight.

lalign_s_pair     Same as above but also does the self-comparison (s stands for self). This is needed when extracting repeats. The weights used that way are based on identity.

lalign_rs_s_pair  Same as above but also does the self-comparison (s stands for self). This is needed when extracting repeats.

Matrix            Any matrix can be requested. Simply indicate as a method the name of the matrix preceded with an X (i.e. Xpam250mt). If you indicate such a matrix, all the other methods will simply be ignored, and a standard fast progressive alignment will be computed. If you want to change the substitution matrix used by the methods, use the -matrix flag.

fast_cdna_pair    This method computes the pairwise alignment of two cDNA sequences. It is a fast_pair alignment that only takes into account the amino-acid similarity and uses different penalties for amino-acid insertions and frameshifts. This alignment is tu
y of the distribution.

Abnormal Terminations and Wrong Results

Q: The program keeps crashing when I give my sequences

A: This may be a format problem. Try to reformat your sequences using any utility (readseq...). We recommend the Fasta format. If the problem persists, contact us.

A: Your sequences may not be recognized for what they really are. Normally T-Coffee recognizes the type of your sequences automatically, but if it fails, use:

  t_coffee sample_seq1.fasta -type=PROTEIN

Q: The default alignment is not good enough

A: See next question.

Q: The alignment contains obvious mistakes

A: This happens with most multiple alignment procedures. However, wrong alignments are sometimes caused by bugs or an implementation mistake. Please report the most unexpected results to the authors.

Q: The program is crashing

A: If you get the message:

  FAILED TO ALLOCATE REQUIRED MEMORY

see the next question.

If the program crashes for some other reason, please check whether you are using the right syntax and, if the problem persists, get in touch with the authors.

Q: I am running out of memory

A: You can use a more accurate, slower and less memory-hungry dynamic programming mode called myers_miller_pair_wise. Simply indicate the flag:

  t_coffee sample_seq1.fasta -special_mode=low_memory

Note that this mode will be much less time-efficient than the default, although it may be slightly more accurate. In practice the par
you are done.

Output Control

Generic

Conventions Regarding Filenames

stdout, stderr, stdin, no and /dev/null are valid filenames. stdout and stderr cause the corresponding file to be output on stdout or stderr. For an input file, stdin causes the program to request the corresponding file through a pipe. no causes a suppression of the output, as does /dev/null.

Identifying the Output files automatically

In the t_coffee output, each output appears in a line:

  FILENAME <name> TYPE <Type> FORMAT <Format>

Alignments

-outfile

Usage: -outfile=<out_aln file,default,no>
Default: -outfile=default

Indicates the name of the alignment output by t_coffee. If the default is used, the alignment is named <your sequences>.aln

-output

Usage: -output=<format1,format2,...>
Default: -output=clustalw

Indicates the format used for outputting the -outfile.

Supported formats are:

clustalw_aln, clustalw   ClustalW format.
gcg, msf_aln             MSF alignment.
pir_aln                  pir alignment.
fasta_aln                fasta alignment.
phylip                   Phylip format.
pir_seq                  pir sequences (no gap).
fasta_seq                fasta sequences (no gap).

As well as:

score_ascii              causes the output of a reliability flag.
score_html               causes the output to be a reliability plot in HTML.
score_pdf                idem in PDF (if ps2pdf is installed on your system).
score_ps                 idem in postscript.

More than one format can be indicated:
    