    # That's it

    # Get a Family abbreviation from a TC Family digit
    Names = tcdb.Names()
    abvr = Names.get_family_abr('2.A.1')
    # Contains abrv

    # Get potential substrate from TCID. This feature is
    # still in beta mode, so there may be errors in
    # substrate determination
    Substrates = tcdb.Substrates()
    molecules = Substrates.get_tcid_substrates('2.A.1.1.1')
    # molecules contains a list of potential substrates
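The two calls above take a three-unit family ID (2.A.1) and a full TCID (2.A.1.1.1) respectively, so it can be handy to derive the former from the latter. The following is a minimal sketch, assuming a TCID is a dot-separated string whose first three fields name the family. The helper name tcid_to_family is hypothetical and is not part of the tcdb module.

```python
def tcid_to_family(tcid):
    """Return the three-unit family ID (e.g. '2.A.1') of a TCID string.

    Hypothetical helper for illustration only; not part of tcdb.py.
    """
    parts = tcid.strip().split('.')
    if len(parts) < 3:
        raise ValueError('expected at least three dot-separated fields: %r' % tcid)
    return '.'.join(parts[:3])


if __name__ == '__main__':
    # A full TCID is reduced to its family ID.
    print(tcid_to_family('2.A.1.1.1'))
```

The result could then be passed to a family-level call such as the abbreviation lookup shown above.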
    Vamsee Reddy

    Options:
      --version          show program's version number and exit
      -h, --help         show this help message and exit
      -i INPUT           Path to your subject fasta file
      -r REPEAT          Repeat Size
      -o OUTPUT          Output Name

    Optional Settings:
      --min MIN          Minimum TMS Requirement
      --max MAX          Maximum TMS Requirement
      --flank F_LENGTH   Size of hydrophilic padding around TMSs (10 aa)
      --method METHOD    (1) horizontal (2) Vertical (3) Both
      --VSrestrict SSET  Vertical Subject Restriction
      --VTrestrict TSET  Vertical Target Restriction
      --consecutive      Includes only consecutive targets (r TMSs apart)
      --fasta_only       Build ONLY Fasta DB's (No TMS Searching)
      --threads THREADS  Number of threads <default 16>

AncientRep (AR) is a dense program with lots of features. An analysis can be very computationally intensive; therefore the program will use all processors by default. AR includes two programs for extracting results, which are described later in this tutorial. Results can be viewed and analyzed before AncientRep has entirely finished running. The first part of this tutorial describes how to begin searching for repeats. The next part explains how to interpret results.

Before working through this guide, please read the BioV Suite paper (Reddy & Saier, 2012). Specifically, read the section on AR. It is important that you understand how AR works and are familiar with some of the jargon used when discussing this tool.

Step 1: Obtain a FASTA List. This can be done using the protocol1 pr
When prompted for the original FASTA file, enter the file path (or just drag & drop the original FASTA file that was given to AR). When this optimization process is complete, two sequences will be returned. These are your repeat units.

If, despite lowering your min_sd value, no good results were found, then move on to the vertical search results. Assuming we used an "r" value of 2 with a 6 TMS protein, the following folders should exist: 1-2, 2-3, 3-4, 4-5 & 5-6. Start at the lowest TMSs. Type into your shell: "cd 1-2". Once you have entered that folder, repeat the exact same process as you did for analyzing horizontal repeats. The only difference is that the file to analyze is called vertical.txt, not horizontal.txt.

Programmatic Interface with TCDB

This section will instruct you, by example, on how to use the TCDB module to interact with the TCDB.org website.

    # First we start by importing the module named tcdb.py
    import tcdb

    # How to retrieve a FASTA from TCDB using an ACC
    fasta = tcdb.acc2fasta('MYACC')  # Returns a FASTA

    # Get a list of accessions or SeqIO Object
    # of FASTAS to define a family
    accs = tcdb.define_family('2.A.1')  # list of accs
    fastas = tcdb.define_family('2.A.1', True)  # SeqIO object

    # Use a local copy of the TCDB BLAST DB. Including this
    # line will automatically update old DBs over 5 days, and
    # will build the DB with BLAST. File will always be found
    # in $HOME/db/tcdb
    tcdb.use_local
& -o. The optional settings are denoted with a double dash (--). The --subject & --target settings are used to annotate sequences within the report. These are useful in case you forget what you were originally trying to compare. The --assign setting is equivalent to the one found in the TSS program. The --shuffle setting is 500 by default and determines how many shuffles GSAT should perform for each alignment. The last two settings are --stms and --ttms. Both of these must be set for either of them to work. For example, if both of these were set to "12", then the report generated will only deal with proteins containing 12 TMSs.

Example Usage:

We have two FASTA files (subject.faa, target.faa), called "MFS" & "Gap Junction" respectively, that we are comparing. We are only interested in MFS proteins with 12 TMSs and Gap Junction proteins with 4 TMSs. Here is the command to run for this scenario:

    protocol2.py -s subject.faa -t target.faa -o comparison --subject "MFS" --target "Gap Junction" --stms 12 --ttms 4

Programmatic Usage

Protocol2 may also be accessed within programs. Doing this will allow a user to generate HTML reports from the command line.

    import protocol2
    compare = protocol2.Compare()
    compare.subject_file = 'mysubjects.faa'
    compare.target_file = 'mytargets.faa'
    compare.outdir = '/path/to/outdir'
    compare.shuffle = 500
    compare.assign = 3
    compare.subject_name = 'MFS'
    compare.target_name =
8: When to use --threads. This option is for advanced users. By default, AR uses the maximum number of threads available to the machine. This corresponds to the number of processors the machine has. There is no benefit to lowering the default value unless you need to reserve some processors for another task. Example:

    ancient.py -i myfastas.faa -o myoutput -r 2 --min 6 --max 6 --threads 5

This is all the information you will need to get AR running. The rest of this tutorial will explain how to interpret your AR results.

Interpreting Results

Understanding AR is straightforward once you know the ropes. There will be a lot of results, so the trick is to find results with good SD scores (10 SDs) and low percent gaps (typically less than 15%). These standards are not set in stone. An acceptable percentage of gaps can fluctuate greatly, depending on where the gaps are. If the gaps are in the hydrophobic region, it is likely that the TMSs being compared are not aligning. If they are in the hydrophilic portion, then there is a chance that the actual TMSs are aligned and you have found a potential repeat.

The first results to examine are the horizontal results. From your terminal, type "cd OUTPUTFOLDER" and replace OUTPUTFOLDER with the directory you originally wrote. Type "ls" to see the contents of the folder. There should be a file called "horizontal.txt" and several folders with numbers in them. Use the readlive.py t
    Junctional Family'
    compare.srestrict = 12
    compare.trestrict = 6
    compare()
    # Results found in specified outdir

TSS - Targeted Smith Waterman Search
Filename: tss.py

Description: TSSearch is a heuristic Smith & Waterman search tool that will rapidly find close & distant homologs between two FASTA files.

Basic Usage: Simply type tss.py into the shell to retrieve a list of options.

    Usage: tss.py [options]

    TSSearch is a heuristic Smith & Waterman search tool that will rapidly find
    close & distant homologs between two FASTA files.

    Options:
      -h, --help      show this help message and exit
      -s SUBJECT      Path to your subject fasta file
      -t TARGET       Path to your target fasta file
      -o OUTPUT       Output file to create
      -m MAX TARGETS  Targets per subject (Optional. Default: 3)
      -r RANDOM       Number of times to shuffle each sequence (Optional. Default: 300)

Typically the default settings are fine for most scenarios. The only settings that are mandatory are "subject", "target" & "output". For example, if a user has two FASTA files (subjects.faa, targets.faa) in their current working directory, the syntax for executing a TSS comparison would look like:

    tss.py -s subjects.faa -t targets.faa -o compared

The output file will be a tab-separated text file containing the subject key, target key, and z-score.

TSS Programmatic access
TSS can be executed within other python scripts. A benefit of the TSS wrapper is there is
are in one of your $PATH folders. Also, remember to define $HMMTOP_PSV and $HMMTOP_ARCH in your profile to point to both of HMMTOP's dependencies.

If you did everything correctly, you should be able to execute any BioV program from your shell from any working directory.

TMStats
Filename: tmstats.py
Description: Tabulate and generate a bar graph of TMSs and their occurrences within a FASTA file or TC hierarchy.
Basic Usage: tmstats.py

    Usage: tmstats.py <fasta_file> <title> <output>

Where "fasta_file" is the path to your list of FASTAs, "Title" is the text to display on the graph, and "Output" is the file path to write to.

Example usage:

    tmstats.py mfs.faa "MFS TMS distribution" myoutput.png

To analyze a TC hierarchy, use the web-based version of TMStats. TCDB is frequently updated, so this is the best solution. TMStats Web can be found here:
http://www.tcdb.org/progs/?tool=tmstats

To use the web-based version, simply enter any TC number. You may enter multiple TC numbers; just separate them by commas. Use the         symbol to denote a wildcard in the TCID.
For example, to get stats on all the MFS proteins (2.A.1), you would enter: 2 A 1
Or, to get info on the first TCID of each cluster within the MFS family, you would enter: 2 A 1  3 1
To combine results of MFS & APC, you would enter: 2 A 1  3 2 A 3

Programmatic Access

You can interface with the TMStats program within python scripts. Here is
will build a TinySeq XML format file and a raw FASTA file when provided with a list of gi numbers. In this example, we will use protocol1 to generate a list of gis, and then pass this list to our helper function within blast.py.

    import protocol1
    import blast

    psi = protocol1.psi()
    psi.query = 'Q7VW14'
    psi.expect = 0.005
    psi.results = 500
    # Result numbers must be in this list:
    # 10, 50, 100, 250, 500, 1000, 5000, 10000, 20000
    psi.init_blast()
    psi.iterate()
    psi.iterate()
    psi.iterate()
    # This will perform 3 additional iterations
    psi.fetch_results()

    # Done, now generate fasta file with blast tools
    blast_tool = blast.tools()
    blast_tool.gis = protocol1.gis  # Link our gi list
    blast_tool.build_xml()  # Build TinySeq file
    # Handle can be found in blast_tool.xml_file
    blast_tool.build_raw_fasta()  # Generate raw FASTA file
    # FASTA handle found in blast_tool.raw_fasta

    # Both files are temporary handles. They will be lost when the
    # program exits. The next 3 lines demonstrate how to save it from
    # memory to disk
    import shutil
    shutil.copy(blast_tool.xml_file.name, 'TinySeq.xml')
    shutil.copy(blast_tool.raw_fasta.name, 'myfasta.faa')

DefineFamily
Filename: define_family.py
Description: Generate FASTA files to represent any TC Family.
Basic Usage: type into the shell: define_family.py

    Usage: define_family.py FAMILY <P/PSI> OUTPUT

"FAMILY" can be any 3 unit TC ID. For example: "2.A.1".
Select P or PSI. This correspon
BioV Documentation

A set of programs designed to aid research in the field of transport protein evolution.

vreddy@ucalgary.ca
http://biov.tcdb.org

Installation

The BioV suite has been tested on Ubuntu 10, OS X Leopard and OS X Lion. However, this suite should run on virtually any unix-based operating system.

Before we begin, please ensure you have the following dependencies installed:

    Python 2.7
    BLAST (Latest)
    EMBOSS Package
    Matplotlib
    BioPython
    FASTA Package
    Mechanize
    HMMTOP

All of these packages are available from most unix installation managers (except FASTA package & HMMTOP). It is highly recommended that you install them using a repo manager, e.g. apt-get or darwin ports.

Once all of these programs are installed properly and can be accessed from the shell (exist in the $PATH variable), download the BioV suite here.

Extract the folder contents to any folder on your machine. Add this folder path to your $PATH variable and to your $PYTHONPATH variable. To do this you will need to edit your profile file. Its location varies between operating systems. For OS X it is ~/.bash_profile, and in Ubuntu it is ~/.bashrc.

All BioV scripts have a shebang line pointing to /usr/local/bin/python. If your installation is somewhere else, you should make a symbolic link to this path.

Most errors arise from an incorrectly installed copy of HMMTOP and SSearch36 (FASTA Tools). Make sure these executable
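Since most installation errors come from tools missing from $PATH, a quick check before running any BioV script can save time. The following is a rough sketch, not part of BioV. The executable names in the example list (hmmtop, ssearch36, blastp, needle) are assumptions about how the dependencies are named on your system, and the two environment variables are the HMMTOP ones this manual asks you to define. It uses shutil.which, which needs Python 3.3 or later, so run it with a Python 3 interpreter even though BioV itself targets Python 2.7.

```python
import os
import shutil


def check_environment(executables, env_vars):
    """Return (missing_executables, unset_env_vars) as two lists."""
    missing_exe = [name for name in executables if shutil.which(name) is None]
    missing_env = [name for name in env_vars if not os.environ.get(name)]
    return missing_exe, missing_env


if __name__ == '__main__':
    # Executable names are assumptions; adjust them to your installation.
    exes = ['hmmtop', 'ssearch36', 'blastp', 'needle']
    envs = ['HMMTOP_PSV', 'HMMTOP_ARCH']
    missing_exe, missing_env = check_environment(exes, envs)
    print('Executables not found on PATH:', missing_exe)
    print('Environment variables not set:', missing_env)
```

Two empty lists mean that every listed tool was found and both variables are set.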
an example:

    import tmstats
    fasta = '/path/to/fasta.faa'
    label = 'My bargraph'
    output = 'graph.png'
    (tabs, tms) = tmstats.calculate(fasta, label, output)
    # output written to graph.png
    # tabs contains raw statistics
    # tms contains TMS locations

GSAT - Global Sequence Alignment Tool
Filename: gsat.py

Description: Perform a shuffle-based alignment on two sequences.

Basic Usage:
From the terminal type: gsat.py
Enter the "A" sequence and the "B" sequence when prompted. Make sure each sequence contains no line breaks.
By default, this approach uses several values that have been determined to be the optimal settings for many transport proteins. These settings are: Gap Open Penalty = 8, Gap Extension Penalty = 2, and 500 random shuffles.
This should finish almost instantly and return a report with your alignment and the standard score (z-score) at the very bottom.

Alternatively, this program can be run from the official TCDB website using this link:
http://tcdb.org/progs/?tool=gsat

Using GSAT in a programmatic context:
The best way to showcase all of GSAT's features is by example.
Here is a sample code with commentary:

    import gsat  # Import the package
    gs = gsat.cmd()  # Initialize our GSAT object
    gs.asequence = 'ABCD'  # First sequence to compare
    gs.bsequence = 'VBSC'  # Second sequence to compare
    gs.gapopen = 8  # NW Gap open cost (default 8)
    gs.gapextend = 2  # NW Gap extend cost (default 2)
    gs.sh
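To make the idea of a shuffle-based standard score concrete, here is a small sketch that is independent of gsat.py. It assumes the usual definition, z = (real score - mean of shuffled scores) / standard deviation of shuffled scores, and it assumes both sequences are shuffled each round. Whether GSAT does exactly this is not stated here. The scoring function toy_score is a deliberately simple stand-in that counts identical positions; it is not the Needleman-Wunsch score GSAT obtains.

```python
import random
import statistics


def toy_score(a, b):
    """Stand-in score: number of identical positions (NOT a real NW score)."""
    return sum(1 for x, y in zip(a, b) if x == y)


def shuffle_zscore(a, b, shuffles=500, seed=0, score=toy_score):
    """Z-score of score(a, b) against scores of randomly shuffled sequences."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    real = score(a, b)
    shuffled_scores = []
    for _ in range(shuffles):
        sa = list(a)
        sb = list(b)
        rng.shuffle(sa)
        rng.shuffle(sb)
        shuffled_scores.append(score(sa, sb))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        return 0.0  # shuffling changed nothing, so no z-score can be formed
    return (real - mean) / sd


if __name__ == '__main__':
    seq = 'ACDEFGHIKLMNPQRSTVWY' * 3
    print(shuffle_zscore(seq, seq))
```

A large positive value means the real pairing scores far better than random rearrangements of the same residues, which is the intuition behind the SD thresholds used elsewhere in this manual.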
ds to BLASTP or PSI-BLAST, respectively. The last option is OUTPUT. This is the FASTA file that will be created.

For this example, we will generate a FASTA file for 2.A.1 using PSI-BLAST, and write it to Family.faa:

    define_family.py 2.A.1 PSI Family.faa

Protocol2
Filename: protocol2.py
Description: Find homologs between two FASTA files & generate graphical reports
Basic Usage: In the terminal type: protocol2.py

    Usage: protocol2.py [options]

    Welcome to Protocol2. This tool will allow you to rapidly locate homologs
    between two fasta files. Subject & Target are used to label items on your
    actual report. (Subject, Target, Outdir) are the only mandatory options.
    EX: protocol2 -s 2.A.1.faa -t 2.A.3.faa -o mydir --subject 'APC' --target 'MFS'

    Developed by Vamsee Reddy

    Options:
      --version         show program's version number and exit
      -h, --help        show this help message and exit
      -s SUBJECT        Path to subject file
      -t TARGET         Path to target file
      -o OUTDIR         Path to output
      --subject SNAME   Subject Name to appear on report
      --target TNAME    Target Name to appear on report
      --assign NUM      TSS Setting - Number of targets to assign each subject
      --shuffle RAND    Number of times to shuffle each alignment with GSAT
      --stms SRESTRICT  Report will contain X TMS subjects (TTMS must be set to work)
      --ttms TRESTRICT  Report will contain X TMS targets (STMS must be set to work)

The only mandatory settings are -s, -t
esponds to 40% - 100% identity. A value of "1" will remove only duplicate sequences. For more information, read the cd-hit user manual.

Below are all the settings that can be applied to Protocol1. The lines in BOLD are applied by default, unless explicitly specified otherwise.

    Options:
      --version    show program's version number and exit
      -h, --help   show this help message and exit
      -q QUERY     Gi/Accession/Sequence to BLAST
      -i ITERATE   Number of additional iterations to perform (0)
      -n NUMBER    Number of results to fetch each round (500)
      -e EXPECT    E-Value cutoff (0.005)
      -c CUTOFF    CD-HIT threshold. From 0.4 - 1 (0.8)
      -o OUTDIR    Output folder (p1out)
      --tms        Include this flag to tabulate TMS stats
      --min MIN    Minimum sequence lengths to retrieve
      --max MAX    Maximum sequence lengths to retrieve

Advanced usage:

Suppose a user wants to generate a FASTA list for Q7VW14 from a total of 3 iterations, with a BLAST expectation value of 0.0003, and wants tabulated TMS stats. In addition, the user wants the retrieved sequences to be between 300 and 350 aas long, and wants 1000 results or fewer. Here is the command for this circumstance:

    protocol1.py -q Q7VW14 -i 3 -n 1000 -e 0.0003 --tms --min 300 --max 350

When this is done, your files will be in the "p1out" folder by default.

Protocol1 programmatic access

Protocol1 can be accessed within other python scripts. However, it also requires the aid of a helper module called "blast.py". Blast.py
mbinations.

Step 6: When to use the --consecutive flag. The way AR works is as follows: the subject frame selects "r" TMSs from the beginning of a protein. For this example, we will use an r value of "2". The subject frame selects TMSs 1-2. The target frame will select 3-4 and will continue to slide in increments of "1" until it reaches the end of the protein. Ultimately, TMSs 1-2 will be compared to (3-4), (4-5), (5-6). If the --consecutive flag is enabled, the target frame will slide in increments of "r" (2). With this flag enabled, TMSs 1-2 will be compared to (3-4), (5-6). Using the --consecutive flag will increase the speed of AR. However, it should only be used if you are confident in your "r" size and know that the family of transporters you are working with is well conserved. Many proteins may have experienced some type of insertion or deletion after the repeat event. These events will make the --consecutive flag unreliable. Here is an example command for the --consecutive flag:

    ancient.py -i myfastas.faa -o myoutput -r 2 --min 6 --max 6 --consecutive

Step 7: When to use the --fasta_only flag. If this flag is included at the end of any command, AR will just build a FASTA database of all the individual TMSs without actually performing any comparisons. This is useful if you need to build a BLAST DB that consists of only TMSs. Example:

    ancient.py -i myfastas.faa -o myoutput -r 2 --min 6 --max 6 --fasta_only

Step
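The sliding behaviour described in Step 6 can be sketched in a few lines. This is an illustration of the description only, not AR's actual code. The function name target_frames and its parameters are hypothetical, and it only enumerates the target frames for one subject frame.

```python
def target_frames(n_tms, r, subject_start=1, consecutive=False):
    """List the (first, last) TMS frames compared against one subject frame.

    The subject frame covers TMSs subject_start .. subject_start + r - 1.
    The target frame starts right after it and slides by 1, or by r when
    consecutive is True, until it would run past the last TMS.
    """
    step = r if consecutive else 1
    start = subject_start + r
    frames = []
    while start + r - 1 <= n_tms:
        frames.append((start, start + r - 1))
        start += step
    return frames


if __name__ == '__main__':
    # Six-TMS protein, r = 2, subject frame 1-2.
    print(target_frames(6, 2))                    # [(3, 4), (4, 5), (5, 6)]
    print(target_frames(6, 2, consecutive=True))  # [(3, 4), (5, 6)]
```

The shorter list in the second call is why the flag speeds up a run, and also why an insertion or deletion that shifts the repeat by one TMS would be missed.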
n you believe that two sets of TMSs are homologous and don't want to conduct an exhaustive search, this can save time. Such foresight can be derived from interpreting hydropathy plots. For the next few examples, we will be looking at a hypothetical six TMS protein.

Scenario 1: We are certain that TMSs 1 & 2 gave rise to 3 & 4. We want to restrict the subject (VSRestrict) to "1 & 2" and restrict the target (VTRestrict) to "3 & 4". Here is an example command (with this section in bold):

    ancient.py -i myfastas.faa -o myoutput -r 2 --min 6 --max 6 --VSrestrict '1-2' --VTrestrict '3-4'

Scenario 2: We want to only compare TMSs 1-2 with 3-4 & 5-6.

    ancient.py -i myfastas.faa -o myoutput -r 2 --min 6 --max 6 --VSrestrict '1-2' --VTrestrict '3-4,5-6'

Scenario 3: We only care about which TMSs gave rise to 4-5. This will compare ALL TMSs before TMSs 4-5 to these positions.

    ancient.py -i myfastas.faa -o myoutput -r 2 --min 6 --max 6 --VTrestrict '4-5'

Scenario 4: You have identified TMSs 3-4 as a precursor for something else. This command will compare TMSs 3 & 4 to all the TMSs after this pair. Pay attention to the difference with Scenario 3.

    ancient.py -i myfastas.faa -o myoutput -r 2 --min 6 --max 6 --VSrestrict '3-4'

Keep in mind, it is not necessary to set any of these vertical restrictions. Leaving these options blank will instruct the program to perform all logical TMS co
no need to parse the TSV file. A list of tuples can be read directly. Below is an example usage of the TSS script:

    import tss
    compare = tss.compare()
    compare.subject = 'mysubjects.faa'
    compare.target = 'mytargets.faa'
    compare.max = 3
    compare.shuffle = 300
    compare()
    # results are found in compare.results

SSearch - Smith Waterman Search
Filename: ssearch.py

Description: SSearch is very much like TSS, but will do shuffle-based alignments for every combination of subjects & targets.

Basic Usage: In the terminal, type: ssearch.py

    Usage: ssearch.py [options]

    SSearch will compare every sequence in a subject file to every sequence in
    another target file by using a Smith Waterman search. Results are returned
    after each pair is shuffled R number of times, and a Z-Score is returned to
    determine the quality of the alignment.

    Options:
      -h, --help  show this help message and exit
      -s SUBJECT  Path to your subject fasta file
      -t TARGET   Path to your target fasta file
      -r RANDOM   Number of times to shuffle each sequence (Optional. Default: 300)
      -o OUTPUT   Filepath to write output

Example usage:
We have the exact same variables as the TSS programs.

    ssearch.py -s subjects.faa -t targets.faa -o compared

Programmatic Usage
Programmatic access for SSearch is only slightly different than TSS. However, they return the same output format.

    import ssearch
    compare = ssearch.Compare()
    compare.subject = 'mysubjects.faa'
    com
ogram or the DefineFamily program.

Step 2: Now you must select a repeat size. By looking at hydropathy plots of representative proteins, you should be able to form a theory as to what the repeat size is. If your prediction is wrong, the data is still valuable, because the results will often reveal the true repeat size. In principle, all repeat size analyses, whether correct or incorrect, should point to the same true repeat. Just make sure the repeat size is an integer value greater than "1".

Step 3: It is highly recommended that you set the --min & --max settings. Ideally these should be the SAME number. Failure to set these can result in many false positive results and will make it difficult to interpret the results. For example, if you are working with the MFS family, set these both to 12. This setting will only apply to the vertical search method.

Step 4: For almost all searches, the default flank size is sufficient. By default, AR uses 10 aa padding. It is best to perform both a vertical and a horizontal search. By default AR does both. Horizontal results are more convincing when available. Horizontal searches will reveal recent TMS repeats. The vertical approach will find very old (ancient) repeats. Unless you are very confident in what you expect to find, it is best to leave these settings untouched.

Step 5: Using VSRestrict & VTRestrict. These two settings allow you to choose exactly which TMSs you want to compare. If for whatever reaso
ool to display your results. The format for executing the readlive.py script is as follows:

    readlive.py <filename> <all/consec> <min_sd>

"Filename" corresponds to the file we are analyzing. In this case, it is horizontal.txt. The next parameter is "all" or "consec". Choosing "consec" will make sure that the target TMSs are consecutive integers to the subject TMSs compared. For example, the script will retrieve comparisons between TMSs 1-2 vs 3-4 and TMSs 3-4 vs 5-6. It would never retrieve TMSs 1-2 vs 4-5. Consecutive results almost always make evolutionary sense. However, no data should be ignored. The final parameter is "min_sd". This is the minimum SD score the program will retrieve. This is the full command to run:

    readlive.py horizontal.txt all 10

If no results are returned, try lowering the min_sd value by increments of 1 until results are obtained. When results are available, they will be sorted by SD and gaps, with the best scores displayed at the bottom. Scroll up until you find a potential repeat. To view the actual alignment, copy the entire line to the clipboard. Then type into the shell: "show_alignment.py". When prompted, paste the line into the terminal and hit enter. The full alignment should be displayed. Verify that none of the TMSs are gapped out. If you are happy with the alignment, then optimize it. This will increase/decrease the flank size until the alignment returns the highest SD
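The filtering that the "all/consec" and "min_sd" parameters describe can be pictured with a short sketch. It is not readlive.py and does not read AR's result files. The record layout (subject frame, target frame, SD, percent gaps) and the function name select_hits are assumptions made for illustration, as is treating min_sd as an inclusive cutoff.

```python
def select_hits(hits, mode='all', min_sd=10):
    """Filter and sort hypothetical result records.

    Each hit is (subject_frame, target_frame, sd, percent_gaps), where a
    frame is a (first_tms, last_tms) tuple. In 'consec' mode only targets
    that start right after the subject frame are kept. Results are sorted
    so the highest SD (and, on ties, the lowest gaps) ends up last.
    """
    kept = []
    for subject, target, sd, gaps in hits:
        if sd < min_sd:
            continue
        if mode == 'consec' and target[0] != subject[1] + 1:
            continue
        kept.append((subject, target, sd, gaps))
    kept.sort(key=lambda h: (h[2], -h[3]))
    return kept


if __name__ == '__main__':
    # Made-up records, for illustration only.
    hits = [
        ((1, 2), (3, 4), 12.0, 8.0),
        ((1, 2), (4, 5), 15.0, 5.0),
        ((3, 4), (5, 6), 9.0, 3.0),
        ((3, 4), (5, 6), 12.0, 20.0),
    ]
    for hit in select_hits(hits, mode='consec', min_sd=10):
        print(hit)
```

Lowering min_sd step by step, as suggested above, simply lets more of the low-scoring records through the first check.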
    pare.target = 'mytargets.faa'
    compare.shuffle = 300
    compare()
    # results are found in compare.results

GBlast
Filename: gblast.py
Description: Identify transport proteins from an entire proteome file.

Basic usage: from the terminal type "gblast3.py"

    Usage: gblast.py [options]

    Welcome to GBlast. Easily identify transporters in entire
    Genomes/Proteomes.
    By Vamsee Reddy

    Options:
      --version   show program's version number and exit
      -h, --help  show this help message and exit
      -i INPUT    Path to genome/proteome file
      -o OUTPUT   Results output name
      --cdd       Include CDD analysis (Takes a while)

This program is very straightforward to use. All you need is a FASTA file to analyze. GBlast can resume an incomplete analysis as well.

Including the --cdd flag will fetch any overlapping conserved domains as identified by NCBI's CDD tool. Including this option will take a long time to complete.

Example usage:

    gblast3.py -i ICH.fsa -o analysis --cdd

AncientRep
Filename: ancient.py
Description: Find TMS Repeats within a list of homologues.

Usage: Type into the terminal: ancient.py

    Usage: ancient [options]

    This is the Ancient program. This tool will find very old (and new) TMS repeat
    units. Results can be viewed as they become available by using the
    readlive.py command. Alignments can be viewed using the show_alignment.py
    command. This tool was created by:

    Options:
      --version
      -h, --help
      -i INPUT
      -r REPEAT
      -o OUTPUT
    uffles = 500  # Number of shuffles to perform (default)
    gs()  # Begins the actual process

    # Now that GSAT has run, let's extract some data

    # Grab just the needle outfile content
    # This file contains alignment stats for just ONE shuffle
    gs.outfile.seek(0)  # outfile contains the handle, rewind it first
    print gs.outfile.read()

    # Grab calculated data
    print gs.zscore   # Prints the z-score (rounded to the nearest whole number)
    print gs.zscorep  # Same as above, but not rounded
    print gs.average  # Average NW bit score from shuffles
    print gs.gaps     # Prints the percentage of gaps in alignment

Protocol 1
Filename: protocol1.py

Description: Perform NCBI PSI-BLASTs with iterations remotely without downloading a local database. In addition, one may annotate and tabulate statistics about a generated dataset.

Basic Usage:
From the terminal type: protocol1.py

You will be prompted for an accession number or a gi number. You may also enter a sequence to BLAST, but make sure it is without the header and contains no line breaks. Next you are prompted to count the TMSs. This will generate an additional bar chart with the number of TMSs and their occurrences. The next prompt is the output path. This can be anything, and protocol1 will generate this folder containing your data. Finally, you will be prompted for a cd-hit cutoff value. This will remove sequences that are too similar to others in the population. This can be a value from 0.4 - 1. This corr
    