JFitom v Alpha – USER MANUAL
Figure 1: Graphical User Interface of JFITOM

Credits
Original FITOM and xFITOM code by Ivan Erill. JFITOM development by Omar Shehab and Ivan Erill.
Ivan Erill, 2010. If using for research, please cite: Erill I, O'Neill MC. A reexamination of information theory-based methods for DNA-binding site identification. BMC Bioinformatics. 2009 Feb 11;10(1):57.

How to Get JFITOM
Visit http://userpages.umbc.edu/~erill/7399/ and select the JFITOM project. This opens the home page of the JFITOM project. The executable file is available on that page for download.

System requirements
Software:
- biojava.jar
- jFitom-core.jar
- bytecode.jar
- commons-cli.jar
- commons-collections-<latest version>.jar
- commons-dbcp-<latest version>.jar
- commons-pool-<latest version>.jar
- jgrapht-jdk1.5.jar
- log4j-<latest version>.jar
- JDK/JRE 1.5 or above
Operating system: all operating systems supported by JDK/JRE 1.5 or above.
Hardware: all hardware configurations supported by JDK/JRE 1.5 or above.
All required libraries except the JDK/JRE are shipped with JFITOM. They are archived in the JFITOM executable file.

Getting started
JFITOM's distant predecessor, FITOM, is a command-line program, meaning that it is run from a DOS prompt. Its more immediate predecessor, xFITOM, includes a Graphical User Interface (GUI) to select the required files
Introduction
JFITOM is a portable and extended version of xFITOM, a computer program for the detection of binding sites in DNA sequences. JFITOM implements several methods described in the literature to compute an approximation of the binding affinity for a particular transcription factor binding site, based on a collection of binding sequences provided by the user. Using these methods, JFITOM scans a sequence file looking for putative binding sites across the DNA sequence, on both strands, and filters the results according to a user-specified threshold. JFITOM also links the identified sites with annotated genes and infers the sites' roles from their location in the vicinity of genes.

[Screenshot: JFitom Alpha main window, with File and Help menus and the panels Input and output files, Default path, Scoring methods (Ri, Ri × Rseq, Iseq, Iseq × RE), Annotation strategy, Filtering strategy and Generate results.]
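The two-strand scan mentioned above can be pictured with a short Python sketch. It is only an illustration and is not taken from JFITOM, which is written in Java. The function names, the plain-string genome and the 0-based forward-strand coordinates are assumptions made for this example.

```python
# Illustrative two-strand sliding-window scan (not JFITOM's implementation).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_sites(genome: str, width: int):
    """Yield (position, strand, site) for every window of `width` bases.

    Positions are 0-based offsets on the forward strand (an assumption of
    this sketch). Each reverse-strand window is reported at the same offset,
    read as the reverse complement of the forward window.
    """
    for pos in range(len(genome) - width + 1):
        window = genome[pos:pos + width]
        yield pos, "+", window
        yield pos, "-", reverse_complement(window)
```

For example, `list(candidate_sites("ACGTTG", 4))` yields `(0, "+", "ACGT")`, `(0, "-", "ACGT")`, `(1, "+", "CGTT")`, `(1, "-", "AACG")`, `(2, "+", "GTTG")` and `(2, "-", "CAAC")`. Each window would then be scored and compared against the threshold.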
and to set all the necessary options. JFITOM provides new functionality, such as platform independence and the ability to create a list of regulated genes for each site.

JFITOM can be launched from the command line, from a script file or through the GUI. To run JFITOM from the command line, the user invokes java with JFITOM-run.jar as the main argument. JFITOM-run.jar takes three command-line arguments: the options file, the genome file and the site collection file. A sample command to run JFITOM is:

java -jar JFITOM-run.jar -g <genome file> -s <site file> -o <options>.OPT

The same commands can be used to run JFITOM from script files. The JFITOM GUI makes it easy to save or customize options interactively. If the JFITOM command is given no arguments, or if JFITOM-run.jar is double-clicked, JFITOM launches the graphical user interface.

Main operation files in JFITOM
JFITOM operates with three main files: a file containing the genome sequence to be searched (genome file), a file containing a list of binding sites (collection file) and a file specifying the program options (options file).

The sequence file (Genome file)
The sequence file (Sequence_file.ext) is the file containing the sequence or sequences the user wants to scan. The sequence file can only be in GenBank format (.GB, .GBK or .Genbank).

The collection file (Site Collection file)
The collection file Collection_f
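Because the same command can be used from script files, a small Python wrapper can assemble and run it. This is a sketch. The JAR name `JFITOM-run.jar` is an assumption, and so are the helper names `jfitom_command` and `run_jfitom`. The flags `-g`, `-s` and `-o` follow the sample command given in the manual.

```python
import subprocess
from typing import List


def jfitom_command(genome: str, sites: str, options: str,
                   jar: str = "JFITOM-run.jar") -> List[str]:
    """Argument list mirroring the sample command (-g genome, -s sites, -o options)."""
    return ["java", "-jar", jar, "-g", genome, "-s", sites, "-o", options]


def run_jfitom(genome: str, sites: str, options: str,
               jar: str = "JFITOM-run.jar") -> int:
    """Run JFITOM and return its exit status (requires Java and the JAR file)."""
    completed = subprocess.run(jfitom_command(genome, sites, options, jar))
    return completed.returncode
```

Building the argument list separately from running it makes it easy to log or check the exact command before launching a batch of scans.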
Figure 6: Filtering strategy

The last section, Generate results, allows the user to store the options and launch the analysis. The user can also decide whether to generate log files or to view the result immediately.

Figure 7: Generate result [controls: Generate log files, View the result file (MS Windows), Generate results, Restore default options]

Input and output file processing

Loading genome file
The main JFITOM program loads the genome file first. The file can only be in GenBank format. After loading the file, JFITOM uses the BioJava library to parse it and extract the genes along with their annotations. For annotation, the following parameters are stored: name, location, strand, locus tag, protein id, product and note.

Loading site collection file
After the genome file, JFITOM loads the file containing a list of known binding sites, which are used to construct the model of the binding site, or motif. The file can be in two formats: .FAS or .TXT.

Loading options file
If run from the command line with an options file as a parameter, JFITOM functions according to the options set in that file. If no options file is specified as a command-line parameter, JFITOM looks for one in the default folder (<JFITOM home directory>/config/options.conf). If the options file is not available in the default location, JFITOM runs with factory settings. If the options file is missing any parameter, the factory-setting value of that parameter is used. When the user is running the GUI, the options are saved in the default path before the result is generated. The following options are defined in the file:
- latestGenomeFile: file path in standard format
- latestSitesFile: file path in standard format
- isPalindrome: Y if the binding sites are palindromic, else N
- latestOutputFile: file path in standard format
- scoreMethod: 0 for Ri, 1 for Ri × Rseq, 2 for Iseq and 3 for Iseq × RE (more details in the following sections)
- maxHysteresisLimit: maximum number of base pairs to be scanned when looking for a gene upstream or downstream of a located binding site
- maxIntergenicDistance: maximum number of base pairs between genes for them to be reported as part of an operon
- maxOperatorDistanceOut: maximum number of base pairs upstream of a gene translational start site for a site to be considered "operator". If this is exceeded, the site is labeled "intergenic".
- maxOperatorDistanceIn: maximum number of base pairs downstream of a gene translational start site for the site to be considered "operator". If this is exceeded, the site is labeled "intragenic".
- resultSizeMethod: 0 if an absolute size or 1 if a relative size is specified
- resultSizeMethodValue: the size of the list
- thresholdScoreMethod: 0 if the threshold is an SD band and 1 if the threshold is normalized
- thresholdScoreValue: the threshold score
- saveOptions: Y if the user wants to save the options t
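The fallback rule for missing parameters can be sketched in Python. The manual does not show the on-disk syntax of the options file. This sketch therefore assumes one `key=value` pair per line, in the Java properties style. The keys come from the list above, but every default value is a hypothetical placeholder and not JFITOM's real factory setting.

```python
# Keys follow the manual; the VALUES are hypothetical placeholders.
FACTORY_DEFAULTS = {
    "latestGenomeFile": "", "latestSitesFile": "", "latestOutputFile": "",
    "isPalindrome": "N", "scoreMethod": "0",
    "maxHysteresisLimit": "200", "maxIntergenicDistance": "100",
    "maxOperatorDistanceOut": "300", "maxOperatorDistanceIn": "50",
    "resultSizeMethod": "0", "resultSizeMethodValue": "100",
    "thresholdScoreMethod": "1", "thresholdScoreValue": "0.5",
    "saveOptions": "Y", "generateLog": "N", "viewResult": "N",
}


def load_options(text: str) -> dict:
    """Overlay key=value lines (assumed syntax) on the factory defaults.

    Blank lines, '#' comments and lines without '=' are skipped. Any key
    missing from the file keeps its factory value.
    """
    options = dict(FACTORY_DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options
```

Starting from a copy of the defaults and then overwriting only the keys present in the file reproduces the behaviour the manual describes for incomplete options files.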
ed by the user input and used as the threshold value (please refer to Figure 6).

Annotating the genes
After building the list of sites, JFITOM annotates each site with the following information:
- category: whether the site is "intergenic", "intragenic", "operator", "isolated" or none of these
- relative position: distance from the first gene
- genes: a LinkedList of co-regulated genes

JFITOM allows the user to set the following parameters for annotation:
- Gene search hysteresis: JFITOM uses this value as the upper limit up to which it scans for the first forward gene (downstream search) or the first reverse gene (upstream search) around a site. If no gene is found within the hysteresis in either direction, the site is marked as "isolated".
- Intergenic distance: in the upstream or downstream region, once JFITOM finds the first gene, it looks for co-regulated genes in an operon configuration. These genes may be separated by at most this distance. If no such gene is found, JFITOM stops searching that region for the site.
- Operator distance: when a site is found in an intergenic region, JFITOM further checks whether it is within the operator distance from the start of the closest gene. If it is, the site is an "operator" site; otherwise, it is an "intergenic" site.
- Maximum intragenic distance: when a site is found inside a gene, JFITOM further checks whether it is within the maximum intragenic distance from the s
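The interplay between the gene search hysteresis and the intergenic distance can be sketched for the downstream direction. This is a simplified Python illustration and not JFITOM's code. The `Gene` record and the `downstream_operon` function are hypothetical, and reverse-strand genes are simply ignored here.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Minimal gene record (hypothetical; JFITOM uses its own Gene class)."""
    name: str
    start: int   # translational start; lowest coordinate for "+" genes
    end: int
    strand: str  # "+" (forward) or "-" (reverse)


def downstream_operon(site_pos: int, genes: List[Gene],
                      hysteresis: int, intergenic: int) -> List[Gene]:
    """Forward-strand genes that a site at `site_pos` may regulate.

    1. Find the first "+" gene starting at most `hysteresis` bp after the site.
    2. Add the following "+" genes while the gap to the previous gene is at
       most `intergenic` bp (operon configuration).
    """
    forward = sorted((g for g in genes if g.strand == "+"), key=lambda g: g.start)
    operon: List[Gene] = []
    for gene in forward:
        if not operon:
            if gene.start < site_pos:
                continue              # gene starts before the site
            if gene.start - site_pos > hysteresis:
                break                 # first gene lies beyond the hysteresis limit
            operon.append(gene)
        elif gene.start - operon[-1].end <= intergenic:
            operon.append(gene)       # close enough to be co-regulated
        else:
            break                     # gap too large: the operon ends here
    return operon
```

An empty result corresponds to the "None" outcome of a search. The upstream search for reverse-strand genes would mirror this logic with the coordinates reversed.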
ile.ext is the file containing the collection of known binding sites that the user provides to the program so that it can construct its model of the binding site, or motif. Collection files can be either bare site files (plain text with aligned sites on consecutive lines) or FASTA files, in which each site line is preceded by an identification line beginning with ">". Accepted extensions are .FAS/.FNA for FASTA files and .TXT for bare site files.

The options file (Options file)
The options file (.OPT) stores different operational strategies and information. If no such file is specified in the command-line arguments, the software operates with factory defaults.

Main functionalities
JFITOM provides the following functionalities, dealing with different aspects of program operation.

The GUI
The graphical user interface is simple and intuitive. All operations are done in a single window (Figure 1). The window contains a user-input form divided into six sections. The first section, Input and output files, takes the genome file and the sites file as user input. If the user wants to specify that the sites are palindromic, she can tick the check box. The user can also specify the name of the output file where the result will be stored.

Figure 2: Input and output files [fields: Genome file, Sites file, Are these sites palindromic?, Output file]

The second section, Default path, allows the user to set the default path for the present session. Once the default path is set, all other file-browsing controls of JFITOM use it as their current directory. If the directory does not contain any .OPT file, JFITOM asks the user for a file name (.OPT) in which the options will be saved. Instead of creating a new file, the user may also select an existing .OPT file. In that case, the JFITOM GUI controls are set to the values stored in the options file. If the user chooses not to select an options file at that time, she is asked again for the file name when the options and the result are computed and stored.

Figure 3: Default path [fields: Set default path, Current options file]

The third section, Scoring methods, allows the user to choose the scoring method. There are four options: Ri, Ri × Rseq, Iseq and Iseq × RE. By default, JFITOM scores using the Ri method.

Figure 4: Scoring methods

The fourth section, Annotation strategy, allows the user to set the annotation strategy. While JFITOM scans a genome, it needs to know several limits. The first is the hysteresis limit (the maximum number of base pairs JFITOM scans to find the first gene with the appropriate orientation). The second is the intergenic distance (the maximum number of base pairs between genes of the same orientation regulated by the same binding site). The third is the operator distance, that is, the maximum number of base pairs within which the regulating site is located before a gene seque
mum intergenic distance.

Generating output
After running from the command line or from a script, JFITOM saves the scored and annotated binding sites as a CSV file in the <JFITOM home directory>/output directory and terminates. If the user is running the GUI, JFITOM saves the options before starting the main operation. To see the details in a log file, the user has to select the Generate log files check box. To view the result instantly, she can select View the result file before clicking the Generate results button. The user can also revert to the default options by clicking the Restore default options button. In the GUI, the user can also specify the output file destination.

The first column of the CSV file contains a symbol that distinguishes each result site. For each site, the columns are as follows:
- Position: the position of the site in the genome
- Score: the score of the site
- Strand: the strand on which the site resides
- Site: the sequence of the site
- Up category: upstream category
- Down category: downstream category
- Up relative position: relative position with respect to the first upstream gene
- Down relative position: relative position with respect to the first downstream gene
- Genes: annotated list of upstream and downstream genes, ordered according to their position
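For further analysis, result rows can be read back into dictionaries with Python's standard `csv` module. This is a sketch under stated assumptions. It assumes comma-separated rows and treats the first cell as the marker column. It also assumes that the gene list occupies all cells after the eight fixed columns listed above. The function name `parse_results` is hypothetical.

```python
import csv
import io
from typing import List

# Column order as listed in the manual, after the leading marker column.
FIXED_FIELDS = ["Position", "Score", "Strand", "Site", "Up category",
                "Down category", "Up relative position",
                "Down relative position"]


def parse_results(text: str) -> List[dict]:
    """Read JFITOM-style result rows into dictionaries (format details assumed)."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        values = row[1:]                      # drop the marker column
        if len(values) < len(FIXED_FIELDS):
            continue                          # blank or incomplete line
        if not values[0].strip().isdigit():
            continue                          # e.g. a header row
        record = dict(zip(FIXED_FIELDS, values))
        record["Genes"] = values[len(FIXED_FIELDS):]  # remaining cells
        records.append(record)
    return records
```

All values stay as strings, so callers would convert `Position` and `Score` to numbers as needed.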
nce starts. It also needs the intragenic distance, the maximum number of base pairs within which the regulating site is located after a gene sequence starts. The user specifies these values in base pairs. The GUI validates the inputs before using them.

Figure 5: Annotation strategy [fields: Gene search hysteresis, Intergenic distance, Operator distance, Maximum intragenic distance]

The fifth section, Filtering strategy, allows the user to set the strategy used to filter the result. If the user sets the maximum size of the list as Return top N sites, JFITOM returns at most N sites. If the user sets the maximum size of the list as Return top N % of the genes, the number of sites JFITOM returns is N% of the total number of genes. If the user sets a normalized threshold, JFITOM computes the maximum and minimum scores of the given binding sites. This maximum-to-minimum range is then mapped to a normalized range of 1.0 to 0.0, and JFITOM converts the user-given threshold back to the original scale. If the user sets the threshold as a standard deviation, JFITOM computes the mean and standard deviation of the scores of the given collection of sites. The standard deviation is then multiplied by the user input and used as the threshold score.

[Filtering strategy panel: Maximum number of sites (Return top N sites / Return top N % of the genes); Threshold site score (in a normalized score scale of 0.0 to 1.0 / standard deviation from the mean)]
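The filtering rules above can be sketched as three small Python functions. This is an illustration, not JFITOM's code, and the function names are hypothetical. For the standard-deviation threshold, the sketch assumes the multiplied SD is applied as an offset from the collection mean, as the GUI label "Standard deviation from the mean" suggests. It also assumes the population SD is used. Rounding N% of the genes down is a further assumption.

```python
import math
import statistics
from typing import List


def normalized_threshold(scores: List[float], t: float) -> float:
    """Convert a threshold t on the 0.0 to 1.0 scale back to raw scores.

    The minimum collection score maps to 0.0 and the maximum to 1.0.
    """
    lo, hi = min(scores), max(scores)
    return lo + t * (hi - lo)


def sd_threshold(scores: List[float], k: float) -> float:
    """Threshold k population standard deviations from the collection mean (assumed)."""
    return statistics.mean(scores) + k * statistics.pstdev(scores)


def max_sites(method: int, value: float, n_genes: int) -> int:
    """resultSizeMethod 0: value is N sites; 1: value is N% of the genes (rounded down)."""
    if method == 0:
        return int(value)
    return math.floor(n_genes * value / 100)
```

With collection scores 2, 4, 6 and 8, a normalized threshold of 0.5 corresponds to a raw score of 5.0. An SD threshold of 0.0 falls at the mean, which is also 5.0.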
ncy of each position (the information from the rest of the bases at that position is discarded by this method). To correct this, O'Neill proposed weighting this kind of method by the known redundancy index of the collection (O'Neill 1989), so that the final score is given by:

$$ R_i \times R_{sequence}(j) = \sum_{i=1}^{l} \left( H_{before} + \log_2 p_i(S_j) \right) \left( H_{before} - H_{after}(i) \right) $$

Another ranking method, based on the RE formula, can also be used:

$$ I_{seq} \times RE(j) = \sum_{i=1}^{l} \log_2 \frac{p_i(S_j)}{f(S_j)} \left( \sum_{S \in \Omega} p_i(S) \log_2 \frac{p_i(S)}{f(S)} \right) $$

where S_j is the base of site j at position i and Ω = {A, C, G, T}.

Figure 4 shows how to choose the different methods from the GUI. The user can also specify the ranking method in the options file.

Filtering the search results
JFITOM allows the user to generate a selective list of results. To limit the list by size, the user can set the maximum size either as a number (N sites) or as a percentage of the number of genes (for example, N% of the total number of genes in the genome; please refer to Figure 6). JFITOM allows the user to specify the threshold score in two different ways. If the user specifies a normalized threshold, JFITOM computes the maximum and minimum scores of the given collection of binding sites. This maximum-to-minimum range is then converted to a range of 1.0 to 0.0. Finally, the effective threshold is calculated back in the original scale from the user-given threshold. If the user specifies the threshold as a standard deviation from the mean, JFITOM computes the standard deviation of the scores of the binding sites in the user-given collection. Then this value is multipli
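The two weighted ranking methods can be sketched in Python. This sketch assumes the per-position forms stated in the formulas: the Ri term is H_before + log2 p_i(S_j), weighted by that position's Rsequence. The Iseq term is log2(p_i(S_j)/f(S_j)), weighted by that position's RE. The function names are hypothetical. The PSFM entry for every observed base must be non-zero, for example after adding pseudocounts, which this sketch does not do. Otherwise `math.log2` raises `ValueError`.

```python
import math
from typing import Dict, List

Psfm = List[Dict[str, float]]  # one {base: frequency} dict per motif position


def _entropy(freqs: Dict[str, float]) -> float:
    """Shannon entropy in bits, skipping zero frequencies."""
    return -sum(q * math.log2(q) for q in freqs.values() if q > 0)


def ri_x_rseq(site: str, p: Psfm, f: Dict[str, float]) -> float:
    """Per-position Ri term weighted by that position's Rsequence."""
    h_before = _entropy(f)
    total = 0.0
    for i, base in enumerate(site):
        ri_term = h_before + math.log2(p[i][base])   # per-position Ri
        rseq_i = h_before - _entropy(p[i])          # per-position Rsequence
        total += ri_term * rseq_i
    return total


def iseq_x_re(site: str, p: Psfm, f: Dict[str, float]) -> float:
    """Per-position Iseq term weighted by that position's relative entropy."""
    total = 0.0
    for i, base in enumerate(site):
        iseq_term = math.log2(p[i][base] / f[base])
        re_i = sum(q * math.log2(q / f[s]) for s, q in p[i].items() if q > 0)
        total += iseq_term * re_i
    return total
```

Take a one-position motif with frequencies A 0.5, C 0.25, G 0.125 and T 0.125 on a uniform background. Both functions score "A" as 0.25, and `ri_x_rseq` scores "G" as -0.25. A completely uninformative position contributes nothing, because its weight is zero.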
o the file before generating the result, else N
- generateLog: Y if the user wants to generate log messages, else N
- viewResult: Y if the user wants to view the result immediately, else N (applicable only if Microsoft Excel is installed)

Set output path
The user can specify the path where the output file should be generated.

Main operation
Here we describe the main modus operandi of the program. As mentioned above, JFITOM loads the sequence and site files before it starts processing.

Parsing the genome
After loading the genome file, JFITOM parses it and extracts the genes. The genes are stored in an annotated list in memory.

Position-specific weight matrix and information content
The site file is then used to compute the motif position-specific frequency matrix (PSFM), a matrix of the relative frequencies of each nucleotide at each position in the motif. If the user specifies that the sites are palindromic, JFITOM reverse-complements the sequences of all sites and appends them to the site collection before generating the PSFM. The following example shows a consensus derived from the nucleotide frequencies:

Position   1      2      3      4      5      6
A          0.031  0.055  0.650  0.349  0.309  0.007
C          0.928  0.015  0.015  0.071  0.158  0.007
G          0.007  0.206  0.166  0.031  0.079  0.976
T          0.031  0.722  0.166  0.547  0.452  0.007
Consensus  C      T      A      T      T      G

From the PSFM, the information content of the motif can be computed according to the following formula:

$$ R_{sequence} = \sum_{i=1}^{l} \left( H_{before} - H_{after}(i) \right) $$

$$ H_{before} = -\sum_{S \in \Omega} f(S) \log_2 f(S), \qquad H_{after}(i) = -\sum_{S \in \Omega} p_i(S) \log_2 p_i(S) $$

where:
- f(S): frequency of base S in the genome
- p_i(S): frequency of base S at position i of the motif PSFM
- H_before: a priori entropy
- H_after(i): entropy after observing binding
- l: motif length, and Ω = {A, C, G, T}

as described by Schneider et al. (Schneider, Stormo et al. 1986). The formula assumes positional independence among the different positions of a binding site.

The information content of a motif tells us about the reduction in uncertainty we experience once we know that a protein (or other element) binds to a sequence (Schneider, Stormo et al. 1986; Erill and O'Neill 2009). Prior to binding, our uncertainty about which bases occupy the different positions of a sequence is maximal, and it is dictated by the base composition of the genome. Once we know that the protein associated with the provided motif binds that sequence, however, we have much less uncertainty about which bases occupy the different positions. Some uncertainty remains, because protein binding is a noisy process. Still, our uncertainty has decreased and, thus, we can say we have gained information. Conversely, from the point of view of a genome, the information content can also be seen as the loss of entropy at certain regions of the genome, from an initial random state to a state of fixation of conserved binding sites. Thus, motif information content can also be used as an index of
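The PSFM construction, the palindrome option and the Rsequence formula can be sketched together in Python. This is an illustrative sketch, not JFITOM's implementation. It assumes equal-length, already aligned sites and adds no pseudocounts. The names `build_psfm`, `consensus` and `r_sequence` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
Psfm = List[Dict[str, float]]


def build_psfm(sites: List[str], palindromic: bool = False) -> Psfm:
    """Relative base frequencies at each position of aligned, equal-length sites.

    If `palindromic` is True, the reverse complement of every site is
    appended to the collection before counting.
    """
    sites = [s.upper() for s in sites]
    if palindromic:
        sites += [s.translate(COMPLEMENT)[::-1] for s in sites]
    psfm: Psfm = []
    for column in zip(*sites):
        counts = Counter(column)
        psfm.append({b: counts[b] / len(column) for b in BASES})
    return psfm


def consensus(psfm: Psfm) -> str:
    """Most frequent base per position (ties resolved in A, C, G, T order)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in psfm)


def r_sequence(psfm: Psfm, background: Dict[str, float]) -> float:
    """Rsequence = sum over positions of (H_before - H_after(i)), in bits."""
    h_before = -sum(f * math.log2(f) for f in background.values() if f > 0)
    total = 0.0
    for col in psfm:
        h_after = -sum(p * math.log2(p) for p in col.values() if p > 0)
        total += h_before - h_after
    return total
```

Reading the most frequent base of each column is how a consensus such as CTATTG is obtained from the frequency table. With a uniform background, a fully conserved position contributes 2 bits and a uniform position contributes 0.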
tart of the gene. If it is, the site is an "operator" site; otherwise, it is an "intragenic" site.

Annotation information is determined for both the upstream and the downstream area of the site. A site can belong to one of the following categories:
- Operator: the site is within the maximum operator distance of a gene on either side. The user can set this limit for intergenic and intragenic sites individually.
- Intergenic: the site is between two genes and beyond their operator limit.
- Intragenic: the site is inside a gene but beyond the maximum operator distance.
- None: no gene is found during a search.
- Isolated: the category of the site is None both upstream and downstream.

Figure 7: Binding site categories based on relative distance to genes.

The relative position of a site is the distance between the starting position of the site and the annotated starting position of the closest regulated gene.

Special cases of annotation

[Panel: Maximum hysteresis limit; correct genes, but beyond the hysteresis limit]
Figure 8: For this site, JFITOM found no genes during downstream search.

[Panel: Maximum intergenic distance]
Figure 9: For this site, JFITOM ignores correct genes beyond maxi
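The category rules above can be expressed as a small classifier. This Python sketch is not JFITOM's code. It assumes the search in each direction has already produced the site's offset from the closest gene's translational start, measured in the gene's own orientation. A negative offset means upstream of the start, zero or positive means inside the gene, and None means no gene was found. The inclusive boundaries and the function names are assumptions.

```python
from typing import Optional, Tuple


def classify(offset: Optional[int], max_out: int, max_in: int) -> str:
    """Category of a site relative to the gene found by one search.

    max_out: maxOperatorDistanceOut (bp upstream of the start still "operator")
    max_in:  maxOperatorDistanceIn  (bp downstream of the start still "operator")
    """
    if offset is None:
        return "none"
    if offset < 0:
        return "operator" if -offset <= max_out else "intergenic"
    return "operator" if offset <= max_in else "intragenic"


def site_categories(up: Optional[int], down: Optional[int],
                    max_out: int, max_in: int) -> Tuple[str, str]:
    """(upstream, downstream) categories; both become "isolated" if both are "none"."""
    up_cat = classify(up, max_out, max_in)
    down_cat = classify(down, max_out, max_in)
    if up_cat == down_cat == "none":
        return "isolated", "isolated"
    return up_cat, down_cat
```

The pair returned by `site_categories` corresponds to the Up category and Down category columns of the output file.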
the level of redundancy (RI) in the different positions of the motif (O'Neill 1998).

Although it lacks a complete theoretical justification, a different index termed relative entropy (RE) has been proposed to replace RI for genomes with heavily skewed base composition:

$$ RE = \sum_{i=1}^{l} \sum_{S \in \Omega} p_i(S) \log_2 \frac{p_i(S)}{f(S)} $$

Relative entropy (Schneider, Stormo et al. 1986; Erill and O'Neill 2009) is also computed by JFITOM and can be used in different ranking methods.

Ranking methods
Rsequence tells us how much information our motif conveys, but it does not tell us how well a particular sequence fits the motif profile, which is what is required to scan for and rank putative binding sites.

Several ranking methods have been proposed, with varying degrees of theoretical justification. JFITOM provides a basic scoring method that can be used to rank putative binding sites. The sequence information content (Ri; Schneider 1997) is a method derived from the information content (Rsequence) formula. It scores each position i of a particular site j using the frequency in the motif of the base observed in the site, relative to the genomic background:

$$ R_i(j) = \sum_{i=1}^{l} \left( -\sum_{S \in \Omega} f(S) \log_2 f(S) + \log_2 p_i(S_j) \right) $$

where S_j is the base observed at position i of site j.

This ranking method discards information from the other motif base frequencies. As explained in (O'Neill 2003), this can lead to erroneous scoring, where the same score may be given to weakly or heavily conserved positions, since information about the redunda
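The relative entropy and the two basic per-site scores can be compared in a short Python sketch. It assumes that Ri(j) sums H_before + log2 p_i(S_j) over positions, as in the formula above. It also assumes that Iseq(j) sums log2(p_i(S_j)/f(S_j)), the per-site counterpart of the RE formula. The function names are hypothetical, and the PSFM entries for observed bases must be non-zero.

```python
import math
from typing import Dict, List

Psfm = List[Dict[str, float]]  # one {base: frequency} dict per motif position


def relative_entropy(p: Psfm, f: Dict[str, float]) -> float:
    """RE of the motif: sum over positions and bases of p * log2(p / f)."""
    return sum(q * math.log2(q / f[s])
               for column in p for s, q in column.items() if q > 0)


def ri_score(site: str, p: Psfm, f: Dict[str, float]) -> float:
    """Ri(j): sum over positions of H_before + log2 p_i(base)."""
    h_before = -sum(q * math.log2(q) for q in f.values() if q > 0)
    return sum(h_before + math.log2(p[i][b]) for i, b in enumerate(site))


def iseq_score(site: str, p: Psfm, f: Dict[str, float]) -> float:
    """Iseq(j) (assumed form): sum over positions of log2(p_i(base) / f(base))."""
    return sum(math.log2(p[i][b] / f[b]) for i, b in enumerate(site))
```

One way to see the skewed-genome issue is to make the motif column identical to a skewed background (A 0.5, C 0.25, G 0.125, T 0.125). The column then carries no information: RE and Iseq("A") are both 0, while Ri("A") is still 0.75 bits.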
    