footprintDB

Contents

1. DNA binding data in footprintDB format is stored in a single file containing TF sequences, DNA motifs and DNA sites, together with their relationships. Each entry is a single DNA motif that includes fields to annotate its DNA binding sites and related transcription factors. A footprintDB data file has the following fields and structure:

Header
VV  Header with library data fields (File, Name, Version, Authors, Email, Date, Phone, Fax, Company, Address, Url, Pubmed, Description)
XX  End of section (header, motif, factor and site sections)
End of entry

MOTIF SECTION
MO  Accession
DE  Description
NA  Names (separated list)
PO  PSSM
01  ...
LN  Url
CC  Annotations (separated list)
RX  PUBMED: Pubmed ID
RL  Reference details
XX

FACTOR SECTION
FA  Accession
DE  Description
NA  Names (separated list)
SQ  Sequence
IN  Blast prediction interface: Model, Residues, Total, Aligned, Identical, ID, e-value, method
SC  Uniprot: Uniprot ID
OS  Organisms (separated list)
LN  Url
CC  Annotations (separated list)
RX  PUBMED: Pubmed ID
RL  Reference details
XX

SITE SECTION
SI  Accession
DE  Description
NA  Names (separated list)
SQ  Sequence
LN  Url
CC  Annotations (separated list)
RX  PUBMED: Pubmed ID
RL  Reference details
XX

If the SITE has no Pubmed reference data, data from the site's motif
2. In the former search we take as query a DNA motif in FASTA format (a list of DNA binding sites, all of them with the same length). We want to retrieve at most 10 TFs with similar DNA motifs, without filtering by organism or domain, across all available databases. We obtain the following results (only the first 3 are shown):

[Screenshot: footprintDB results for Demo. Query: bZIP910_JASPAR_CORE, consensus mTGACGT. Each result row lists the footprintDB template, organisms, STAMP E-value, motif similarity, footprintDB consensus and the links Show proteins, Show interfaces and Show domains]

We notice that the first result is the query itself, because the Demo query comes from the JASPAR database, which is included in footprintDB, and the others are similar DNA motifs present in footprintDB. We can click on the links Show proteins, Show interfaces and Show domains to retrieve information about the proteins that bind the similar DNA motifs retrieved in the search, when there are annotated TFs for those motifs (the second result has no related TF). Predicted DNA-binding residues for each protein are shown coloured in the interface sequence. Left-clicking on the footprintDB template accession name or on the DNA aligned sequence will display the corres
3. Demo button and then on Search.

Search footprintDB for similar sequences or DNA motifs
The database is designed primarily to receive two types of queries:
INPUT: a DNA consensus motif, PWM or site. OUTPUT: a list of DNA-binding proteins (probably TFs) predicted to bind a similar DNA motif.
INPUT: a protein sequence of a putative DNA-binding protein. OUTPUT: a list of possibly recognized DNA motifs.
If you want to search for individual entries (TFs, DNA motifs or sites) using keywords, please try the Keyword Search Form.

[Screenshot: footprintDB Sequence Search Form filled in with the Demo data. Search name: Demo. Input type: DNA sites or motifs. Limit number of results per query: 10. Order results by: E-value. Color results using twilight thresholds. Query data: bZIP910_JASPAR_CORE followed by ten ATGACGT sites (DNA binding motif in FASTA format). Organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae. Curated Databases: JASPAR CORE 2009, RegulonDB 7.5, 3D-footprint 20130124, UniPROBE 20120919. Pfam domains: All, PF13894 C2H2-type zinc finger, PF00096 Zinc finger C2H2 type, PF13465 Zinc-finger double domain, PF00046 Homeobox domain. Option: Search for homologues in a selected proteome]
4. footprintDB
http://floresta.eead.csic.es/footprintdb

[Cover figure: DNA to protein (DNA consensus motif or site to DNA-binding protein) and Protein to DNA]

User Manual
Alvaro Sebastian Yagüe & Bruno Contreras Moreira
Laboratory of Computational Biology, Estación Experimental de Aula Dei/CSIC, Av. Montanana 1.005, 50059 Zaragoza, SPAIN

Index
Introduction
  1. footprintDB is a repository of databases
  2. Annotation of transcription factor interfaces
  3. footprintDB is a search engine
Web site navigation
  1. Sections
  2. Navigation
User registration
  3. Registration
  4. Log In
  5. Log out
  6. Recover account info
  7. Modify account info
  8. Delete account
Searching
  1. Search keywords
  2. Search DNA motifs
  3. Search protein sequences
  4. Retrieve stored searches
  5. Searching through the Web services interface
Database insertion
  1. Insert a new database into footprintDB
  2. footprintDB and TRANSFAC data formats
  3. Manage own databases

Introduction
footprintDB is a web server for assigning putative cis DNA motifs to input transcription factors (TFs) and, conversely, for
5. The search form looks like this:

[Screenshot: footprintDB Sequence Search Form, with the introductory text "Search footprintDB for similar sequences or DNA motifs" and the fields Input type (DNA sites or motifs / Proteins that bind DNA), Limit number of results per query, Query data or file (bZIP910 DNA binding motif in FASTA format), Organisms, Curated Databases, Pfam domains and Search for homologues in a selected proteome]

The search form has the following fields and options:
Search name: A name or title for the search.
Email: Type your email address if you wish to receive the results by email.
Input type: Choose DNA sites or motifs.
Limit number of results per query: Enter the number of desired results per query
6. [Screenshot: homologous Arabidopsis thaliana proteins (for example AT1G75390 and AT1G68880) listed under bZIP910 (CAA74022), each with its interface aligned to the template interface]

Homologous proteins will be shown under each TF. Each new row contains the data of one protein; left-clicking on the protein name will open a new window with the protein sequence in FASTA format. Left-clicking on the Blast E-value, Interface similarity or Template alignment columns will show the Blast alignment with the corresponding footprintDB protein sequence, with coloured protein domains (Pfam version 24.0), highlighting in red the identical interface residues and in blue the rest of the interface. The last column, Related results, shows other footprintDB TF results which are presumably homologous to the same Arabidopsis thaliana protein.

[Screenshot: result columns Blast E-value, Interface similarity and Template alignment, followed by the Blast alignment of query AT3G62420 with template CAA74022 (bZIP910, O22676_ANTMA); interface residues in blue, identical in red. Length: 145. E-value: 7e-46]
7. [Screenshot, continued: transcription factor entry. Binding Motifs: bZIP910_1 (GATGACGTGGCm), bZIP910_2 (GgrTGCTGACGT), MA0096.1 (mTGACGT). Publications: Martinez-Garcia JF, Moyano E, Alcocer MJC, Martin C. Two bZIP proteins from Antirrhinum flowers preferentially bind a hybrid C-box/G-box motif and help to define a new sub-family of bZIP transcription factors. Plant J 13:489-505 (1998). Pubmed]

In the same way, left-clicking on a DNA binding motif (footprintDB PWM/Consensus) accession name will show the DNA binding motif information.

[Screenshot: result row with Pfam domain PF07716 (Basic region leucine zipper) and the motifs bZIP910_1 (GATGACGTGGCm), bZIP910_2 (GgrTGCTGACGT) and MA0096.1 (mTGACGT)]

DNA Binding Motif
Accessions: bZIP910_1 (Athamap 20091028)
Names: bZIP910_1
Organisms: Antirrhinum majus
Libraries: Athamap 20091028 (Bülow L, Engelmann S, Schindler M, Hehl R. AthaMap, integrating transcriptional and post-transcriptional data. Nucleic Acids Research 37:D983-6, 2009. Pubmed)
Length: 12
Consensus: GATGACGTGGCm
Weblogo: [sequence logo]
PSSM: [table with the counts of A, C, G and T at each of the 12 positions, each row followed by its consensus letter; for example, position 10 has counts 0, 0, 17, 1 (G) and position 12 has counts 10, 8, 0, 0 (m)]
Binding TFs: bZIP910 (Basic region leucine zipper, bZIP transcription factor)
Publications: Martinez-Garcia JF, Moyano E, Alcocer MJC, Martin C. Two bZIP proteins from Antirrhinum flowers prefere
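The PSSM and the Consensus fields of a motif entry are related: each consensus letter summarizes one row of counts. The manual does not state which rule footprintDB applies, so the sketch below only illustrates one commonly used convention (a single base if it exceeds 50% of the counts and is more than twice as frequent as the runner-up, otherwise a two-base IUPAC code if the top two bases exceed 75%, otherwise N). The function names are hypothetical, and the output is always upper case, so the upper/lower case convention seen in footprintDB consensus strings is not reproduced.

```python
# Sketch only: the consensus rule below is an assumed, commonly used
# convention, not necessarily the one implemented by footprintDB.

# IUPAC two-base ambiguity codes, keyed by the pair of bases they cover.
TWO_BASE_CODES = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("AT"): "W", frozenset("CG"): "S",
}


def consensus_letter(counts):
    """Return one IUPAC letter for a row of (A, C, G, T) counts."""
    total = sum(counts)
    if total == 0:
        return "N"
    # Rank the four bases by decreasing count.
    ranked = sorted(zip(counts, "ACGT"), reverse=True)
    (top, top_base), (second, second_base) = ranked[0], ranked[1]
    if top > 0.5 * total and top > 2 * second:
        return top_base
    if top + second > 0.75 * total:
        return TWO_BASE_CODES[frozenset(top_base + second_base)]
    return "N"


def consensus(rows):
    """Return the consensus string for a list of (A, C, G, T) rows."""
    return "".join(consensus_letter(row) for row in rows)


# Rows with the counts quoted above for positions 10 and 12 of bZIP910_1.
print(consensus([(0, 0, 17, 1), (10, 8, 0, 0)]))
```

With this rule the row 10, 8, 0, 0 becomes the two-base code M (A or C), which agrees with the m shown at position 12 of the example apart from letter case.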
8. Order results by: Orders the results by DNA or TF similarity, or by E-value.
Color results using twilight thresholds: Only available for DNA searches. Results that pass the thresholds defined in our previous article (Sebastian and Contreras-Moreira, 2013) are marked in green, and the rest in red.
Query data or file: Enter your DNA sites or motifs in the text area or upload them from a file. The only valid formats are FASTA and TRANSFAC. You can also use sample data by pushing the Demo button.

Table 1. Examples of FASTA and TRANSFAC formats for DNA input.
DNA motif in FASTA format:
>bZIP910_JASPAR_CORE
ATGACGT
CTGACGT
ATGACGT
CTGACGT
GTGACGT
GTGACGT
DNA motif in TRANSFAC format: a DE line with the motif name (DE bZIP910_JASPAR_CORE) followed by one numbered row of counts per motif position (the rows begin 1 15 5 0 ...).

Organisms: Select any organism(s) to restrict the search. Multiple species can be selected by holding the Control key on your keyboard. Use with caution, as many TFs are not associated with a specific organism.
Databases: Select databases or sources to restrict the search. Multiple databases can be selected by holding the Control key.
Pfam domains: Select protein Pfam domains to restrict the search. Multiple domains can be selected by holding the Control key.

Please note that when the search type DNA sites or motifs is selected, the option to automatically Search for homologues in a selected proteome is shown, which will be explained in the next section. To start the search please click on the
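The two input formats of Table 1 carry related information: a list of aligned sites of equal length can be summarized as one row of A, C, G, T counts per position. The following sketch shows that conversion for the six FASTA sites listed in the table. The function names are hypothetical, and the DE / numbered rows / XX layout is assumed from the TRANSFAC examples elsewhere in this manual; it is not meant to describe how the server processes its input.

```python
# Sketch only: helper names are hypothetical and the output layout
# (DE line, numbered count rows, XX) is assumed from the manual's examples.

def count_matrix(sites):
    """Count A, C, G and T at each position of equally long sites."""
    sites = [s.strip().upper() for s in sites if s.strip()]
    if not sites:
        raise ValueError("no sites given")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    rows = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        rows.append(tuple(column.count(base) for base in "ACGT"))
    return rows


def to_transfac(name, rows):
    """Format count rows as a TRANSFAC-style matrix block."""
    lines = [f"DE {name}"]
    for number, row in enumerate(rows, start=1):
        lines.append(f"{number:02d} " + " ".join(str(n) for n in row))
    lines.append("XX")
    return "\n".join(lines)


# The six sites of the FASTA example in Table 1.
demo_sites = ["ATGACGT", "CTGACGT", "ATGACGT", "CTGACGT", "GTGACGT", "GTGACGT"]
print(to_transfac("bZIP910_JASPAR_CORE", count_matrix(demo_sites)))
```

The length check mirrors the requirement that all sites in a FASTA query have the same length.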
9. Search keywords
If you have a footprintDB account, log in to the website first so that your searches are stored and can be reused; if you do not have one, registration is recommended. Click on the Search Keywords option of the Main menu, or go directly to the url http://floresta.eead.csic.es/footprintdb/index.php (search entries page).

[Screenshot: Main menu with Home, Databases, Search (Keywords, Sequences) and Credits]

The search form looks like this:

[Screenshot: footprintDB Keyword Search Form. Entry type: Transcription Factors, DNA Binding Motifs or DNA Binding Sites. Search text: Myb (Demo). Organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae. Curated Databases: JASPAR CORE 2009, RegulonDB 7.5, 3D-footprint 20130124, UniPROBE 20120919. Pfam domains: PF13894 C2H2-type zinc finger, PF00096 Zinc finger C2H2 type, PF13465 Zinc-finger double domain, PF00046 Homeobox domain]

The search form has the following fields and options:
Entry type: Restricts the search to Transcription Factors, DNA Binding Motifs or DNA Binding Sites.
Search text: The text to search for. It can be any descriptive word, a transcription factor, protein or gene name, a UniProt or PDB identifier, an original source accession name or a DNA site sequence.
Organisms: Select any organism(s) to restrict the search. Multiple species can be selected by holding the Control key on your keyboard. Use with caution, as many TFs are not ass
10. menu on the left side.

[Screenshot: Sign In form with User, Password, Register and Recover Account Info]

Enter your email address and push the Recover button. If any account is associated with that address, you will receive your account data and a new auto-generated password by email.

[Screenshot: Recover Account form: "Should you forget your username and password, please type in your email address and you will receive an email with your account information and a new password." Email field and Recover button]

7. Modify account info
Log in and click on the Modify account link in the User menu on the left side.

[Screenshot: User Menu with Stored results, Modify account, Delete account and Log out]

Modify your data in the form (required fields are marked with an asterisk) and push the Modify button to submit the data. You will see the following message if the modification was successful: "User account successfully modified, you will shortly receive an email with your new account information". You will also receive an email as a reminder of your account data.

8. Delete account
Log in and click on the Delete account link in the User menu on the left side. Please confirm that you want to delete the account by pushing the Delete button.

[Screenshot: Delete Account: "Are you sure you want to delete your footprintDB account? This will also remove your stored search results."]

Searching
11. [Screenshot, continued: Identities 76, Positives 103, Gaps 0; alignment of query residues 1-146 with template residues 1-133; Pfam domains PF07716 (Basic region leucine zipper) and PF00170 (bZIP transcription factor)]

3. Search protein sequences
a. Find transcription factors with similar sequences
Click on the Search Sequences button on the Home page, click on the Search option of the Main menu, or go directly to the url http://floresta.eead.csic.es/footprintdb/index.php (search page).

[Screenshot: Main menu with Home, Databases, Search (Keywords, Sequences) and Credits]

The search form fields and options are explained in the previous section. In this case the only noticeable difference concerns the input format of the sequence to search:
Query data or file: Enter your protein sequences in the text area or upload them from a file. The only valid format is FASTA. You can use sample data by pushing the Demo button.

Example of FASTA format:
>bZIP910_JASPAR_CORE
MASQQRSTSPGIDDDERKRKRKLSNRESARRSRMRKQQRLDELIAQESQMQEDNKKLRDTINGATQLYL
12. [Screenshot, continued: Database Insert Form. Input format type: TRANSFAC format or footprintDB format. footprintDB file. Do you want to make your database public? Yes / No. Description: Curated, non-redundant set of profiles derived from published collections of experimentally defined transcription factor binding sites for eukaryotes. Date (Year-month-day). Authors: Sandelin A, Alkema W, Engstrom P, Wasserman W, Lenhard B. Pubmed IDs (each PMID in a line): 19906716, 18006571, 14681366. Url: http://jaspar.cgb.ki.se/. Email, Institution or Company, Address, Phone, Fax. Fields with asterisk are required]

2. footprintDB and TRANSFAC data formats
We first explain the TRANSFAC format, the most widely used standard format for DNA-binding data, and then the unified footprintDB format, which stores all the binding data in a single file. The following format specifications must be followed in order to insert data into the footprintDB server.

a. TRANSFAC format
DNA-binding data in TRANSFAC format is usually stored in three separate files: the first with TF sequences, the second with DNA motifs and matrices (PSSMs), and the third with single DNA sites. The three files contain identifiers and accessions for each data entry (sequences or matrices), and the relationships among them are annotated. In addition, other information such as description, organism, annotations
13. cite additional datasources as applicable:
JASPAR CORE http://jaspar.genereg.net PubMed 14681366
RegulonDB http://regulondb.ccg.unam.mx PubMed 23203884, 21051347
3D-footprint http://floresta.eead.csic.es/3dfootprint PubMed 19767616
UniPROBE http://the_brain.bwh.harvard.edu/uniprobe PubMed 21037262, 18842628
DrosophilaTF http://www.bioinf.manchester.ac.uk/bergman/data/motifs PubMed 17238282
Athamap http://www.athamap.de PubMed 18842622
DBTBS http://dbtbs.hgc.jp PubMed 17962296
HumanTF http://www.cell.com/abstract/S0092-8674%2812%2901496-1 PubMed 23332764
HOCOMOCO http://autosome.ru/HOCOMOCO PubMed 23175603
ArabidopsisPBM http://www.pnas.org/content/early/2014/01/29/1316278111 PubMed 24477691
</citations>
</footprintdb>

Database insertion
1. Insert a new database into footprintDB
If you are not registered, create a new account and log in. Click on the option Insert database in the User Menu on the left, or go directly to http://floresta.eead.csic.es/footprintdb/index.php (database insert page).

[Screenshot: User Menu with Stored results, Insert database, Manage databases, Modify account, Delete account and Log out]

Fill in all the fields about the new database and provide a file with the data in TRANSFAC or custom footprintDB format. These two formats are explained in the next section.

[Screenshot: Database Insert Form. Name: JASPARCORE. Version: 2009. Input format type
14. [Screenshot, continued: Pfam domain list ending with PF00046 Homeobox domain, followed by the option Search for homologues in a selected proteome. You can choose one of the available proteomes (here Arabidopsis thaliana, TAIR) or upload your own protein FASTA file, giving the name of the organism or proteome set and the proteome file. BLAST E-value threshold: 0.01 (allowed formats: 0.001, 1E-3)]

The search parameters are the same as in the previous example, but in this case we choose to include among the results the subset of Arabidopsis thaliana proteins which are presumably homologous to each of the reported DNA-binding proteins. We obtain the same results as before, but in a slightly different order, with the proteins that have more homologues shown first (only the first 3 are discussed for brevity).

[Screenshot: footprintDB results for Demo. Query: bZIP910_JASPAR_CORE, consensus mTGACGT. Rows: MA0096.1 bZIP910 (JASPAR CORE 2009), STAMP E-value 1.0e-12, consensus ACGTCAk; XBP1_DBD_2 XBP1 (HumanTF 1.0, Homo sapiens), E-value 9.6e-09, consensus ACGTCAk, 21 Arabidopsis thaliana homologues; TGA1 (Athamap, Arabidopsis thaliana), E-value 2.8e-08, consensus ACGTCAk, 25 Arabidopsis thaliana homologues. Each row offers the links Show proteins, Show interfaces, Show domains and Show homologues]

Each row contains a TF that recognizes a motif similar to the query an
15. or literature references are usually included. The three files have in common the following header:

VV  Header with library version
XX  End of field
//  End of entry

The DNA motif file has the following structure:

AC  Accession
XX
ID  Identifier
XX
NA  Main name
DE  Description
BF  Binding factor accession, Name, Species
PO  PSSM
BS  Binding site data: sequence, Accession
CC  Annotation
RN  [1] Reference number and Accession
RX  PUBMED: Pubmed ID
RA  Reference Authors
RT  Reference Title
RL  Reference Journal, Number, Issue, Pages, Year
XX
//

The transcription factors file has the following structure:

AC  Accession
XX
ID  Identifier
XX
FA  Main name
XX
SY  Name synonyms (separated list)
XX
OS  Organisms (separated list)
XX
SQ  Sequence
XX
SC  Uniprot: Uniprot ID
XX
FF  Annotation
XX
MX  Motif accession
XX
BS  Binding site accession
XX
RN  [1] Reference number and Accession
RX  PUBMED: Pubmed ID
RA  Reference Authors
RT  Reference Title
RL  Reference Journal, Number, Issue, Pages, Year
XX
//

The DNA sites file has the following structure:

AC  Accession
XX
ID  Identifier
XX
DE  Description
XX
OS  Organisms (separated list)
XX
SQ  Sequence
XX
BF  Binding factor accession, Name, Species
XX
MX  Motif accession
XX
RN  [1] Reference number and Accession
RX  PUBMED: Pubmed ID
RA  Reference Authors
RT  Reference Title
RL  Reference Journal, Number, Issue, Pages, Year
XX
//

b. footprintDB format
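The TRANSFAC-style layouts listed above for the three files are line-oriented: each line starts with a two-letter code, XX lines separate fields and an end-of-entry line closes each record. As a rough illustration of how such a file could be read before uploading it, the sketch below collects the values of each code per entry. It assumes that entries end with a line containing only //, the function name is hypothetical, and the sample record is a shortened illustration loosely based on the AtMYB84 factor entry shown in the Web services output.

```python
# Sketch only: assumes two-letter codes, XX field separators and "//" as
# the end-of-entry line; parse_records is a hypothetical helper name.

def parse_records(text):
    """Split TRANSFAC-style text into a list of {code: [values]} dicts."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "XX":
            continue  # blank lines and field separators carry no data
        if line == "//":
            if current:
                records.append(current)
                current = {}
            continue
        code, _, value = line.partition(" ")
        current.setdefault(code, []).append(value.strip())
    if current:  # tolerate a missing final end-of-entry line
        records.append(current)
    return records


# Shortened, illustrative factor entry (sequence truncated on purpose).
SAMPLE = """AC 1272
XX
FA Myb 84
XX
OS Arabidopsis thaliana
XX
SQ MGRAPCCDKANVKKGPWSPEEDAKLKSY
SQ IENSGTGG
XX
MX 687
XX
//
"""

entry = parse_records(SAMPLE)[0]
print(entry["AC"][0], entry["OS"][0], "".join(entry["SQ"]))
```

Codes that may repeat within an entry, such as SQ, are kept as lists so that multi-line sequences can be joined afterwards.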
16. [Screenshot: Search for homologues in a selected proteome. You can choose one of the available proteomes (default: Do not search for homologues) or upload your own protein FASTA file. BLAST E-value threshold (allowed formats: 0.001, 1E-3). Search button]

Now you may select a species to search for homologues in its proteome, or upload a proteome file in FASTA format, and choose a BLAST E-value threshold for the Blastp search against the proteome (default: 0.01).

[Screenshot: Search for homologues in a selected proteome, with the drop-down list of available proteomes open (Caenorhabditis elegans selected). The list includes, among others, Acidobacterium capsulatum, Arabidopsis thaliana, Archaeoglobus fulgidus, Bacillus subtilis, Bordetella pertussis, Caenorhabditis elegans, Campylobacter jejuni, Carica papaya, Chlamydia muridarum, Chlorobium tepidum, Dehalococcoides ethenogenes and Dictyostelium discoideum. Other fields: Name of the species or proteome set, Proteome file, BLAST E-value threshold]

Disclaimer: This servi
17. representation or warranty, nor assume any liability or responsibility for the data or the results posted, whether as to their accuracy, completeness, quality or otherwise. Access to these data is available free of charge for ordinary use in the course of research.

From top to bottom and from left to right:
Main menu: links to the Home, Database listing, Search Keywords, Search sequences and Credits sections.
Sign In menu: authentication form and registration form link.
User menu: links to Stored Results, Insert Databases, Manage Databases, Modify Account, Delete Account and Log out.
Help menu: link to the Documentation section.
Links menu: links to recommended Internet resources.

2. Navigation
The menus on the left side can be used to navigate across the site.

The Main menu is composed of the following sections:
• Home: Access to the home page with general information.
• Databases: Updated information about the databases included in footprintDB.
• Search: Access to the search forms.
  o Keywords: to search keywords and data accessions or identifiers.
  o Sequences: to search DNA motifs and TF protein sequences.
• Credits: Information about the footprintDB creators, citing, data sources and other resources.

The Sign In menu is composed of an authentication form and a couple of links:
• Register: Access to a registration form for new users.
• Recover Account Info: registered users can recover their account data.

The User menu is only visible for authent
18. [Screenshot: keyword search results for DNA binding sites. Each row lists the site accessions (for example MYB1_1 and MYB2_3, MYB1_2 and MYB2_4, MYB2_1, MYB2_2, MYB44_1 and MYB44_2, all from Athamap 20091028), the site sequence, the organism (Arabidopsis thaliana), the binding motifs (for example MYB1 yCTAACTG and MYB44 GTyAGTTASG) and the binding TFs (MYB1, MYB2 and MYB44, Myb-like DNA-binding domain)]

If we click on the Accession of any of the results, we are shown its individual data:

DNA Binding Site
Accessions: MYB1_1 (Athamap 20091028), MYB2_3 (Atham
19. 67616
UniPROBE http://the_brain.bwh.harvard.edu/uniprobe PubMed 21037262, 18842628
DrosophilaTF http://www.bioinf.manchester.ac.uk/bergman/data/motifs PubMed 17238282
Athamap http://www.athamap.de PubMed 18842622
DBTBS http://dbtbs.hgc.jp PubMed 17962296
HumanTF http://www.cell.com/abstract/S0092-8674%2812%2901496-1 PubMed 23332764
</citations>
</footprintdb>

<?xml version="1.0"?>
<footprintdb>
<username></username>
<input_DNA_motif_name>test</input_DNA_motif_name>
<input_DNA_motif_sequence>
DE 1a0a_AB
01 1 93 0 2
02 0 96 0 0
03 58 33 3 2
04 8 78 6 4
05 8 5 75 8
06 1 2 47 46
07 1 2 84 9
XX
</input_DNA_motif_sequence>
<results_summary>
footprintDB template, template common names, source, STAMP e-value, Motif similarity, footprintDB Consensus, Interface residues, Pfam domains
5957, PROTEIN PHOSPHATE SYSTEM POSITIVE REGULATORY PROTEIN PHO4, 3D-footprint 20130124, 1.0e-12, ..., CCmCGkG
Best result for the Q evalue classifier: ...
Best result for the I simil classifier: 5957 in position 1
</results_summary>
<protein_sequences_fasta_format>
>8085 1a0a_A 1a0a_B PHO4_YEAST PHOSPHATE SYSTEM POSITIVE REGULATORY PROTEIN PHO4 PHO4_YEAST PHOSPHATE SYSTEM POSITIVE REGULATORY PROTEIN PHO4
MKRESHKHAEOARRNRLAVALHELASLIPAEWKOONVSAAPSKATTVEAACRYIRHLOONGST
</protein_sequences_fasta_format>
<DNA_motifs_transf
20. FAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLAVKGKYITIEN';

    $result = $server->protein_query($sequence_name,$sequence,$footprintDBusername);
    unless($result->fault()){ print $result->result(); }
    else{ print 'error: ' . join(', ',$result->faultcode(),$result->faultstring()); }

    ## sample regulatory motif sequence
    $sequence = 'TGTGANNN'; # possible format
    $sequence = "TGTGA\nTGTGG\nTGTAG"; # another format

    # transfac format for position weight matrices
    $sequence = <<EOM;
    DE 1a0a_AB
    01 1 93 0 2
    02 0 96 0 0
    03 58 33 3 2
    04 8 78 6 4
    05 8 5 75 8
    06 1 2 47 46
    XX
    EOM
    # note: the heredoc body and its closing EOM must start at column 0 in a real script

    $result = $server->DNA_motif_query($sequence_name,$sequence,$footprintDBusername);
    unless($result->fault()){ print $result->result(); }
    else{ print 'error: ' . join(', ',$result->faultcode(),$result->faultstring()); }

    ## sample text query
    $keyword = "myb";
    $datatype = "tf"; # three alternative search types: tf, motif, site
    $result = $server->text_query($keyword,$datatype,$footprintDBusername);
    unless($result->fault()){ print $result->result(); }
    else{ print 'error: ' . join(', ',$result->faultcode(),$result->faultstring()); }

Such queries generate XML output that can also be programmatically parsed:

<?xml version="1.0"?>
<footprintdb>
<username></username>
<input_protein_name>test</input_protein_name>
<input_protein_sequence>IYNLSR
21. MR complexes that have been independently validated (AlQuraishi and McAdams 2011; Lin and Chen 2013). The remaining databases and repositories integrated in footprintDB are:
i) JASPAR CORE (2009 version, all species, redundant set): a high-quality collection of transcription factor DNA-binding preferences modeled as PSSMs (Portales-Casamar, Thongjuea et al. 2010).
ii) UniPROBE (Universal PBM Resource for Oligonucleotide Binding Evaluation, Sep 2012 version): contains in vitro DNA-binding specificities of proteins measured with universal protein-binding microarrays (Robasky and Bulyk 2011).
iii) HumanTF: sequence-specific binding preferences of human TFs obtained by high-throughput SELEX and ChIP sequencing. It includes a total of 830 binding profiles describing 239 distinctly different binding specificities (Jolma, Yan et al. 2013).
iv) Athamap: genome-wide map of potential transcription factor binding sites (TFBS) in Arabidopsis thaliana (Bülow, Engelmann et al. 2009).
v) RegulonDB (7.5 version): contains curated data on the transcriptional regulatory network of Escherichia coli K12, including PSSMs and DBSs for many TFs (Salgado, Peralta-Gil et al. 2013).
vi) DBTBS: a database of transcriptional regulation in Bacillus subtilis (Sierro, Makita et al. 2008).
vii) DrosophilaTF: motifs for 56 Drosophila melanogaster transcription factors built from in vitro binding site select
22. NFASDNNVLRAQLAELTDRLHSLNSVLQIASEVSGLVLDIPDIPDALLEPWQLPCPIQADIFQC

To start the search, please click the Demo button and then the Search button.

[Screenshot: footprintDB Sequence Search Form filled in with the Demo data. Search name: Demo. Input type: Proteins that bind DNA. Order results by: E-value. Query data: the bZIP910_JASPAR_CORE protein sequence in FASTA format. Organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae. Curated Databases: JASPAR CORE 2009, RegulonDB 7.5, 3D-footprint 20130124, UniPROBE 20120919. Pfam domains: PF13894 C2H2-type zinc finger, PF00096 Zinc finger C2H2 type, PF13465 Zinc-finger double domain, PF00046 Homeobox domain. Option: Search for homologues in a selected proteome]

In this search we query a protein sequence in FASTA format. In particular, we wish to retrieve no more than 10 TFs with similar sequences, together with their associated DNA-binding motifs, without filtering by organism or domain, across all available databases. We obtain the following results (only the first 3 ar
23. RFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLAVKGKYITIEN</input_protein_sequence>
<results_summary>
footprintDB template, template common names, source, Blast e-value, Interface identity, Interface similarity, footprintDB Consensus, Query alignment, Template alignment, Pfam domains
5989, FNR, RegulonDB 7.5, 3e-39, ..., tTGaTywayATCAA, 1-62, 174-235
Best result for the Q evalue classifier: ... in position 1
Best result for the I simil classifier: 8472 in position 18
</results_summary>
<protein_sequences_fasta_format>
>5989 ...
MIPEKRIIRRIQSGGCAIHCQDCSISQLCIPFTLNEHELDQLDNIIERKKP
</protein_sequences_fasta_format>
<DNA_motifs_transfac_format>
DE FNR FNR
01 17 8 10 48 t
02 5 8 6 64 T
03 13 10 58 2 G
04 47 14 8 14 a
05 10 3 13 57 T
06 15 28 3 35 y
07 28 10 16 29 w
08 48 12 ... 12 a
09 20 25 ... 29 y
10 66 0 11 6 A
11 4 0 0 79 T
12 0 72 5 6 C
13 80 1 0 2 A
14 ... 8 0 2 A
XX
DE 2isz_B IDER_MYCTU Iron-dependent repressor ideR
01 6 6 9 ... T
02 9 6 6 ... T
03 ... 6 6 9 A
04 6 9 ... 6 G
05 0 0 96 0 G
06 0 6 90 0 G
XX
</DNA_motifs_transfac_format>
<citations>
footprintDB http://floresta.eead.csic.es/footprintdb PubMed unpublished
Please cite additional datasources as applicable:
JASPAR CORE http://jaspar.cgb.ki.se PubMed 14681366
RegulonDB http://regulondb.ccg.unam.mx PubMed 23203884, 21051347
3D-footprint http://floresta.eead.csic.es/3dfootprint PubMed 197
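The DNA motifs returned in TRANSFAC format in the output above can be turned back into numeric matrices with a few lines of code. The sketch below assumes the layout seen in that example (a DE line with the motif name, numbered rows with A, C, G and T counts optionally followed by a consensus letter, and an XX line closing each matrix); parse_matrices is a hypothetical helper name, and the sample reuses the first five rows of the FNR motif.

```python
# Sketch only: assumes "DE name", numbered "A C G T [letter]" rows and a
# closing "XX" line per matrix; parse_matrices is a hypothetical name.

def parse_matrices(text):
    """Return {motif name: [(A, C, G, T), ...]} from TRANSFAC-style text."""
    matrices = {}
    name = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "DE":
            name = " ".join(fields[1:])
            matrices[name] = []
        elif fields[0] == "XX":
            name = None  # end of the current matrix
        elif name is not None and fields[0].isdigit():
            a, c, g, t = (int(x) for x in fields[1:5])
            matrices[name].append((a, c, g, t))
    return matrices


# First five rows of the FNR motif shown in the example output.
SAMPLE = """DE FNR FNR
01 17 8 10 48 t
02 5 8 6 64 T
03 13 10 58 2 G
04 47 14 8 14 a
05 10 3 13 57 T
XX
"""

rows = parse_matrices(SAMPLE)["FNR FNR"]
# Most frequent base per position, as a quick sanity check of the parse.
print("".join("ACGT"[row.index(max(row))] for row in rows))
```

Taking the most frequent base per row gives TTGAT for these five rows, which matches the start of the reported consensus tTGaTywayATCAA apart from letter case.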
24. ac_format>
DE 1a0a_AB PROTEIN PHOSPHATE SYSTEM POSITIVE REGULATORY PROTEIN PHO4
01 1 93 0 2 C
02 0 96 0 0 C
03 58 33 3 2 m
04 8 78 6 4 C
05 8 5 75 8 G
06 1 2 47 46 k
07 1 2 84 9 G
XX
</DNA_motifs_transfac_format>
</footprintdb>

<?xml version="1.0"?>
<footprintdb>
<username></username>
<input_keyword>myb</input_keyword>
<results_summary>
1272 AtMYB84 Athamap 20091028
2555 TaMYB80 Athamap 20091028
2128 CAA61021 JASPAR CORE 2009
... GAMYB Athamap 20091028
2814 CCA1 ArabidopsisPBM 20140210
</results_summary>
<protein_sequences_transfac_format>
AC 1272 AtMYB84 Athamap 20091028
XX
FA Myb 84
XX
SY AtMYB84 At3g49690
XX
OS Arabidopsis thaliana
XX
SQ MGRAPCCDKANVKKGPWSPEEDAKLKSYIENSGTGGNWIALPOKIGLKRCGKSCRLRWLN
SQ YLRPNIKHGGFSEEEENIICSLYLTIGSRWSIIAAQLPGRTDNDIKNYWNTRLKKKLINK
SQ ORKELOEACMEQOBEMMVMMKROHOQOOIOTSFMMRODOTMFTWPLHHHNVOVPALFRIKP
SQ TRFATKKMLSQCSSRIWSRSKIKNWRKOTSSSSRENDNAFDHLSFSQLLLDPNHNHLGSG
SQ EGEFSMNSILSANTNSPLLINTSNDNOWFGNEQAETVNLFSGASTSTSADOSTISWEDISSL
SQ VISDSKOEE
XX
MX 687
XX
RX PUBMED: 96290022
RL Romero I, Fuertes A, Benito MJ, Malpica J, Leyva A, Paz-Ares J. More than 80 R2R3-MYB regulatory genes in the genome of Arabidopsis thaliana. Plant J 14:273-284 (1998)
XX
//
</protein_sequences_transfac_format>
<citations>
footprintDB http://floresta.eead.csic.es/footprintdb PubMed 24234003
Please
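Since the Web services return XML, the pieces of a reply can be pulled out with any standard XML parser. The following sketch uses Python's standard library on a small, hand-written reply modelled on the keyword example above. The element names (footprintdb, input_keyword, results_summary) are taken from that example, but the tab-separated layout of the summary lines is only an assumption made for this illustration, the helper name is hypothetical, and real replies are assumed to be well-formed XML.

```python
# Sketch only: SAMPLE is a hand-written reply modelled on the manual's
# keyword example; the tab-separated summary layout is an assumption.
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<footprintdb>
<username></username>
<input_keyword>myb</input_keyword>
<results_summary>
1272\tAtMYB84\tAthamap 20091028
2814\tCCA1\tArabidopsisPBM 20140210
</results_summary>
</footprintdb>"""


def summary_rows(xml_text):
    """Return the keyword and the summary lines split into columns."""
    root = ET.fromstring(xml_text)
    keyword = root.findtext("input_keyword", default="")
    body = root.findtext("results_summary", default="")
    rows = [line.split("\t") for line in body.splitlines() if line.strip()]
    return keyword, rows


keyword, rows = summary_rows(SAMPLE)
for accession, name, source in rows:
    print(keyword, accession, name, source)
```

The same approach applies to the other sections of a reply, for example passing the text of the TRANSFAC-format elements on to a format-specific parser.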
25. ap 20091028)
Organisms: Arabidopsis thaliana
Libraries: Athamap 20091028 (Bülow L, Engelmann S, Schindler M, Hehl R. AthaMap, integrating transcriptional and post-transcriptional data. Nucleic Acids Research 37:D983-6, 2009. Pubmed)
Length: 23
Sequence: acaccctAACtgacacacattct
Binding TFs: MYB1 (Myb-like DNA-binding domain), MYB2 (Myb-like DNA-binding domain)
Binding Motifs: MYB1 (yCTAACTG), MYB2 (vhAACbvm)
Publications:
Urao T, Yamaguchi-Shinozaki K, Urao S, Shinozaki K. An Arabidopsis myb homolog is induced by dehydration stress and its gene product binds to the conserved MYB recognition sequence. Plant Cell 5:1529-1539 (1993). Pubmed
Hoeren FU, Dolferus R, Wu Y, Peacock WJ, Dennis ES (1998). Evidence for a role for AtMYB2 in the induction of the Arabidopsis alcohol dehydrogenase gene (ADH1) by low oxygen. Genetics 149:479-90. Pubmed
Download

2. Search DNA motifs
a. Find transcription factors that bind DNA motifs similar to the query
If you have a footprintDB account, log in to the website first so that your searches are stored and can be reused; if you do not have one, registration is recommended. Click on the Start to Search button on the Home page, click on the Search sequences option of the Main menu, or go directly to the url http://floresta.eead.csic.es/footprintdb/index.php (search page).

[Screenshot: Main menu with Home, Databases, Search (Keywords, Sequences) and Credits]
26. bases
(Screenshot: User Menu with Modify account, Delete account and Log out.)
A list of the performed searches will be shown:

Stored results
o 2011-01-28 13:08 bZIP910 protein sequence in JASPAR database (view results, reuse search, delete search)
o 2011-01-28 13:09 bZIP910 DNA motif in JASPAR database (view results, reuse search, delete search)

Recent search results can be accessed by clicking on the view results link. Old results are deleted from the server; if you want to repeat one of these searches, click on the reuse search link and the search form will be filled in with the data of the old search.

5 Searching through the Web services interface

The footprintDB server can be accessed programmatically using a SOAP Web services interface. The following Perl source code illustrates how to make protein sequence, DNA motif and keyword queries:

#!/usr/bin/perl -w
use strict;
use SOAP::Lite;
my $footprintDBusername = ''; # type your username if registered
my ($result, $sequence, $sequence_name, $datatype, $keyword) = ('', '', '', '', '');
my $server = SOAP::Lite
  -> uri('footprintdb')
  -> proxy('http://floresta.eead.csic.es/footprintdb/ws.cgi');
# sample protein sequence
$sequence_name = 'test';
$sequence = 'IYNLSRR
27. ber your account data.

4 Log in

Enter User and Password in the Sign In menu on the left and push the Submit button.
(Screenshot: Sign In menu with the User and Password fields and the Recover Account Info link.)
If the login is successful, a message will be shown, your user name will appear in red at the top of the left menus, and the User Menu will appear:
You have successfully logged in, please use the left menu to access the advanced features.
(Screenshot: footprintDB welcome page. Its text reads: footprintDB is a database with 3241 unique DNA-binding proteins, mostly transcription factors (TFs), and 4713 Position Weight Matrices (PWMs) extracted from the literature and other repositories. The binding interfaces of most proteins in the database are inferred from the collection of protein-DNA complexes described in 3D-footprint. footprintDB predicts: 1. Transcription factors which bind a specific DNA site or motif. 2. DNA motifs likely to be recognised by a specific DNA-binding protein.)

5 Log out

Click on the Log out option in the User menu on the left side.
(Screenshot: User Menu with Stored results, Modify account, Delete account and Log out.)
You will see the following message:
You have successfully logged out, thank you for using footprintDB.
The User Menu will then be hidden unless automatic caching is active in your browser; in that case the menu remains visible until any other item is clicked.

6 Recover account info

Click on the Recover account info link in the Log In
28. ce is available AS IS and at your own whether as to their accuracy completeness qui

To start a search, push the Search button.

(Screenshot: footprintDB Sequence Search Form, headed "Search footprintDB for similar sequences or DNA motifs". The page text reads: The database is designed primarily to receive two types of queries. INPUT: a DNA consensus motif, PWM or site; OUTPUT: a list of DNA-binding proteins, probably TFs, predicted to bind a similar DNA motif. INPUT: a protein sequence of a putative DNA-binding protein; OUTPUT: a list of possibly recognized DNA motifs. If you want to search for individual entries (TFs, DNA motifs or sites) using keywords, please try the Keyword Search Form.
Form fields: Search name (Demo); Email; Input type (DNA sites or motifs / Proteins that bind DNA); Limit number of results per query; Order results by (E-value); Color results using twilight thresholds; Query data or file (the bZIP910 JASPAR CORE DNA-binding motif in FASTA format, given as a list of 7-nt sites such as ATGACGT); Organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, ...); Curated Databases (JASPAR CORE, RegulonDB 7.5, 3D-footprint 20130124, UniPROBE 20120919, ...); Pfam domains (C2H2-type zinc finger, Zinc finger C2H2 type, Zinc finger double domain, ...).)
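A DNA motif query given as a list of equal-length binding sites can be summarised as a per-position count matrix before it is compared with stored PWMs. The Python sketch below shows that step only; count_matrix is a hypothetical helper and the three sites are made-up examples, not the actual Demo query, and footprintDB's own preprocessing may differ.

```python
def count_matrix(sites):
    """Return one {A, C, G, T: count} dict per aligned position.

    All sites must have the same length and contain only A, C, G or T.
    """
    sites = [s.upper() for s in sites]
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        column = {base: 0 for base in "ACGT"}
        for site in sites:
            if site[i] not in column:
                raise ValueError(f"unexpected symbol {site[i]!r}")
            column[site[i]] += 1
        matrix.append(column)
    return matrix


if __name__ == "__main__":
    # Hypothetical sites of identical length (7 nt).
    for position, column in enumerate(count_matrix(["ATGACGT", "CTGACGT", "ATGACGT"]), 1):
        print(position, column)
```

Rejecting sites of unequal length mirrors the requirement stated for the query: all DNA binding sites must have the same length.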
29. cripts will retrieve that

3 Manage own databases

Log in and click on the option Manage databases in the User Menu.
(Screenshot: User Menu with Stored results, Insert database, Manage databases, Modify account, Delete account and Log out.)
A list of all your previously inserted databases will be shown:

Personal databases
o 2013-01-01 01:00 MY OWN DATABASE v1.0 (make public, delete database)

Two actions are available:
- Make public: if you select this option, your database will be public and open access on the footprintDB server.
- Delete database: you can remove previously inserted databases.

Bibliography

AlQuraishi M, McAdams HH (2011) Direct inference of protein-DNA interactions using compressed sensing methods. Proc Natl Acad Sci U S A 108(36):14819-14824.
Bulow L, Engelmann S, et al. (2009) AthaMap: integrating transcriptional and post-transcriptional data. Nucleic Acids Res 37(Database issue):D983-986.
Contreras-Moreira B (2010) 3D-footprint: a database for the structural analysis of protein-DNA complexes. Nucleic Acids Res 38(Database issue):D91-97.
Down TA, Bergman CM, et al. (2007) Large-scale discovery of promoter motifs in Drosophila melanogaster. PLoS Comput Biol 3(1):e7.
Jolma A, Yan J, et al. (2013) DNA-Binding Specificities of Human Transcription Factors. Cell 152(1):327-339.
Lin CK, Chen CY (2013) PIDNA: predicting protein-DNA interactions with struct
30. d the motif alignment is also shown as explained in the previous section. However, the provided red link "Show Arabidopsis thaliana TAIR9 homologues" allows us to display a list of proteins just beneath the name of each matched footprintDB entry.
(Screenshot: footprintDB results for the Demo query bZIP910 (JASPAR CORE, consensus mTGACGT), with columns for footprintDB template, source, organisms, STAMP E-value, motif similarity, footprintDB PWM consensus, binding proteins, interface sequences and Pfam domains. A "Show Arabidopsis thaliana TAIR9 homologues" link appears under each match; recognisable matches include MA0096.1 bZIP910 (Antirrhinum majus), XBP1 (HumanTF 1.0, Homo sapiens) and TGA1 (Athamap 20091028, Arabidopsis thaliana). Expanding the link lists Arabidopsis thaliana TAIR9 proteins such as AT3G62420 together with their template alignment and interface sequences.)
31. e shown
Query: bZIP910 JASPAR CORE
Sequence: MASQORSTSPGIDDDERKRKRELSNRESARRSRMREKQQORLDELIAQESQMQEDNKKLRDTINGATQLYLNFASDNNVLRAGLAELTDRLHSLNSVLOIASEVSGLVLDIPDIPDALLEPWOLPCPIQADIFOC
(Screenshot: results table with columns for footprintDB template, source, organisms, Blast E-value, interface similarity, interface residue positions, interface sequence, DNA motifs and Pfam domains. Recognisable rows:
- CAA74022 bZIP910 (JASPAR CORE 2009, Athamap 20091028), Antirrhinum majus; interface erkrkRklsNReSArrSRmrk; motifs GATGACGTGGCm, GgrTGCTGACGT and MA0096.1 mTGACGT; PF07716 Basic region leucine zipper.
- CAA74023 bZIP911 (JASPAR CORE 2009, Athamap 20091028), Antirrhinum majus; interface erkrkRkqSNReSArrSRmrk; motifs GrTGACGTGGCC, GrTGACGTGTAC and MA0097.1 GrTGACGTGkmC; PF07716 Basic region leucine zipper.
- P25032 DNA-binding protein EMBP-1 (EMBP1_WHEAT, JASPAR CORE 2009), Triticum aestivum; interface erelkRerRKqsNReSARrSR; motif MA0128.1; PF00170 bZIP transcription factor.)
We notice that the first result is the query itself (the query comes from the JASPAR collection and is present in footprintDB) and that the other results are transcription factors with similar interface sequences (results are ordered by Blastp E-value) which also have similar annotated DNA motifs.
Each row co
32. icated users and is composed of the following sections:
- Stored results: Access to a historical record of searches performed by the user.
- Insert database: Insertion of user data collections.
- Manage databases: Manage user data inserted previously.
- Modify account: A form to modify user account data.
- Delete account: An option to remove an account.
- Log out.
The Help menu provides links to extensive footprintDB documentation.
The Links menu provides links to our Laboratory of Computational Biology and other related sites.

User registration

3 Registration

Click on the link Register in the Sign In menu on the left, or go directly to the registration page under http://floresta.eead.csic.es/footprintdb/index.php (user register).
(Screenshot: Sign In menu with the User and Password fields and the Recover Account Info link.)
Fill in the registration form (required fields are marked with an asterisk) and push the Register button to submit the data.
(Screenshot: Registration Form, headed "Registration is required in order to store search results and have access to additional features", with the fields User name, Password, Email, Name, Surname, Institution or Company, Laboratory or Department, Country, Research area, Address, City, Region, Postal code, Phone(s) and Fax.)
You will see the following message if registration was successful:
User successfully registered, you will shortly receive an email with your account information.
An email is sent to remem
33. ion experiments and compiled genomic binding site sequences (Down, Bergman et al. 2007).

2 Annotation of transcription factor interfaces

TF sequences in footprintDB have their DNA-binding interfaces annotated by means of BLASTP alignments against the 3D-footprint library (http://floresta.eead.csic.es/3dfootprint/download/list_interface2dna.txt) with an E-value threshold of 10. Aligned interface positions from one or more protein-DNA complexes are thus transferred to entries in the database, as explained in the following Figure.

(Figure 2: panel A shows the annotated interface sequence of 9ANT chain B, Rqtytryqtlele lslterqiKIwfQNrrMkwkk; panel B is a histogram whose horizontal axis is the annotated interface length.)
Figure 2. Annotation of interface residues applying the geometrical rules of 3D-footprint. (A) Interface of PDB entry 9ANT, which corresponds to Homeobox protein Antennapedia in complex with a cis element. First, inter-atomic distances are calculated among heavy atoms of both amino acid side chains and nitrogen bases. Second, a matrix of interface contacts is compiled. Third, interface residues are marked as upper-case letters in the sequence. (B) Histogram of predicted interfaces in footprintDB transferred from 3D-footprint entries through BLASTP alignments.

3 footprintDB is a search engine

The footprintDB search engine is designed primarily to receive two types of queries:
1. INPUT: a DNA consensus motif or site. OUTPUT: a list of DNA-binding proteins, mainly TFs, predicted to bind a similar DNA motif.
2. INPUT: a protein sequence of a putative DNA-binding protein. OUTPUT: a list of possibly recognized DNA motifs.

(Figure: Flowchart of the footprintDB search engine, linking "DNA consensus motif or site" to "DNA-binding protein" in the DNA-to-protein direction and back in the protein-to-DNA direction.)

Web site navigation

1 Sections

(Screenshot: footprintDB home page. Its text reads: Welcome to footprintDB. footprintDB is a database with 2422 unique DNA-binding proteins, mostly transcription factors (TFs), 3662 Position Weight Matrices (PWMs) and 10112 DNA Binding Sites extracted from the literature and other repositories. The binding interfaces of most proteins in the database are inferred from the collection of protein-DNA complexes described in 3D-footprint. footprintDB predicts: 1. Transcription factors which bind a specific DNA site or motif. 2. DNA motifs or sites likely to be recognized by a specific DNA-binding protein. The left-hand menus include Insert database, Manage databases and links to Computational Biology, TFcompare, 3Dfootprint and the bioinfo Blog.)
Disclaimer: These data are available AS IS and at your own risk. The EEAD-CSIC do not give any
35. ntains a TF with a sequence similar to the query. Predicted DNA-binding residues are shown coloured in the interface sequence, and all the DNA motifs annotated for that TF are shown. Left-clicking on the Blast E-value or the interface similarity score will show the alignment of the footprintDB TF sequence with the query. Left-clicking on the footprintDB template TF accession name will display the full information about the TF.
(Screenshot: results row for the footprintDB template CAA74022 bZIP910, JASPAR CORE 2009 and Athamap 20091028, Antirrhinum majus.)

Transcription Factor
Accessions: CAA74022 (JASPAR CORE 2009), bZIP910 (Athamap 20091028)
Names: O22676_ANTMA, bZIP910
Organisms: Antirrhinum majus
Libraries: JASPAR CORE 2009 [1], Athamap 20091028 [1]
[1] Sandelin A, Alkema W, Engström P, Wasserman WW, Lenhard B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Research 32:D91-4 (2004). Pubmed
[1] Bülow L, Engelmann S, Schindler M, Hehl R. AthaMap: integrating transcriptional and post-transcriptional data. Nucleic Acids Research 37:D983-6 (2009). Pubmed
Length: 133
Pfam Domains: 13-61 Basic region leucine zipper; 17-75 bZIP transcription factor
Sequence:
1 MASQORSTSPGIDDDERKRKRKLSNRESARRSRMRKQORLDELIAQESQMOEDNKKLRDT 60
61 INGATOLYLNFASDNNVLRAQLAELTDRLHSLNSVLOIASEVSGLVYLDIPDIPDALLEPW 120
121 QLPCPIQADIFQC
Interface Residues: 21 25 26 28 29 32 33
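Interface sequences in the results tables are written in lower case with the DNA-contacting residues in upper case (Figure 2: "interface residues are marked as upper-case letters in the sequence"). A minimal Python sketch of that display convention follows; mark_interface is a hypothetical helper, positions are assumed to be 1-based, and the peptide and positions in the example are invented for illustration rather than taken from a footprintDB entry.

```python
def mark_interface(sequence, positions):
    """Lower-case a protein sequence and upper-case its interface residues.

    `positions` holds 1-based residue numbers relative to `sequence`.
    """
    residues = list(sequence.lower())
    for position in positions:
        index = position - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        residues[index] = residues[index].upper()
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical fragment with two interface residues.
    print(mark_interface("ERKRKRKLSN", [6, 10]))
```

With the hypothetical input above, residues 6 and 10 are the only ones printed in upper case.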
36. ntially bind a hybrid C-box/G-box motif and help to define a new sub-family of bZIP transcription factors. Plant J 13:489-505 (1998). Pubmed

Other data shown are the source organism(s), Pfam domains, the set of interface residues (the key residues mediating specific DNA recognition), the Blastp E-value and the interface similarity score.

b) Find in a selected proteome homologous transcription factors with sequences similar to the query

Please follow the steps explained in the former section, "Find transcription factors with similar sequences", until you see the search form. The menu "Search for homologues in a selected proteome" will be available at the bottom of the page. Then follow the same procedure explained in the previous section. Homologous protein sequences from the selected genome will be shown, and they can be accessed as previously explained.
(Screenshot: results for the bZIP910 JASPAR CORE query sequence, with the interface erkrkRklsNReSArrSRmrk, the motifs GATGACGTGGCm, GgrTGCTGACGT and mTGACGT, and links to the homologues found in the selected proteome.)

4 Retrieve stored searches

Registered users can access a list of stored searches. Log in and click on the Stored results link in the User menu on the left side.
(Screenshot: User Menu with Stored results, Insert database, Manage data
37. ociated to a specific organism.
Databases: Select databases or sources to restrict the search. Multiple databases can be selected by holding the Control key.
Pfam domains: Select protein Pfam domains to restrict the search. Multiple domains can be selected by holding the Control key.

To start the search, please click the Demo button and then the Search button.
(Screenshot: footprintDB Keyword Search Form with the fields Entry type (Transcription Factors, DNA Binding Motifs, DNA Binding Sites), Search text (Myb), search field (Name), Organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, ...), Curated Databases (JASPAR CORE 2009, RegulonDB 7.5, 3D-footprint 20130124, UniPROBE 20120919, ...) and Pfam domains (C2H2-type zinc finger, Zinc finger C2H2 type, Zinc finger double domain, Homeobox domain, ...).)

This search looks for the word Myb in the database and returns multiple results, which we can expand by clicking on Show results:

Database search results
18 Transcription Factors (show results)
28 DNA Binding Motifs (show results)
6 DNA Binding Sites (show results)

A full list of the results will be shown, with a short summary of each and the option to access them individually or download them.
(Screenshot: expanded Database search results, listing Accessions, Sequence and Organisms for each entry, starting with MYB1.)
38. ponding footprintDB DNA motif information.
(Screenshot: footprintDB results for the Demo query bZIP910 JASPAR CORE mTGACGT; recognisable matches are MA0096.1 bZIP910 (JASPAR CORE), a bZIP CREB entry (JASPAR CORE) and an XBP1 entry (HumanTF 1.0, Homo sapiens).)

DNA Binding Motif
Accessions: MA0096.1 (JASPAR CORE 2009)
Names: bZIP910
Organisms: Antirrhinum majus
Libraries: JASPAR CORE 2009 [1]
[1] Sandelin A, Alkema W, Engström P, Wasserman WW, Lenhard B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Research 32:D91-4 (2004). Pubmed
Length: 7
Consensus: mTGACGT
Weblogo: (image)
PSSM:
PO A C G T
01 15 15 5 0 m
02 0 0 35 T
03 0 0 33 0 G
04 35 0 0 0 A
05 0 35 0 0 C
06 35 G
07 0 35 T
Binding TFs: CAA74022 (Basic region leucine zipper, bZIP transcription factor)
Publications: Martinez-Garcia JF, Moyano E, Alcocer MJC, Martin C. Two bZIP proteins from Antirrhinum flowers preferentially bind a hybrid C-box/G-box motif and help to define a new sub-family of bZIP transcription factors. Plant J 13:489-505 (1998). Pubmed

In the same way, left-clicking on the TF accession name in the Binding proteins or Interface sequences columns will show the full information page for
39. predicting which TFs might recognize input DNA motifs. footprintDB predictions can be extended to external proteomes to design DNA-binding experiments for the desired organism.

The footprintDB database consists of a collection of curated and annotated DNA-binding data, obtained from the literature and public repositories and stored in a database. These data include the protein sequences of the TFs, their DNA binding sites (DBSs) and the Position Specific Scoring Matrices (PSSMs) that summarize their binding preferences, together with their Pfam protein domains, literature references and the set of annotated DNA-binding protein interface residues. footprintDB features are described in more detail in the following sections.

1 footprintDB is a repository of databases

The current online release of footprintDB contains 2422 unique TF sequences, 3662 PSSMs and 10112 DBSs. footprintDB is by design a meta-database of TFs attached to their experimentally determined DNA-binding preferences (PSSMs and DBSs). Therefore it does not incorporate other databases which contain only TF DBSs or predicted regulatory sequences. The first building block is 3D-footprint (Contreras-Moreira 2010), a database for the structural analysis of protein-DNA complexes, for two reasons: i) it is, to our knowledge, the only up-to-date source of annotated binding interfaces of TFs, and ii) it contains structure-based PSSMs (motifs) inferred from cis elements captured in X ray and N
40. the TF.
(Screenshot: footprintDB results for the Demo query bZIP910 JASPAR CORE mTGACGT, with the first match MA0096.1 bZIP910 (JASPAR CORE 2009, Antirrhinum majus, STAMP E-value 1.0e-12) expanded through the Show proteins, Show interfaces and Show domains links to display CAA74022, its interface sequence and its Pfam domains, including 13-61 PF07716 Basic region leucine zipper.)

Transcription Factor
Accessions: CAA74022 (JASPAR CORE 2009), bZIP910 (Athamap 20091028)
Names: O22676_ANTMA, bZIP910
Organisms: Antirrhinum majus
Libraries: JASPAR CORE 2009 [1], Athamap 20091028 [1]
[1] Sandelin A, Alkema W, Engström P, Wasserman WW, Lenhard B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Research 32:D91-4 (2004). Pubmed
[1] Bülow L, Engelmann S, Schindler M, Hehl R. AthaMap: integrating transcriptional and post-transcriptional data. Nucleic Acids Research 37:D983-6 (2009). Pubmed
Length: 133
Pfam Domains: 13-61 Basic region leucine zipper; 17-75 bZIP transcription factor
Sequence:
1 MASQORSTSPGIDDDERKRKRKLSNRESARRSRMRKOORLDELIAQESOMOEDNKKLRDT 60
61 INGATOLYLNFASDNNVLRAQLAELTDRLHSLNSVLOIASEVSGLVLDIPDIPDALLEPW 120
121 QLPCPIQADIFQC
Interface Residues: 21 25 26 28 29 32 33
Binding Motifs: bZIP910 1 GATGACGTGGCm; bZIP910 2 GgrTGCTGACGT
Publications: Martinez-Garcia JF, Moyano E, Alcocer MJC, Martin C. Two bZIP proteins from Antirrhinum flowers preferentially bind a hybrid C-box/G-box motif and help to define a new sub-family of bZIP transcription factors. Plant J 13:489-505 (1998). Pubmed
Download

Other data shown are the source database, organisms, Pfam domains, the set of interface residues (the key residues mediating specific DNA recognition), the STAMP E-value and the DNA motif similarity score (the sum of the Pearson correlation coefficients of the aligned DNA motif positions).

b) Find in a selected proteome homologous transcription factors that bind DNA motifs similar to the query

Please follow the steps explained in the former section, "Find transcription factors that bind DNA motifs similar to the query", until you see the search form. The menu "Search for homologues in a selected proteome" will be available at the bottom of the page.
(Screenshot: lower part of the search form, showing the Organisms, Curated Databases and Pfam domains selectors and the collapsed title "Search for homologues in a selected proteome".)
Click on the title "Search for homologues in a selected proteome" to expand the homologue search options.
(Screenshot: expanded "Search for homologues in a selected proteome" options.)
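The DNA motif similarity score is described above as the sum of the Pearson correlation coefficients of the aligned motif positions. The Python sketch below computes that sum for two motifs that are assumed to be already aligned, gap-free and of equal length; the alignment step itself (performed by STAMP in footprintDB) is not reproduced, the function names are hypothetical, and treating a constant column as a correlation of 0 is a choice made for this illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def motif_similarity(motif_a, motif_b):
    """Sum of column-wise Pearson correlations of two pre-aligned motifs.

    Each motif is a list of columns; each column is [A, C, G, T] counts or frequencies.
    """
    if len(motif_a) != len(motif_b):
        raise ValueError("motifs must be aligned to the same length")
    return sum(pearson(col_a, col_b) for col_a, col_b in zip(motif_a, motif_b))


if __name__ == "__main__":
    # Hypothetical 3-column motifs, columns ordered A, C, G, T.
    query = [[35, 0, 0, 0], [0, 35, 0, 0], [0, 0, 35, 0]]
    hit = [[30, 5, 0, 0], [0, 35, 0, 0], [0, 0, 20, 15]]
    print(round(motif_similarity(query, hit), 3))
```

Because each column contributes at most 1, the score of a perfect match equals the number of aligned positions, so longer alignments can reach higher scores.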
42. ural models. Nucleic Acids Res.
Portales-Casamar E, Thongjuea S, et al. (2010) JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res 38(Database issue):D105-110.
Robasky K, Bulyk ML (2011) UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res 39(Database issue):D124-128.
Salgado H, Peralta-Gil M, et al. (2013) RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more. Nucleic Acids Res 41(Database issue):D203-213.
Sebastian A, Contreras-Moreira B (2013) The twilight zone of cis element alignments. Nucleic Acids Res 41(3):1438-1449.
Sierro N, Makita Y, et al. (2008) DBTBS: a database of transcriptional regulation in Bacillus subtilis containing upstream intergenic conservation information. Nucleic Acids Res 36(Database issue):D93-96.
