Exercise 1

Contents

1. [Artemis screenshot: sequence view with the prompt "Click and drag with the cursor here to manually edit" and feature list entries CDS 386..1858 (Pfam hit to PF00835) and CDS 3837..5073.] Workshop on Bacterial Genomics. Module 2: Comparative Genomics. Introduction. The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed to extract the additional information that can only be gained by comparing the growing number of genomes from closely related organisms. ACT is based on Artemis, so you will already be familiar with many of its core functions. ACT is essentially composed of three layers or windows. The top and bottom layers are mini Artemis windows (with their inherited functionality) showing the linear representations of the genomes with their associated features. The middle window shows red blocks, which span this middle layer and link conserved regions within the two genomes above and below. Consequently, if you were comparing two identical genome sequences you would see a solid red block extending over th
2. [Artemis screenshot: sequence view and feature list. Legible entries: tRNA 620..683, possible truncated tRNA-Phe; misc_feature 620..134181, the major Vi antigen pathogenicity island (SPI-7); CDS 761..1935, weakly similar to the C-terminus of several polysaccharide bic; CDS 1792..3156, similar to Bacteriophage P1 Ban helicase; misc_feature 2022..2445, PS00017 ATP/GTP-binding site motif (P-loop); CDS 3149..4948, no significant database hits; CDS 5422, doubtful CDS; CDS 5550..6131, no significant database hits; CDS 6216..6773, weakly similar to Yersinia pestis orf77; CDS 7018..8361, no significant database hits.] We are going to look at this region in more detail and attempt to define the limits of the bacteriophage that lies within it. Luckily for us, all the phage-related genes within this region have been given colour code number 12 (pink). We are going to use this information to select all the relevant phage genes using the Feature Selector, as shown below, and then define the limits of the bacteriophage. First we need to create new entry clic
3. [Gene page screenshot; text not legible.] Details of protein domains defined by Pfam, InterPro, PRINTS, SMART, PROSITE and TIGRFAM, with links to annotation of these families. Gene Ontology associations: links will take you to the descriptions of the terms as well as to other proteins annotated to the same ontology. Through this exercise you should have seen that, when searching by gene ID/description, the gene is found when using "PfHT" or "monosaccharide transporter" but not when using "sugar transporter", "glucose transporter" or "fructose transporter". PfHT is a gene name, and because this has been annotated into the database from the literature it is detected in the database. The other search terms are descriptions of the product of the gene, and although the protein can transport fructose and hexose it is described as a monosaccharide transporter in the database. This is just an example of how the way in which a gene is described can affect the results of simple searches. It is always better to try several search terms and compare the results. Doing a full content search will search all of the annotation fields associated with a gene Thu
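
The difference between a name/description search and a full content search can be sketched in a few lines of Python. This is only an illustration of the idea, not how GeneDB is implemented: the record, its identifier `GENE_0001`, the field names and the free-text `note` are all hypothetical.

```python
# Hypothetical annotation record; the identifier, field names and the
# free-text note are illustrative assumptions, not GeneDB data.
GENES = [
    {
        "id": "GENE_0001",  # placeholder, not a real systematic ID
        "name": "PfHT",
        "product": "monosaccharide transporter",
        "note": "transports glucose and fructose",
    },
]


def simple_search(term, genes=GENES):
    """Match the term against the gene name and product description only."""
    term = term.lower()
    return [g["id"] for g in genes
            if term in g["name"].lower() or term in g["product"].lower()]


def full_content_search(term, genes=GENES):
    """Match the term against every annotation field of each gene."""
    term = term.lower()
    return [g["id"] for g in genes
            if any(term in str(value).lower() for value in g.values())]
```

With this toy record, `simple_search("sugar transporter")` finds nothing because neither the name nor the product contains that phrase, while `full_content_search("fructose")` finds the gene through its note field.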
4. [GeneDB gene page screenshot for CDS PFB0210c, with callouts; the feature list and page text are not legible.] Navigation bar, pull-down menus: you can navigate between different organism datasets and search tools using pull-down menus. Gene name and product information: the description lines are standardized and indexed so that features sharing the same description lines can be retrieved. Access to the nucleotide and amino acid sequences of the feature is also provided. Basic location informa
5. [Artemis screenshot: Phat gene models, including an alternative model. Example location: 15618..20618 in "Pknowlesi contig embl".] Automated gene prediction for hypothetical gene Phat4. Can you curate the Phat4 gene model and suggest any alternative splicing pattern, such as the red model? Module 2: Comparative Genomics. Exercise 3. Introduction. Having familiarised yourselves with the basics of ACT, we are now going to use it
6. [Artemis screenshot: sequence view with graph plots.] Notice how several of the plots show a marked deviation around the region you are currently looking at. To fully appreciate how anomalous this region is, move the genome view by scrolling to the left and right of this region. The apparently unusual nucleotide content of this region is indicative of laterally acquired DNA that has inserted into the genome. As well as looking at the characteristics of small regions of the genome, it is possible to zoom out and look at the characteristics of the genome as a whole. To view the entire genome
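
The kind of deviation these plots reveal can be approximated with a sliding-window G+C calculation. The sketch below is a minimal illustration in Python, not the algorithm Artemis uses for its graphs; the window size, step, threshold and the toy sequence are all assumptions chosen to make the effect obvious.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """Return (start, G+C fraction) for each full window along the sequence."""
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def anomalous_windows(seq, window, step, threshold):
    """Start positions of windows whose G+C differs from the whole-sequence
    value by more than the threshold."""
    mean = gc_fraction(seq)
    return [start for start, gc in windowed_gc(seq, window, step)
            if abs(gc - mean) > threshold]


# Toy sequence: a GC-rich backbone with an AT-rich insert in the middle.
toy = "GC" * 50 + "AT" * 50 + "GC" * 50
print(anomalous_windows(toy, window=50, step=50, threshold=0.5))
```

On a real genome the window would be thousands of bases wide and the threshold much smaller; the point is only that an inserted region with a different base composition stands out against the genome-wide average.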
7. [GeneDB gene page screenshot for CDS BB0419: general information, location (length 2796 bp, spliced length 2796 bp), predicted peptide properties (signal peptide not found) and domain information; the remaining text is not legible.]
8. [Artemis feature list: T. brucei CDS entries for alpha tubulin and beta tubulin; identifiers not legible.] Exercise 1.4: Exploring GeneDB, AmiGO browser (further). [GeneDB gene page, with an Organisms menu and a "Contact curator" link.] General Information. Systematic Name: Tb927.2.2900. Gene Synonyms: 1008 250. Status: role inferred from homology. Products: ARP2/3 complex subunit, putative (5 others). Type: CDS. Sequence: DNA and Protein. Location: Chromosome 2, chromosome location 554106..554648. Length: 543 bp. Graphical Display: in Artemis, Genome Browser. [Context map of the neighbouring genes around Tb927.2.2900, 550000 to 565000.] Primary Annotation: ARP2/3 complex subunit 4, curated by Wickstead (Univ. of Oxford). Predicted Peptide Properties: mass 20.7 kDa; amino acids 180; isoelectric point pH 7.9; charge 3.0; signal peptide not found; transmembrane domains 0 found; GPI anchor not found. [Protein map.] Pfam Domain Information: PF05856, ARP2/3 complex 20 kDa subunit (ARPC4), residues 1-177, Scor
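
The numbers on this gene page can be cross-checked with simple arithmetic. The sketch below assumes inclusive 1-based coordinates and an unspliced CDS whose length includes the stop codon; both are assumptions about how the page reports locations, and the function names are made up for the example.

```python
def feature_length(start, end):
    """Length in bp of a feature given inclusive 1-based coordinates."""
    return end - start + 1


def protein_length(cds_length_bp, includes_stop=True):
    """Number of amino acids encoded by an unspliced CDS of the given length."""
    if cds_length_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = cds_length_bp // 3
    return codons - 1 if includes_stop else codons


# Tb927.2.2900: location 554106..554648 on chromosome 2
length = feature_length(554106, 554648)
print(length, protein_length(length))
```

Under these assumptions the location gives 543 bp, that is 181 codons, or 180 amino acids once the stop codon is removed, which agrees with the length and amino acid count shown on the page.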
9. [Artemis applet screenshot: feature view with entries source 1..21515, CDS 1..1028, CDS 2465..3682 and CDS 4594..8139.] This part of the exercise has made you aware that you can use the Artemis applet from within GeneDB to view the detailed structure of the gene within its genomic context. Annotation is at different stages for different genomes and is actively improved where genes are manually curated. There will be occasions where the annotation may be misleading, incomplete or not as comprehensive as it could be. If you have any comments or queries about a particular gene's annotation, or can provide data that you think improves the knowledge base, then don't hesitate to contact the curator for that organism via email. Responses are normally provided within one or two working days. If your query or suggestion is of a technical nature or is someth
10. [Artemis screenshot: S. typhi entry with STY0005 and STY0006 visible in the sequence view, and a feature list. Legible entries: CDS 190..255, orthologue of E. coli thrL (LPT_ECOLI), Fasta hit; a CDS annotated as an orthologue of an E. coli thr gene (coordinates and name not legible); misc_feature 343..369, PS00324 aspartokinase signature; misc_feature 2314..2382, homoserine dehydrogenase signature; a CDS annotated as an orthologue of E. coli thrB (KHSE_ECOLI), Fasta hit; misc_feature 3068..3103, PS00627 GHMP kinases putative ATP-binding domain; CDS 3734..5020, orthologue of E. coli thrC.]
11. [Screenshots: a Windows "Save Web Page" dialog (file name ending in "embl"; Save as type: Text File; Encoding: Western European (ISO)); the Artemis file chooser ("Select a file") with Tb927test.embl entered as the file name; and an Artemis window (menus File, Entries, Select, View, Goto, Edit, Create, Write, Run, Graph, Display) showing variation features.]
12. [Diagram comparing qut gene order between organisms, including P. anserina, with the question "What are these genes?"] Module 3: Generating ACT comparison files using BLAST. Introduction. In Module 2 you used ACT to visualize pairwise BlastN or TBlastX comparisons between DNA sequences. In order to use ACT to investigate your own sequences of interest you will have to generate your own pairwise comparison files. ACT is written so that it will read the output of several different comparison file formats; these are outlined in Appendix II. Two of the formats can be generated using Blast software freely downloadable from the NCBI (Appendix X). Both Windows and Linux versions of the software are available, which can be loaded onto a PC or Mac. For the purposes of this module the NCBI Blast distribution software has already been installed locally and is therefore ready to use. To give you an idea of how easy it is to download and install the software on a PC, we have included a step-by-step guide in the appendices (Appendix X). The example shown in Appendix X is for downloading onto a PC with Windows XP. The exercises in this module are based on the Linux version of the Blast software. Although the oper
13. Drop-down menus: these are mostly the same as in Artemis. The major difference you'll find is that after clicking on a menu header you will then need to select a DNA sequence before going to the full drop-down menu. Sequence view panel for Sequence file 1 (the Subject Sequence you selected earlier): it's a slightly compressed version of the Artemis main view panel. The panel retains the sliders for scrolling along the genome and for zooming in and out. The Comparison View: this panel displays the regions of similarity between two sequences. Red blocks link similar regions of DNA, with the intensity of red colour directly proportional to the level of similarity. Double-clicking on a red block will centralise it. Artemis-style Sequence View panel for Sequence file 2 (Query Sequence). Right-button clicking in the Comparison View panel brings up this important ACT-specific menu, which we will use later. [ACT screenshot: S. typhi DNA vs E. coli K12 DNA, with a "Right button click here" callout.]
14. [AmiGO association table, GeneDB_Tbrucei entries: an actin-like protein, putative (IEA); genes Tb927.1.2330, Tb927.1.2340, Tb927.1.2350 and Tb927.1.2360; alpha tubulin (TAS). All Gene Product Associations: "Get ALL associations here"; filter by Datasource, Evidence Code and Species.] [AmiGO term page, last updated 2005-08-28.] structural constituent of cytoskeleton. Accession: GO:0005200. Aspect: molecular function. Synonyms: None. Definition: The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. Term Lineage: all (6457) > GO:0003674 molecular function (6451) > GO:0005198 structural molecule activity (356) > GO:0005200 structural constituent of cytoskeleton (49). External References: InterPro (3), Pfam (2), PRINTS (1), PROSITE (2), SP_KW (1). All Gene Product Associations: "Get ALL associations here"; Filter Associations by Datasource, Evidence Code and Species. [Association table with columns Gene Symbol, Type, Datasource, Evidence, Full Name: gene products from SGD with evidence code TAS, including actin.]
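
The term lineage on this page is a path from a specific term up to the root of its ontology. A minimal Python sketch of walking that path is given below; the parent table is hard-coded from the three molecular function terms named on the page (GO:0005200, GO:0005198 and GO:0003674) and assumes one parent per term, whereas the real Gene Ontology is a graph in which a term may have several parents.

```python
# Parent links copied from the term lineage shown for GO:0005200.
# Simplifying assumption: each term has a single parent.
PARENT = {
    "GO:0005200": "GO:0005198",
    "GO:0005198": "GO:0003674",
    "GO:0003674": None,  # root of the molecular function ontology
}

NAMES = {
    "GO:0005200": "structural constituent of cytoskeleton",
    "GO:0005198": "structural molecule activity",
    "GO:0003674": "molecular function",
}


def lineage(term):
    """Return the list of terms from the given term up to the root."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


for accession in lineage("GO:0005200"):
    print(accession, NAMES[accession])
```

A gene product annotated to GO:0005200 is implicitly annotated to every term on this path, which is why browsing from a more general term also retrieves it.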
15. [Gene page fragment: primary annotation from automated gene prediction; coordinates 312564 313803.] Step 4: You could now go through each one of the 7 putative members of the T. brucei Arp2/3 complex, identifying putative orthologues by looking at the Orthologues section on the gene page. There is, however, a faster way, using the List Download utility. This function allows you to compile a list of your genes of interest and then download the description and sequence of these features using the Gene Basket. Start by going to the top of the gene page. Imagine this to be your first gene of interest. In order to collect your genes of interest, you'll need to click on the Add to Basket icon at the top of the page. This will add the identifier of this gene to the virtual basket. [GeneDB gene page screenshot for CDS Tb927.2.2900.] Now go to each one of the gene pages of the putative Arp2/3 complex members (they are Tb10.70.2680, Tb10.61.0500, Tb10.389.0270, Tb10.406.0320, Tb927.2.2900, Tb927.8.4410 and Tb09.160.3850, and should all be listed in the table you filled
16. [Association table, continued: SGD gene products, including actin-related protein ARP3 (TAS).] Exercise 1.5: The use of BLAST to identify sequences with similarity to known Arp2/3 complex subunits. [GeneDB gene page for CDS Tb10.61.0500, with search box, Help and "Contact curator" links, and an organism list.] Systematic Name: Tb10.61.0500. Status: role inferred from homology. Products: actin-related protein 2 (ARP2), putative. Type: CDS. Sequence: DNA and Protein. Location: Chromosome 10, contig location 559543..560745. Length: 1203 bp. [Nucleotide sequence display with "Send to omniBLAST" and "Send to BLAST" buttons; the sequence text is not reliably legible.]
17. [Query page screenshot, partly legible: "You are currently searching P. berghei ... and P. falciparum. Choose a different organism. Choose a boolean condition or select a query: AND / OR."] In the second pull-down menu select the option "Proteins with a predicted GO component". Then click on the "proceed to next step" button (circled red). The screen should appear as below. [Query page screenshot: "You are currently searching P. falciparum. Choose a different organism." Boolean condition: AND. First query: "Proteins with annotation matching a particular keyphrase" (Keyword). Second query: "Proteins with a predicted GO component", with a GO Component list including 26S proteasome, 6-phosphofructokinase complex, acetyl-CoA carboxylase complex, actin capping protein complex, actin cytoskeleton, actin filament, alpha DNA polymerase:primase complex, alpha-ketoglutarate dehydrogenase complex (sensu Eukarya), anaphase-promoting complex and apical complex. Query options: Rows per page 20; Run query: Submit, Reset.] [Second query page screenshot: "You are currently searching P. falciparum. Choose a different organism." AND; "Proteins with annotation matching a particular keyphrase" (Keyword); "Proteins with a pre
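
Combining two queries with AND or OR amounts to intersecting or uniting the two sets of genes they return. The Python sketch below illustrates that idea only; the gene identifiers are placeholders and the function is not part of GeneDB.

```python
# Placeholder identifiers: imagine these came back from the two queries.
keyword_hits = {"GENE_A", "GENE_B", "GENE_C"}       # keyphrase match
go_component_hits = {"GENE_B", "GENE_C", "GENE_D"}  # predicted GO component


def boolean_query(left, right, operator):
    """Combine two result sets with a boolean condition (AND or OR)."""
    if operator == "AND":
        return left & right  # genes returned by both queries
    if operator == "OR":
        return left | right  # genes returned by either query
    raise ValueError(f"unsupported operator: {operator}")


print(sorted(boolean_query(keyword_hits, go_component_hits, "AND")))
print(sorted(boolean_query(keyword_hits, go_component_hits, "OR")))
```

AND narrows the result to genes that satisfy both conditions, while OR widens it to genes that satisfy at least one.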
18. [Artemis screenshot: variation and splice acceptor site features; text not legible.] [ClustalX screenshot. Menus: File, Edit, Alignment, Trees, Colors, Quality, Help. Alignment menu entries include Produce Guide Tree Only, Do Alignment from Guide Tree, Realign Selected Sequences, Realign Selected Residue Range, Align Profile 2 to Profile 1, Align Profiles from Guide Trees, Align Sequences to Profile 1, Align Sequences to Profile 1 from Tree, Alignment Parameters, Save Log File and Output Format Options.] If you haven't got access to Artemis installed in the Unix/Linux environment then you could always run the alignments using Clustal via the web: http://www.ebi.ac.uk/clustalw. Exercise: Boolean querying. By now you will have familiarised yourself with a variety of tools to search and browse the data housed in GeneDB. An additional query
19. [Table fragment; text not legible.] [Table of Arp2/3 complex subunits with identifiers and Pfam accessions. Legible entries, in reading order: p34-Arc, SPACOFO 10c (identifier garbled), PF04045; p21-Arc, Tb10.70.2680, PF04062, SPBC1778.08c; p20-Arc, SPAC6G9.07c, Tb927.2.2900, PF05856; 16, SPAC17G8.04c, PF04699.] Using 4 different approaches to retrieve and identify putative homologues, you should have completed this table. As you will have noticed, you probably wouldn't have been able to retrieve all the data by using just a single approach to mine the T. brucei genome, which highlights some of the issues outlined in the introduction to this module. Exercise 1.6: Identify the Arp2/3 complex in other kinetoplastid species. Imagine now that you are interested in this complex not only in T. brucei but also in other Trypanosoma and Leishmania (causative agents of leishmaniasis) species. GeneDB is ideally suited to this purpose, as it houses sequence and annotation of multiple kinetoplastid species and the data are extensively cross-linked. You could start by identifying components in L. major and T. cruzi and then move on to L. infantum and the cattle-infective T. vivax and T. congolense. There are a number of ways you could tackle this problem. You could use similarity searches, GO and/or Pfam catalogues, similar to what you have been doing in the previou
20. [ACT screenshot: BlastN comparison of the two plasmids, with the query sequence flipped.] The result of the BlastN comparison shows that there are regions of DNA shared between the plasmids: pHCM1 shares 169 kb of DNA at greater than 99% sequence identity with R27. Much of the additional DNA in the pHCM1 plasmid appears to have been inserted relative to R27 and encodes functions associated with drug resistance. What antibiotic resistance genes can you find in the pHCM1 plasmid that are not found in R27? The two plasmids were isolated more than 20 years apart. The comparison suggests that there have been several independent acquisition events that are responsible for the multiple drug resistance seen in the more modern S. typhi plasmid. Module 3: Generating ACT comparison files using BLAST. Exercise 2. In the previous exercise you used BlastN to generate a comparison file for two relatively small sequences (> 500 000 kb). In the next exercise we are going to use another program from the NCBI Blast distribution, megablast, that can be used for nucleotide
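
A figure such as "169 kb at greater than 99% identity" can be estimated from a tabular BLAST comparison file by adding up the lengths of the high-identity hits. The Python sketch below assumes the standard 12-column tab-delimited BLAST output (query id, subject id, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); the sequence names and hit values in the example are invented, and overlapping hits would be counted more than once.

```python
def shared_length(tabular_text, min_identity=99.0):
    """Sum the alignment lengths of hits above an identity threshold.

    Expects 12-column tab-delimited BLAST output. Overlapping hits are
    not merged, so this is an upper estimate of the shared DNA.
    """
    total = 0
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        identity = float(fields[2])  # column 3: percent identity
        length = int(fields[3])      # column 4: alignment length
        if identity > min_identity:
            total += length
    return total


# Invented example hits between two hypothetical sequences.
example = "\n".join([
    "seqA\tseqB\t99.80\t1200\t2\t0\t1\t1200\t501\t1700\t0.0\t2300",
    "seqA\tseqB\t85.00\t300\t45\t0\t2000\t2299\t4000\t4299\t1e-60\t230",
    "seqA\tseqB\t99.50\t800\t4\t0\t5000\t5799\t9000\t9799\t0.0\t1500",
])
print(shared_length(example))
```

Lowering `min_identity` shows how much more of the two sequences is related at a weaker level of similarity, which corresponds to the paler red blocks in the ACT comparison view.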
21. [GeneDB search results screenshot: a numbered list of T. brucei gene products, including ABC transporters, multidrug resistance proteins, an ATP-binding cassette protein and conserved hypothetical proteins, each marked as manual annotation or as predicted by automatic BLAST annotation; identifiers not legible.]
22. [NCBI FTP site listing. BLAST (Basic Local Alignment Search Tool): stand-alone sequence comparison software. Cn3D: stand-alone software for viewing structures in three dimensions. Data Repository: download of contributed molecular biology data. GenBank: download the full database or daily updates, with a note about sites for GenBank files at the San Diego Supercomputer Center.] Address: ftp://ftp.ncbi.nih.gov/blast/. This page may appear slightly different if you are using Netscape. [Internet Explorer screenshots of ftp://ftp.ncbi.nih.gov/blast/executables/, showing the folders release, snapshot and special, followed by the contents of the release folder; version numbers not legible.]
23. [GeneDB BLAST results page with a "Retrieve result for id" box.] At peak times your BLAST searches could take longer than normal. Please be patient. BLAST results are kept on our servers for three days following query submission. Results may be retrieved any number of times during this period. After this time queries must be resubmitted if further examination is required. There were no significant matches to Pfam domains. Callout: "Clicking here will show full BLAST results". Summary for T. annulata predicted genes (coding sequences), wutblastn, for query TA02485 (Full BLAST Search): five hits with scores 2398, 2255, 1548, 90 and 83, each with a Full Sequence link. Summary for T. annulata predicted proteins, wublastp, for the same query (Full BLAST Search): five hits with the same scores. Callout: clicking on the systematic identifier (systematic id) will show the alignment of this sequence against the query. Summary for P berg
24. [Screenshot text not legible.] Exercise 5: Identifying autotransporters encoded in the genomes of Bordetella pertussis, B. parapertussis and B. bronchiseptica. B. pertussis, B. parapertussis and B. bronchiseptica are closely related Gram-negative β-proteobacteria. They colonize the respiratory tract of mammals, causing whooping cough (B. pertussis, B. parapertussis) as well as a chronic respiratory infection in a range of mammals (B. bronchiseptica). This exercise is designed to identify autotransporters in the three genomes of the Bordetella spp. Autotransporters are members of a large family of exported proteins encoding an integral outer membrane pore, which enables the protein itself to cross the outer membrane. As such, autotransporters are postulated to function in host interaction and virulence, some of which have been experimentally confirmed. Imagine you came across a recent paper describing the autotransporter complement in B. bronchiseptica (see table on page 56). Now think of ways you could identify autotransporters in the other Bordetella species. You could do this in a variety of ways: keyword searches of assigned product names (exercise 4.1); using the orthologue links provided on the gene pages (exercise 4.2); using BLAST (exercise 4.3); using Pfam (readily browsab
25. [Artemis feature editor screenshot: variation features recording single nucleotide substitutions between strain 927 homologues, each noted with the affected codon and whether the change is synonymous or non-synonymous; and a CDS with product "hypothetical protein, conserved", a FASTA similarity to a Leishmania major hypothetical protein, and a systematic_id that appears to read Tb927.1.440. Details not legible.]
26. [GeneDB S. pombe home page.] Search for: gene by ID/description, with options "Include description" and "Add wildcards"; Full Content Search; Database Entry Point. Searches/Analysis: omniBLAST, BLAST, Motif Search, EMOWSE, AmiGO, List Download, Cross Organism Search Page, Complex (Boolean) Query, Gene Name Registry. Project Information: other useful websites; download data by FTP using Download Datasets; subscribe to the pombe mailing list; S. pombe Project Page. Example Genes: shortcuts to frequently requested gene lists (characterised genes, inferred function, conserved hypothetical, sequence orphans) and example identifiers including rad1, SPAC16E8.04c and SPAC26H5.02c. Browse Catalogues: Products, Curation, SWISS-PROT Keywords, Pfam. Genome Browser: Contig/Chromosome Maps. News, July 2005: Second East Coast Regional pombe Meeting. This meeting will take place from November 11-13, 2005 in Miami Beach, Florida. Mail info@USpombe2005.org for general enquiries and see the meeting website for details. February 2005: European Fission Yeast Meeting. An interim S. pombe meeting will take place on 16th-18th March 2006 at the Wellcome Trust Genome Campus in Hinxton, Cambridge, UK. Preliminary details are available at the Wellcome Trust Conference Programme website. Previous news items. [Gene Results List: results 1 to 6 of 6 shown.]
27. [omniBLAST database options screenshot: a DATABASE OPTIONS list of selectable nucleotide and protein databases per organism (for example predicted genes, predicted proteins, contigs, chromosomes and shotgun reads for D. discoideum, Leishmania, P. berghei, P. chabaudi, P. falciparum, T. annulata and T. cruzi), with a note that refers to the number of databases selected. Details not legible.]
28. [ACT screenshots: right-click menu with entries including Smallest Features In Front, Select Visible Range, Select Visible Features and Set Score Cutoffs, and display toggles Start Codons, Stop Codons, Feature Labels, One Line Per Entry, Forward Frame Lines, Reverse Frame Lines, All Features On Frame Lines, Show Source Features, Flip Display and Colourise Bases; callout: "De-select stop codons".] Exercise 1. Introduction & Aims. In this first exercise we are going to explore the basic features of ACT. Using the ACT session you have just opened, we are first going to zoom outwards until we can see the entire S. typhi genome compared against the entire E. coli K12 genome. As for the Artemis exercises, we should turn off the stop codons to clear the view and speed up the process of zooming out. The only difference between ACT and Artemis when applying changes to the sequence views is that in ACT you must click the right mouse button over the specific sequence that you wish to change, as shown above. Now turn the stop codons off in the other sequence too. Your ACT window should look something like the one
29. Feature Start do the same thing. Well, they do if you have a feature selected, but Goto Start of Selection will also work for a region which you have highlighted by click-dragging in the main window. So yes, give it a try. 4.2 Navigator. The Navigator panel is fairly intuitive, so open it up and give it a try. [Artemis screenshot, entry S_typhi.dna, with callouts "Click Goto then Navigator" and "Check that the search button is on". Goto menu entries: Navigator, Start of Selection (Ctrl+Left), End of Selection (Ctrl+Right), Feature Start, Feature End. Navigator panel options: Goto Base, Goto Feature With Gene Name, Goto Feature With This Qualifier Value, Goto Feature With This Key, Find Base Pattern, Find Amino Acid String; Start search at selection, Start search at beginning (or end); Search Backward, Ignore Case, Allow Partial Match; buttons Goto, Clear, Close.]
30. There are many examples where these anomalous regions of DNA within a genome have been shown to carry laterally acquired DNA. In this part of the exercise we are going to look at several of these regions in more detail. Starting with the whole genome view, note down the approximate positions and characteristics of the three regions shown above. Remember, the locations of the peaks are given in the graph window if you click with the left mouse button within it. Region 1: 2,860,000 bp (peak: Karlin; troughs: G+C and GC deviation). Region 2: Region 3: We will now zoom back into the genome to look in more detail at the first of these three peaks. Zoom into this position by first clicking on the DNA line at approximately the correct location. If you then use the vertical side slider to zoom back in, Artemis will go to the location you selected. Remember that in order to see the CDS features lying within this region you will need to turn the annotation (S_typhi.tab) entry back on. The region you should be looking at is shown below and is a classical example of what is referred to as a Salmonella pathogenicity island (SPI). The definitions of what actually constitutes a pathogenicity island are quite diverse. However, below is a list of characteristics which are commonly seen within these regions, as described by Hacker et al. 1997. 1. Often inserted alongside
31. [Screenshot: GeneDB search and browse page. Full text search (sitewide): wildcards may be used to extend the search; quotation marks may be used to group words in a search into a phrase, e.g. "ribosomal protein"; operators can be used to return pages containing two separate words or phrases (example: DNA, polymerase) or pages containing one word or phrase but not another (example: polymerase, RNA). Browsing: browse through the descriptions (products) for these organisms; browse by SWISS-PROT keywords assigned to these organisms; browse through the Pfam assignments for these organisms; browse through the InterPro assignments (search for name or id). Each option lists organisms grouped as Fungi, Protozoa, Bacteria, Parasite Vectors.]
32. [Screenshot: list of SwissProt catalase entries (CATA1 and CATA2 records for species codes including CAEEL, CUCPE, HORVU, LYCES, MAIZE, NICPL, ORYSA, RICCO, SOLTU, SOYBN, TOBAC, WHEAT, ARATH and GOSHI). Callout: Links to every SwissProt record for this enzyme.] KEGG view of EC 1.11.1.6. [Screenshot: KEGG entry. ENTRY: EC 1.11.1.6. NAME: catalase; equilase; caperase; optidase; catalase-peroxidase. CLASS: Oxidoreductases; acting on a peroxide as acceptor; peroxidases. SYSNAME: hydrogen-peroxide:hydrogen-peroxide oxidoreductase. REACTION: 2 H2O2 = O2 + 2 H2O. SUBSTRATE: H2O2. PRODUCT: O2; H2O. COFACTOR: Heme; Manganese. PATHWAY: MAP00380 Tryptophan metabolism; Methane metabolism. GENES: entries including HSA (CAT) and MMU (Cas1), among others.] The KEGG database contains tools for analysing the enzymes in pathways. The use of the pathway maps at KEGG will be explored more fully
33. [Screenshot: ACT comparison view pop-up menu with the entries View Selected Matches, Flip Subject Sequence, Flip Query Sequence, Lock Sequences, Unlock Sequences, Set Score Cutoffs, Set Percent ID Cutoffs and Offer To RevComp, and the Score Cutoffs window with its Minimum Cutoff slider. Callouts: Select either Set Score Cutoffs or Set Percent ID Cutoffs. Move the sliders to manipulate the comparison view image.] [Screenshot: ACT window showing both whole genomes, with the two sequence views LOCKED.] Things to try out in ACT: Load into the top sequence (S. typhi) a tab file called laterally.tab. You will need to use the File menu and select the correct genome sequence (S_typhi.dna) before you can read in an entry. If you are zoomed out and looking at the whole of both genomes y
34. [Screenshot: Artemis window showing the sequence with its six-frame translation and the banner "fasta process completed".] To compare the three CDSs with others currently in the public databases, run a fasta search. Left-click the CDS, click on the Run menu and then Run fasta on selected features. When the search is finished a banner will appear saying "fasta process completed" (see above). The search may take a couple of minutes to run. To view the search results click View, then Search Results, then fasta results. The results will appear in
35. Compounds such as methyl derivatives of glucose have been shown to selectively inhibit glucose transport by the Plasmodium falciparum hexose transporter (PfHT). Using a variety of tools and methods, some of which you will already have covered in earlier modules, you'll identify this gene in Plasmodium falciparum and then go on to identify its putative orthologues in P. berghei and P. chabaudi. This would obviously be of interest to a researcher in the field who wanted to assess how similar the putative homologues were to the gene in Plasmodium falciparum. The following are key references that will be provided to you to give you some background information. They are only for your reference, and for the purpose of this exercise reading the abstract is probably sufficient. Joet et al. Comparative characterisation of hexose transporters of Plasmodium knowlesi, Plasmodium yoelii and Toxoplasma gondii highlights functional differences within the apicomplexan family. Biochem J 2002 Dec, pp 923-9. PMID 12238947. Krishna et al. Transport processes in P. falciparum infected erythrocytes: potential as new drug targets. Int J Parasitol, Dec, pp 1567-73. PMID 12435441. Exercise 1: Searching GeneDB using simple keyword searches. [Screenshot: GeneDB home page, "Welcome to the GeneDB website", The Wellcome Trust Sanger Institute Pathogen Sequencing Unit, Version 2.1, with the Database Entry Point and Sequence Searches (omniBLAST) panels and organism lists for Fungi and Protozoa.]
36. Edit selected feature from the Edit menu after selecting the appropriate CDS feature. Can you identify a few putative genes in the P. knowlesi contig based on their conserved and syntenic nature with P. falciparum chromosome 13? Activate or inactivate stop and start codons in an entry using the right-click button on the mouse. This will allow you to see any potential ORFs. Any thoughts about the possible biological relevance of the comparison? [Screenshot: ACT window with P. falciparum (Pfal_chr13.embl) above and P. knowlesi (Pknowlesi_contig.embl) below. Callout: What is the gene product?] Exercise 2 Part III: Prediction of gene models. There are several computer algorithms (covered earlier in Module 3) that predict gene models based on training the algorithm with previously known gene sets with experimentally verified exon-intron structures in eukaryotes. However, no single programme can predict the gene structure with 100% accuracy and one needs to curat
37. [Screenshot: Artemis applet view of the region, coordinates roughly 10001 to 21515.] Note: the genes in the display may appear coloured due to updates in annotation. These plots show hydrophobicity (upper) and hydrophilicity (lower). Is residue 169 located within a hydrophilic or hydrophobic region of the protein? (Note: you can click within this diagram to get a line from the x axis up to the curve.) [Screenshot: Graphs window for the CDS with the Feature Algorithms Kyte-Doolittle Hydrophobicity and Hopp-Woods Hydrophilicity; window size 7.] Now we are going to use the Artemis applet to find a specific amino acid (glutamine at position 169) within the selected protein. [Screenshot: Java applet window with the menus File, Select, View, Goto, Graph, Display and the status line "3 selected bases on reverse strand: 10505..10507 = complement (11009..11011) = codon 169 in feature CDS".]
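The hydrophobicity plot in this excerpt is a sliding-window average over the protein. The sketch below is one way to reproduce that idea in Python, not a description of how Artemis computes its graph: it uses the published Kyte-Doolittle hydropathy values and the window size of 7 shown in the Graphs window, and it assumes (our choice) that each average is reported at the centre residue of its window. The function name and the toy peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Return [(centre_residue, mean_hydropathy), ...] for each full window.

    Residue numbers are 1-based. Reporting the value at the window centre
    is an assumption made for this sketch.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


if __name__ == "__main__":
    # Invented peptide: a hydrophobic stretch followed by a charged one.
    toy = "IIIIIIIKKKKKKK"
    for residue, value in hydropathy_profile(toy):
        label = "hydrophobic" if value > 0 else "hydrophilic"
        print(residue, round(value, 2), label)
```

To answer a question such as "is residue 169 in a hydrophobic region?", one would look up the entry whose centre residue is 169 and check the sign of its mean value.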
38. The sequence you are going to look at is a small region of contrived sequence (21 kb) taken from Plasmodium falciparum chromosome 13. You will see 7 CDSs, some with multiple exons. As a gentle introduction to splicing, we would like you to look at the genes named PF13_0119, MAL13P1.294 and PF13_0061. They have only been partially characterised and may in fact be missing exons. Have a look at these CDSs and confirm, edit or dismiss the proposed gene models by using G+C content, database searches and looking for splice sites (Appendix IX). G+C content is a very good indicator of coding capacity in malaria. On average the coding regions are 23% G+C and the non-coding regions are 19%. Have a look at the G+C content for this region by selecting the appropriate graph. Left-click within the graph window and then select exons by clicking on them to see how they relate to the G+C peaks on the graph. Note: we will cover the principles and methods of gene prediction in much more detail in Module 3. [Screenshot: Artemis window for the entry Malaria.embl with the GC Content (%) graph, window size 120, and the fasta banner "MAL13P1.113 fasta process completed"; PF13_0119 is visible in the feature display.]
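The G+C graph described here is a sliding-window percentage along the sequence. The following Python sketch shows the idea under stated assumptions: the window size of 120 is taken from the screenshot, while the step size, the function names and the toy sequence are ours, and the exact way Artemis positions and smooths its graph may differ.

```python
def gc_percent(seq):
    """Return the G+C percentage of a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_windows(seq, window=120, step=1):
    """Yield (1-based window start, G+C %) for every full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start + 1, gc_percent(seq[start:start + window])


if __name__ == "__main__":
    # Invented sequence: an AT-only flank followed by a richer G+C stretch.
    toy = "AT" * 60 + "ATGC" * 30
    for position, gc in gc_windows(toy, window=120, step=60):
        print(position, round(gc, 1))
```

With coding regions averaging 23% G+C and non-coding regions 19%, the difference between windows is small, so it helps to compare exon positions against the peaks rather than against a fixed threshold.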
39. Aims. The aim of this module is for you to familiarise yourself with GeneDB and the various ways of accessing, querying, browsing and retrieving data. You'll use GeneDB as a research tool to retrieve candidate genes which you could follow up with further experimental validation. In the process you will also, hopefully, see how GeneDB integrates diverse biological datasets and organises, indexes and extensively cross-references these. In addition, the exercises are designed to make more general points which need to be taken into consideration when approaching and evaluating database searches, not just in GeneDB. These are: 1. How complete or incomplete is the dataset you are searching? In the case of organisms with two sets of each chromosome (i.e. diploid organisms), does the dataset represent the haploid or diploid genome content? 2. How was the dataset generated? (a) Is it an EST project? What estimated coverage does the dataset represent, i.e. is it a partial (3-5x coverage) or an 8-10x coverage project? (b) Has the sequence been manually finished, i.e. sequencing gaps closed and bases checked? (c) How were the gene predictions carried out (automated vs manual)? (d) How were the gene predictions annotated (automated vs manual)? 3. Depending on the gene prediction and associated annotation method, you may need to approach querying from several angles, not just one methodology, e.g. combine keyword searching with similarity searching. 4. When designing y
40. [Screenshot: GeneDB query history table. One row (428 results): Genes for pberghei, pchabaudi (malaria) intersect Proteins predicted to have between 8 and 14 transmembrane domains intersect Proteins predicted to have GO process transport. Another row (303 results): Genes for pberghei, pchabaudi (malaria) intersect Proteins with a product containing the keyword or phrase "transporter" intersect Proteins containing a predicted signal peptide. Each row has view and Download links.] Follow one of the links in the table above to view a query result set, or select two result sets and use one of the following buttons to combine them. Union will create a result set that contains all the genes in either of the selected sets. Intersect will create a result set that contains only the genes in both sets. Subtract will remove any genes in the second set (i.e. the one appearing lower in the list) from the first. Buttons: UNION, INTERSECT, SUBTRACT. Once the selected query results have been combined, a new entry will appear at the end of the list. 24. We're now going to look at how to download a results set and see what formats and different parts of the dataset can be obtained by choosing different options. [Screenshot: GeneDB Query History, List Download page, Pathogen Sequencing Unit.] This page di
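The Union, Intersect and Subtract buttons behave like ordinary set operations on lists of gene identifiers. A minimal Python sketch of that behaviour follows; the function name and the gene identifiers are invented for illustration and have nothing to do with GeneDB's own implementation or data.

```python
def combine(first, second, operation):
    """Combine two result sets the way the query history buttons describe.

    'first' is the set higher in the list, 'second' the one lower down.
    """
    first, second = set(first), set(second)
    if operation == "union":
        return first | second      # genes in either set
    if operation == "intersect":
        return first & second      # genes in both sets
    if operation == "subtract":
        return first - second      # remove the second set from the first
    raise ValueError("unknown operation: " + operation)


if __name__ == "__main__":
    # Hypothetical identifiers standing in for two query results.
    transporter_keyword = {"geneA", "geneB", "geneC", "geneD"}
    signal_peptide = {"geneB", "geneD", "geneE"}
    for op in ("union", "intersect", "subtract"):
        print(op, sorted(combine(transporter_keyword, signal_peptide, op)))
```

Note that Subtract is the only one of the three where the order of the two sets matters.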
41. [Screenshot: GeneDB gene page for TA02485, a putative hexose transporter from Theileria annulata, showing the CDS sequence options (DNA and protein), contig location, length 1427 bp, spliced length 1427 bp, the graphical display in Artemis, and the unspliced DNA sequence with buttons to send it to BLAST.]
42. Institut National De La Recherche Agronomique; A search tool which brings together many of the commonly used signature databases for sequence searching; A packaged version of UNIX for the PC; messenger RNA, processed RNA molecule to be translated to form protein; National Centre for Biotechnology Information, part of the U.S. National Library of Medicine (NLM), National Institutes of Health (NIH); Proteins fingerprint database, a compendium of protein fingerprints; Protein family, a searchable database of protein domains, a comprehensive set of protein domain families; Database of protein families and domains; Pathogen Sequencing Unit; A searchable database of RNA families; Swiss Institute of Bioinformatics (SIB); A program to predict the presence and location of signal peptide cleavage sites in amino acid sequences; Simple Modular Architecture Tool; a curated protein sequence database; The Institute of Genome Research; The Institute of Genome Research protein family database; Program for prediction of transmembrane helices in proteins; Computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT; A computer operating system. Index: Module 1 Artemis: Exercise 1, Exercise 2. Module 2 Comparative Genomics: Exercise 1, Exercise 2, Exercise 3, Exercise 4. Module 3 Generating ACT compa
43. [Screenshot: GeneDB gene page with a context map of neighbouring genes (including SPAC6G9.03c, SPAC6G9.04 and SPAC6G9.05). Curation: ARP2/3 actin-organizing complex subunit 4; similar to S. cerevisiae YKL013C. Predicted Peptide Properties: Mass 19.6 kDa; Amino acids 168; Isoelectric point pH 5.2; Charge 3.0; Signal Peptide: Not found; Transmembrane Domains: 0 found; GPI Anchor: Not found. Gene Ontology Annotation (columns: Term (browse AmiGO), Qualifier, Evidence, Other): Biological Process: actin cortical patch assembly (ISS, unpublished, with SPTR P33204); regulation of actin filament polymerization (ISS).]
44. PRIMER ENCUENTRO INTERNACIONAL DE JOVENES INVESTIGADORES SOBRE GENOMICA HUMANOS. Workshop on Bacterial Genomics, 28-30 September 2005. Held at Ciutat de les Arts i de les Ciències, Valencia, Spain.
Workshop on Bacterial Genomics: Timetable
Wednesday 28th September
08:30-09:30 Registration
09:30-10:00 Opening ceremony, Andrés Moya
10:00-10:30 Coffee
10:30-11:00 Introduction, Julian Parkhill
11:00-14:00 Artemis: guided exercises, Nicholas Thomson
14:00-15:00 Lunch
15:00-16:00 Gene prediction, Nicholas Thomson
16:00-17:00 ACT: guided exercises, Ana Cerdeno
17:00-17:30 Coffee
17:30-19:30 ACT: guided exercises, Ana Cerdeno
19:30-20:30 Public conference, Julian Parkhill
21:00 Official dinner, L'Oceanografic Submarine Restaurant
Thursday 29th September
09:00-10:30 ACT: guided exercises (cont'd), Ana Cerdeno
10:30-11:00 Coffee
11:00-12:30 Generating ACT comparison files, Ana Cerdeno
12:30-14:00 Jemboss / Internet Genome Resources, Tim Carver
14:00-15:00 Lunch
15:00-16:30 Internet Genome Resources, Ana Cerdeno
16:30-17:00 Coffee
17:00-19:00 Data mining using GeneDB, Christopher Peacock
19:30-20:30 Public conference, Jean Mane Clavene
Visit to two Scientific Centers in Valencia
19:30-22:30 Visit to CSAT and Science Bar
21:00-22:15 Museum
45. [Screenshot: KEGG Table of Contents page. Generalized KEGG is listed by database (KEGG PATHWAY, KEGG GENES, KEGG LIGAND, KEGG Orthology (KO), KEGG BRITE) alongside Search & Compute tools: search objects in KEGG pathways; color objects in KEGG pathways; KEGG pathways in XML; BLAST search against GENES; FASTA search against GENES; search similar compound structures; search similar glycan structures; predict reactions and assign EC numbers; generate possible reaction paths; automatic annotation (KO assignment); therapeutic category of drugs; binary relations and hierarchies. Specialized KEGG: KEGG for specific organisms (enter a KEGG organism code, e.g. hsa, mmu, sce, eco, bsu, syn, or use the list of KEGG organisms; the organism menu can be customized with selected organisms). KEGG for selected research areas: KEGG ANNOTATION for genome/EST annotation by KAAS; KEGG EXPRESSION for microarray data analysis; KEGG GLYCAN for glycome informatics.] [Screenshot: the "Search Objects in KEGG Pathways" page in Mozilla.]
46. [Screenshot: Artemis window on the S. typhi sequence around bases 2760 to 2840, with the feature list below it: CDS, orthologue of E. coli thrL (LPT_ECOLI); CDS, orthologue of E. coli thrA (AK1H_ECOLI); misc_feature, PS00324 Aspartokinase signature; misc_feature, PS01042 Homoserine dehydrogenase signature; CDS, orthologue of E. coli thrB; misc_feature, PS00627 GHMP kinases putative ATP-binding domain; CDS, orthologue of E. coli thrC (THRC_ECOLI); misc_feature, PS00165 Serine/threonine dehydratases pyridoxal-phosphate attachment site; CDS, orthologue of E. coli with Fasta hit to YAAA; CDS, similar to Bacillus subtilis amino acid carrier protein AlsT; misc_feature, PS00873 Sodium:alanine symporter family signature; CDS, Fasta hit to TALA_ECOLI.] 4. It may seem that Goto Start of Selection and Goto
47. [Screenshot: Artemis Run menu, listing the searches that can be run on selected features (including Run fasta on selected features and the equivalent BLAST entries) followed by the matching options entries (such as Set fasta options).]
48. [Screenshot: feature list with entries such as "unknown function" and "insB, possible IS1 transposase".] Also do this for R27.embl. Module 3: Generating ACT comparison files using BLAST. Running Blast. There are several programs in the Blast package that can be used for generating sequence comparison files. For a detailed description of the uses and options, see the appropriate README file in the Blast software directory (see Appendix X). In order to generate comparison files that can be read into ACT you can use the Blastall program, running either the BlastN (DNA-DNA comparison) or the TBlastX (translated DNA-translated DNA comparison) protocol. As an example you will run a BlastN comparison on two relatively small sequences, the pHCM1 and R27 plasmids from S. typhi. In principle any DNA sequences in FASTA format can be used, although size becomes an issue when dealing with sequences such as whole genomes of several Mb (see Exercise 2 in this module). When obtaining nucleotide sequences from databases such as EMBL using a server such as SRS (http://srs.ebi.ac.uk), it is possible to specify that the sequences are in FASTA format. [Screenshot: terminal window.] Make sure you are in the Module 3 directory. You should now see both the new FASTA files for the pHCM1 and R27 sequences in the Module 3 directory, as well as their respective EMBL format files. Hint: You can use the pwd comma
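Because the comparison takes FASTA files as input and size becomes an issue for large sequences, it can be useful to confirm that a file really parses as FASTA and to see how long each sequence is before running Blast. The sketch below is a small stand-alone Python reader written for that purpose. It is not part of the Blast package, the record names are invented, and the in-memory example stands in for real files such as the pHCM1 and R27 FASTA files.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA-format handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Invented two-record example; replace with open("your_file.fasta").
    example = io.StringIO(">plasmid_one\nACGTAC\nGT\n>plasmid_two\nTTGACA\n")
    for name, sequence in read_fasta(example):
        print(name, len(sequence), "bp")
```

A file that yields no records, or records of unexpected length, is worth checking before it is passed to the comparison step.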
49. [Screenshot: BLAST results against Trypanosoma brucei proteins. The hit list includes hypothetical proteins, an isocitrate dehydrogenase (NADP, mitochondrial) and an actin-like protein, each with its score and probability. The selected alignment reports Score 970 (346.5 bits), Expect 5.3e-99, P 5.3e-99, Identities 189/410 (46%), Positives 276/410 (67%), followed by the Query/Sbjct alignment blocks. A GeneDB gene page (Full Sequence, CDS Info, Location, Predicted Peptide Properties) is visible alongside.]
50. [Screenshot: InterProScan results for UniProt P48425, shown graphically and as a table. Table of matches: IPR001844, PR00298 CHAPERONIN60; IPR002194, PS00750, PS00751 and PS00995 (TCP1 signatures); IPR002423, PF00118 Cpn60_TCP1; IPR002423, PR00304 TCOMPLEXTCP1; IPR002423, PTHR11353 Cpn60/TCP-1; IPR008950, SSF48592 GroEL ATPase; IPR012714, TIGR02339 thermosome_arch. SCOP domains from 1a6d (d1a6da1, d1a6da2, d1a6da3) are also listed.] Exercise 2: MPsrch. [Screenshot: Mozilla browser at http://www.ebi.ac.uk/MPsrch/index.html, European Bioinformatics Institute, Similarity Searching.]
51. [Screenshot: GeneDB CDS page for BB1366, B. bronchiseptica. Systematic Name: BB1366. Gene: prn. Product: pertactin precursor. Type: CDS. Sequence: DNA and Protein. Chromosome/Contig: 1. Location: 1461346..1464096. Length: 2751 bp. Spliced length: 2751 bp. Context map: BB1357 to BB1377, with BB1366 highlighted, and a link to the graphical display in Artemis. Predicted Peptide Properties: Mass 94.3 kDa; Amino acids 916; Isoelectric point pH 10.0; Charge 23.0; Signal Peptide: Not found; Transmembrane Domains: 0 found. Sequence panels: the DNA sequence and the protein sequence (header ">BB1366 prn pertactin precursor Bordetella bronchiseptica chr 1"), with buttons Send to GeneDB omniBLAST, Send to GeneDB BLAST and Send to BLAST at NCBI.]
52. [Screenshot: BioCyc Query Page in Internet Explorer, address http://biocyc.org:1555/server.html.] BioCyc Query Page. This form provides several different mechanisms for querying Pathway/Genome Databases. Callout: Select a dataset. Callout: Enter product name here and click Submit. Query by name: all objects containing that name as a substring will be returned. You may also enter names or EC numbers, separating them with commas. To retrieve objects by name, first select the type of object you wish to retrieve, then enter the name. Callout: Choose from a list of pathways. Browse Ontology: each dataset contains classification hierarchies for pathways, for reactions (the enzyme nomenclature system), for compounds and for genes. Select a classification system to browse. Callout: Links to summary information about the selected organism (Summary page for dataset, Metabolic Overview Diagram, Expression Viewer). [Screenshot: Query Results page in Internet Explorer for a substring search of the ECOLI dataset.] Query Results: query "aldolase" matched the following objects. Proteins: 2-dehydro-3-deoxyphosphogalactonate
53. [Screenshot: InterProScan Sequence Search form at the EBI (Protein Functional Analysis).] Sequence Search. This form allows you to query your sequence against InterPro. For more detailed information see the documentation for the perl stand-alone InterProScan package (Readme file or FAQs), or the InterPro user manual or help pages. Form fields: YOUR EMAIL; RESULTS (interactive); APPLICATIONS (Clear all, Check all): BlastProDom, FPrintScan, HMMPIR, HMMPfam, HMMSmart, HMMTigr, ProfileScan, ScanRegExp, SuperFamily, SignalPHMM, TMHMM, HMMPanther; TRANSLATION TABLE (DNA/RNA only); MIN OPEN READING FRAME SIZE (None, 100); Enter or Paste a PROTEIN Sequence in any format (the example sequence begins MAAKDVKFGNDARVKMLRGVNVLADAVKVTLGPKGRNVVI); Upload a file (Browse); Submit Job; Reset. PLEASE NOTE: Interactive job results are stored for 24 hours; email job results are stored for one week. [Screenshot: InterProScan results page in Mozilla.]
54. Hopefully you will now have an Artemis window like this. If not, ask a demonstrator for assistance. [Screenshot: Artemis Entry Edit window for S_typhi.dna showing the sequence only.] Now follow the numbers to load up the annotation file for the Salmonella typhi chromosome. [Screenshot: Artemis Entry Edit window for S_typhi.dna.] What's an "Entry"? It's a file of DNA and/or amino acid features which can be overlaid onto the sequence information displayed in the main Artemis view panel. 3. The basics of Artemis. Now you have an Artemis window open, let's look at what's in there. [Screenshot: Artemis Entry Edit window for S_typhi.dna with the menus File, Entries, Select, View, Goto, Edit, Create, Write, Run, Graph, Display; the status line for the selected feature (bases 930, amino acids 309, STY0003); and the entries S_typhi.dna and S_typhi.tab.]
55. aspects of a given protein's function in terms of its molecular function, biological process and cellular component (location). Where evidence exists from the literature, from sequence analysis or from other sources, gene ontology terms for function, process and component are attributed to that gene. AmiGO is the database housing assigned gene ontology associations and is maintained by the Gene Ontology Consortium. It allows searching and browsing of gene ontology annotations across many genomes, from human and mouse through to lower eukaryotes, including those which are not annotated and curated for GeneDB. GeneDB has a copy of the GO database and an installation of the AmiGO browser on top of it. Advantages of a local copy of the GO database include an increased update frequency as well as the inclusion of datasets not otherwise searchable via the official GO database, e.g. assignments inferred by electronic annotation. It can be a powerful way to search for genes with similar function across several organisms. The example below shows how to set up this query, which can be accessed from the organism home page and/or from the search menu bar at the top of each of the feature pages. Once you've tried it and have become familiar with it, try some of the other suggested searches, or perhaps one that would be of interest to your own research. [Screenshot: GeneDB CDS page for Tb927.2.2900, carrying a WARNING notice dated September 2005.]
56. [Screenshot: BLAST hit list against Bordetella parapertussis (BPP) proteins, giving each hit's location, strand, score and probability. Hits include several autotransporters, a filamentous hemagglutinin/adhesin, a serine protease, a proline-rich inner membrane protein, a putative membrane protein, adhesins, hypothetical and conserved hypothetical proteins, a siderophore-mediated iron transport protein, a cell division protein, a flagellar hook-length control protein and a DNA polymerase III subunit tau.]
57. [Screenshot: GeneDB boolean query page, Sanger Institute Pathogen Sequencing Unit. Page text: This page allows you to build more complex queries against the database by using a pair of query forms and the boolean operators; for example, with AND the query will return only those objects that satisfy query 1 AND query 2. Use the menus to select two different queries, then click the proceed button to generate a page that will allow you to specify parameters for the two queries. You must select a query from each menu before the proceed button will work. If you need to back up a step, use the browser's back button. You are currently searching T. brucei. The query shown selects proteins containing a specific Pfam domain (ABC transporter). Results page: "Genes for T. brucei intersect Proteins that contain Pfam domain ABC transporter", rows 1 to 20, with columns name, id, organism and description.]
58. [Screenshot: Artemis Edit menu open over the S. typhi genome view (around 4,400,000 to 4,530,000 bp, with STY features and misc_feature and repeat_unit annotations). Legible menu entries: Edit Selected Features; Edit Subsequence (and Features); Edit Header of Entry (Ctrl-E); Duplicate Selected Features (Ctrl-D); Merge Selected Features (Ctrl-M); Unmerge Selected Feature; Delete Selected Features (Ctrl-Delete); Trim Selected Features To Next Met; Trim Selected Features To Next Any; Extend to Previous Stop.]
59. [Screenshot: ACT view with the P. falciparum chromosome 13 fragment (about 761,000 to 858,000 bp) above and the P. knowlesi contig (about 6,500 to 84,500 bp) below.] Comparison of the P. knowlesi contig and the annotated chromosome 13 fragment of P. falciparum. Exercise 2 Part II: Conservation of gene order (synteny). In the ACT start-up window load up the files Pfal_chr13.embl, Pknowlesi_contig.seq and the comparison file Plasmodium_comp.crunch. Use the slider on either sequence view panel to obtain a global view of the genome comparison. Also use the slider on the comparison view panel to remove the shorter similarity hits. What effect does this have? Can you see conserved gene order between the 2 species? Can you see any region where similarity is broken up? Zoom in and look at some of the genes encoded within this unique region in file Pfal_chr13.embl (top sequence). Example location: Pfal_chr13.embl, 815823..829969. What are the predicted products of the genes assigned to this unique location? View the details by clicking on the feature and then select
60. [Screenshot: GeneDB BLAST result page (The Wellcome Trust Sanger Institute, Pathogen Sequencing Unit), Workshop on Bacterial Genomics. Legible details: top hit BPP1150, pertactin precursor, 1229069..1231837, forward strand; Length 922; Expect 0.0; Identities 909/925 (98%); Positives 910/925 (98%); the query line of the alignment begins MHMSLSRIVKAAPLRRTTLAMALGALGALGAA. The remaining hit list and scores are not recoverable.]
61. [Screenshot: BRENDA entry for EC 1.11.1.6. Recommended name: catalase. Systematic name: hydrogen-peroxide:hydrogen-peroxide oxidoreductase. Legible synonyms and organisms include caperase, CatA and CatB (Aspergillus nidulans), CatF (Pseudomonas syringae), equilase, and HPII (Escherichia coli). The side menu lists sections such as Inhibitors, Functional Parameters, Organism-related Information, Localization, AA Sequence, Crystallization and Stability.] Workshop on Bacterial Genomics, Module 5: Genome Resources. ExPASy database view. [Screenshot: browser window open at http://www.expasy.org/cgi-bin/nicezyme.pl?1.11.1.6]
62. is made publicly available by the Malaria Genome Sequencing Consortium. Several animal models of malaria have also been used by researchers to study aspects of malaria biology and host-parasite interactions. Sequences representing partial genomes of some of these model malaria parasites are now also available. This allows us to perform comparative analysis of the genomes of malaria parasites and to understand the basic biology of their parasitism from the similarities and dissimilarities between the parasites at the DNA and predicted-protein level. Aim: You will be looking at the comparison between a genomic fragment of the primate malaria parasite P. knowlesi and the previously annotated chromosome 13 of P. falciparum. By comparing the two genomic fragments you will be able to study the degree of conservation of gene order and identify new genes in the P. knowlesi genome. As part of the exercise you will also identify any gross dissimilarity visible between the two genomic fragments and finally predict or modify the gene model for one multi-exon gene in the P. knowlesi genomic fragment. The files that you are going to need are: Pfal_chr13.embl (annotation file with sequence); Pknowlesi_contig.seq (sequence file without annotation); Pknowlesi_contig.embl (annotation file with sequence); Plasmodium_comp.crunch (tblastx comparison file). [Screenshot: Artemis window with the menu bar File, Entries, Select, View, Goto, Edit, Create, Write, Run, Graph, Display.]
63. [Screenshot: Artemis feature list. source 1..21515; CDS 1..1028; CDS 2465..3682; CDS 4594..8139; CDS 10001..11515; CDS 13695..14945; CDS 15460..16739; CDS 17094..18620; CDS 18950..21427.] Note: the genes may be coloured differently due to updates in annotation. [Screenshot: Artemis feature edit window for a complement-strand CDS, with the buttons Complement, Grab Range, Remove Range, Goto Feature and Select Feature; the qualifier text is not recoverable.] Workshop on Bacterial Genomics.
64. 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol from 2-C-methyl-D-erythritol 4-phosphate by 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase, a new enzyme in the nonmevalonate pathway. Tetrahedron (2000) 703-706. Workshop on Bacterial Genomics, Module 5: Genome Resources, Exercise 2. [Screenshot: BioCyc home page in Mozilla, with navigation links for Search, Services, Software/Data Download, User Support and Information.] BioCyc Home Page: BioCyc is a collection of Pathway/Genome Databases. Each database in the BioCyc collection describes the genome and metabolic pathways of a single organism, with the exception of the MetaCyc database, which is a reference source on metabolic pathways from many organisms. To learn more about BioCyc read the Introduction page. The BioCyc databases are divided into three tiers based on their quality. BioCyc Databases: Tier 1, Intensively Curated Data
65. Workshop on Bacterial Genomics, Module 5: Genome Resources. [Screenshot: European Bioinformatics Institute, MPsrch Summary Table, in Mozilla.] Submission parameters as read from the screenshot: Title: Sequence; Database: uniprot; Sequence length: 548; Sequence type: p; Program: MPsrch_pp; Matrix: PAM 100; Open gap penalty: 14; Gap extension penalty: 14. Hits listed (DB:ID, description): UNIPROT:Q548M0_ECOLI, GroEL; one entry with an illegible ID, GroEL; UNIPROT:Q6UDB1_ECOLI, GroEL (Fragment); UNIPROT:Q6Q099_ECOLI, GroEL; UNIPROT:Q6UDB3_ECOLI, GroEL (Fragment); UNIPROT:Q6UDB2_ECOLI, GroEL (Fragment); UNIPROT:CH60_ECOLI, 60 kDa chaperonin (Protein Cpn60) (groEL protein). The length, percentage match, score and E-value columns are not reliably recoverable.
66. [Screenshot: Artemis view of a gene region with six-frame translation and misc_feature annotations. The legible notes record positions at which the majority of reads from several sequencing sources call one base while a single read shows a variation, and misc_feature entries marking splice acceptor sites. Base calls, coordinates and gene identifiers are not reliably recoverable.]
67. [Table of autotransporter genes in B. pertussis (BP), B. parapertussis (BPP) and B. bronchiseptica (BB); rows as read, with lost cells or prefixes marked:]
BB0419 (end of a truncated row)
Novel: BPP0449, BB0450
Novel: BP0529, BPP0452, BB0452
Novel: 735 (prefix lost), BB0821
Novel: 822 (prefix lost), BB0916
Serum resistance protein: BP3494, BPP0867, BB0961
Pertactin: BP1054, BPP1150, BB1366
BapA/AidB: BP2224, BPP2251, BB1649
Vag8: BP2315, BPP2415, BB1864
BapC: BP2738, BPP2591, BB2033
Phg: BP1767, BPP1998, BB2246
Novel: BP1793, BPP2022, BB2270
SphB3 serine protease: BP1110, BPP2053, BB2301
Novel: BP2627, BPP1256, BB2324
SphB2: BP1660, BPP2745, BB2741
Novel: BP1344, BPP2678, BB2830
Novel: BP1610, BPP2975, BB2941
Novel: BPP1618, BB3110
Novel: 1617 (prefix lost), BB3111
TcfA: BP1201, BB3291
BapB: BP1200, BPP1815, BB3292
Pseudogenes are underlined (underlining lost in extraction).
Bordetella bronchiseptica GeneDB: This page provides access to the annotation and sequence of B. bronchiseptica strain RB50. This is the result of a collaboration of the Sanger Institute with Duncan Maskell and Andrew Preston of the Centre for Veterinary Science, Dept. of Clinical Veterinary Medicine, The University of Cambridge. This data is described in Parkhill et al. 2003, Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica, Nature Genetics, 10.1038/ng1227. [Screenshot: Database Entry Point form, "Search for gene by ID/description", with B. bronchiseptica selected and options to include the description and add wildcards.]
68. [Screenshot: GeneDB sequence page for a putative transporter homologue from Theileria annulata (chromosome unknown). The page shows the nucleotide and protein sequences, which are not recoverable from the scan, with buttons to send the protein to GeneDB omniBLAST or GeneDB BLAST and a link back to the main details page for the gene.]
69. [Screenshot: GeneDB gene page for a T. brucei CDS; the gene ID is only partly legible. Page banner: the Full Content Search is still pointing at the previous data run's data; this will be fixed later in the day. Legible fields: Status: role inferred from homology; Product: ARP2/3 complex subunit, putative; Type: CDS; Length: 543 bp; Graphical Display: Artemis Genome Browser, with a context map; Predicted Peptide Properties: Signal Peptide not found, Transmembrane Domains not found, GPI Anchor not found; Domain Information: ARP2/3 complex 20 kDa subunit (ARPC4), residues 1-177; Gene Ontology Annotation: biological process, actin filament polymerization. Mass, isoelectric point, charge and chromosome coordinates are not reliably recoverable.]
70. [Screenshot: European Bioinformatics Institute, InterProScan results (Table View) for Sequence 1, length 548. InterPro matches as read: IPR001844, Chaperonin Cpn60 (Family), with PR00298 and PS00296 (CHAPERONINS_CPN60); IPR002423, Chaperonin Cpn60/TCP-1 (Family), with PF00118 (Cpn60_TCP1), PR00304 (TCOMPLEXTCP1) and PTHR11353; IPR008950, GroEL-like chaperone, ATPase (Domain), with SSF48592; IPR012723, chaperonin GroEL, with TIGR02348. Output options: Table View, Raw Output, XML Output, Original Sequences, Submit Another Job.] Workshop on Bacterial Genomics, Module 5: Genome Resources. [Screenshot: InterPro entry IPR001844 (Cpn60) in Mozilla.]
71. [Screenshot: browser window open at http://www.ebi.ac.uk/genomes/bacteria.html, listing bacterial genomes with sizes and accession numbers; legible entries include Sinorhizobium meliloti 1021 and Streptococcus pyogenes, and the remaining rows are not recoverable.] Workshop on Bacterial Genomics, Module 3: Generating ACT comparison files using BLAST. [Screenshot: file chooser in the Module 3 directory, with file names and sizes as read: pHCM1.dna 221807; pHCM1.dna.nhr 72; pHCM1.dna.nin 88; pHCM1.dna.nsq 54542; pHCM1.embl 604646; pHCM1_vs_pR27 16313; pR27.dna 183480; pR27.embl 469177. File name: N315.embl; Files of type: All Files. Callout: save the EMBL sequence in the Module 3 directory. The Last Modified column shows 01/22/2004.]
72. [GeneDB History page, Pathogen Sequencing Unit.] This page displays the results of queries executed using the boolean query pages. The page will remain empty until searches have been run. Please note that a search combining boolean operators will return multiple results files. Chosen queries are first executed across all organisms within GeneDB and subsequently narrowed down to return only data from the organisms of choice. Once query results have been retrieved, users can combine results files in one of three ways: adding files together (union), identifying common results between files (intersect), or identifying unique results between files (subtraction). Please note that you must have cookies enabled in your browser for this page to work correctly. The result files can either be viewed or downloaded as a FASTA file of DNA or protein sequences. [Screenshot: query history table with the columns Query, Result, Download and Size. Legible queries: "Proteins with a product containing the keyword or phrase transporter" intersect "Proteins predicted to have GO component membrane"; "Genes for pberghei, pchabaudi, malaria" intersect the previous query (size 67); "Proteins predicted to have between 8 and 14 transmembrane domains" intersect "Proteins predicted to have GO process transport".]
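The three ways of combining result files described on the History page map directly onto set operations. The sketch below is only an illustration of that idea, not GeneDB's implementation: the function name `combine` and the gene IDs are hypothetical, and each result file is assumed to reduce to a set of gene identifiers.

```python
def combine(left, right, mode):
    """Combine two sets of gene IDs the way the History page describes.

    mode is one of "union", "intersect" or "subtraction".
    """
    if mode == "union":          # adding files together
        return left | right
    if mode == "intersect":      # results common to both files
        return left & right
    if mode == "subtraction":    # results unique to the left-hand file
        return left - right
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical result sets; these IDs are made up for illustration.
transporter_hits = {"geneA", "geneB", "geneC", "geneD"}
membrane_hits = {"geneB", "geneC", "geneE"}

membrane_transporters = combine(transporter_hits, membrane_hits, "intersect")
print(sorted(membrane_transporters))
```

Chaining calls reproduces a stepwise refinement like the one in the history table: intersect a keyword result with a GO result, then intersect that with an organism-specific gene list.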
73. [Screenshot: GeneDB P. falciparum home page, news and links panel. Legible items mention an FTP download, conserved synteny with P. knowlesi and P. yoelii, all splice sites checked, the Plasmodium genome resource PlasmoDB, and Wellcome Trust Functional Genomics.] Clicking on the link Status and Project Information will take you to pages that describe the status and background of protozoan genome projects and other genome projects at the WTSI. Have a look at the Plasmodium projects for reference. [Screenshot: GeneDB Gene Results List of P. falciparum CDS entries annotated as putative transporters, including ABC transporters, an ATP-dependent transporter and a phosphate transporter; gene IDs are not reliably recoverable.]
74. [Screenshot: GeneDB results list of P. falciparum, P. chabaudi and P. berghei gene IDs, with a sequence download form (Information Required For Each: RNA, CDS, DNA, unspliced sequence; number of bases; sequence range) and a Submit Query button.] Workshop on Bacterial Genomics. Exercise 3: Search strategies using omniBLAST and browsing of the Pfam domain catalogue. We are now going to return to consider and run a few other search strategies which make use of the strengths of GeneDB. 1. Use of a text (keyword) search across several organisms using the cross-organism search page. This can be a quick and powerful way to identify genes or proteins in other organisms that perform very similar functions to your gene of interest. This can be achieved with simple keywords and requires little previous knowledge about the gene of interest. Once a gene or protein has been found that meets the keyword criteria, e.g. sugar transporter, the DNA or protein sequence
75. [Screenshot: Artemis view of an S. typhi region with six-frame translation and STY gene labels; the sequence and translation text are not recoverable.] Workshop on Bacterial Genomics, Module 1: Artemis. Region 2. [Screenshot: Artemis window (menu bar File, Entries, Select, View, Goto, Edit, Create, Write, Run, Graph, Display) with the entries S_typhi.dna and S_typhi.tab loaded and three graphs displayed: GC Content, window size 9467; GC Deviation (G-C)/(G+C), window size 4100; Karlin Signature Difference, window size 3659. Gene labels below the graphs are not reliably recoverable.]
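Two of the graphs named in the screenshot are simple per-window statistics: GC content is the fraction of G and C bases in a window, and GC deviation is (G-C)/(G+C). The following is a minimal sketch of both, assuming a plain DNA string and non-overlapping windows; the function names and the example sequence are hypothetical, and Artemis's own windowing and scaling may differ.

```python
def gc_stats(window):
    """Return (GC content as a fraction, GC deviation (G-C)/(G+C)) for one window."""
    window = window.upper()
    g = window.count("G")
    c = window.count("C")
    length = len(window)
    gc_content = (g + c) / length if length else 0.0
    gc_deviation = (g - c) / (g + c) if (g + c) else 0.0
    return gc_content, gc_deviation


def sliding_gc(sequence, window_size, step):
    """Compute gc_stats for each window; returns (start, content, deviation) tuples."""
    results = []
    for start in range(0, len(sequence) - window_size + 1, step):
        content, deviation = gc_stats(sequence[start:start + window_size])
        results.append((start, content, deviation))
    return results


# Made-up sequence for illustration only.
for start, content, deviation in sliding_gc("GGGCATATGCCC", 4, 4):
    print(start, round(content, 2), round(deviation, 2))
```

A larger window smooths the curve, which is why the screenshot shows window sizes in the thousands of bases for a whole-chromosome view.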
76. [Screenshots: GeneDB BLAST server results, Pathogen Sequencing Unit. Page notice: At peak times your BLAST searches could take longer than normal. Please be patient. BLAST results are kept on our servers for three days following query submission. After this time queries must be resubmitted. Summary for B. pertussis predicted proteins (wublastp) for query BPP0417; hits as listed: sphB1, BP0529, sphB3, sphB2, BP0775; scores as read: 4755, 215, 203, 208, 168. Summary for B. bronchiseptica predicted proteins (wublastp) for query BPP0417; hits as listed: sphB1, BB0452, 0916 (prefix lost), sphB2, BB0450; scores as read: 4827, 233, 226, 223, 218. The P(N) values are not reliably legible. Each hit has Full Sequence and Full BLAST Search links.]
77. Superclasses: Complexes, Protein-Complexes; Proteins, Protein-Complexes. Comment: The typical class I aldolases of plants and animals have been thoroughly studied [Baldwin78]. Fructose 1,6-bisphosphate aldolases can be divided into two classes on the basis of their catalytic and structural properties [Baldwin78]. Class I fructose 1,6-bisphosphate aldolases were once thought to be confined to eukaryotic organisms but have since been de[...] bacterial species [Baldwin78a]. The occurrence of such an aldolase in bacteria was unexpected in light of the [...] distribution of aldolases [Stribling73]. The enzymes of eukaryotes generally fall into Class I and are tetrame[...] polypeptide chains [Alefounder89a]. In earlier studies [Stribling73] it was thought that the class I [...] aldol[...] that it was tetrameric with a mol. wt. of approx. 140K. In 1978 new purification techniques were used. The true a[...] could be measured by using Fru-1,6-P2 that had been purified by chromatography on DEAE-cellulose to remove the fructose 6-phosphate. Using these methods the enzyme appeared to be larger than was previously supposed and may be a decamer with a mol. wt. of approx. 340,000. The size of aldolase 1 and the effect of cross-linking reagents on it indicate that its structure must differ significantly from that of the typical tetrameric class I enzymes from eukaryotes [Baldwin78a, Stribling73]. [Gaps marked [...] are hidden by a callout in the screenshot.] Component composition: fructose bisphosphate aldol
78. [Screenshot: GeneDB BLAST submission page, The Wellcome Trust Sanger Institute, Pathogen Sequencing Unit.] Your BLAST query has been added to the queue of jobs. The majority of BLASTs are completed within two minutes. To retrieve your results click the retrieve button above or use the URL given on the page (a dev.genedb.org address containing the job id; the id is not reliably legible). Workshop on Bacterial Genomics. [Screenshot: GeneDB BLAST Server Results.] At peak times your BLAST searches could take longer than normal. Please be patient. BLAST results are kept on our servers for three days following query submission. Results may be retrieved any number of times during this period. After this time queries must be resubmitted if further examination is required. Summary for S. pombe proteins (wublastp) for query SPAC630.03; legible hit names include arp3, act, alp5 and arp2; scores as read: 2229, 505, 507, 452, 214. The systematic IDs and P(N) values are not reliably legible. Each hit has Full BLAST Search and Full Sequence links.
79. [Screenshot: EcoCyc search results for "aldolase". Entries as read: 2-dehydro-3-deoxyphosphoheptonate aldolase (several entries, including protein complexes); 2-keto-3-deoxy-6-phosphogluconate aldolase; 3-deoxy-D-manno-octulosonic acid 8-phosphate synthase / 2-dehydro-3-deoxyphosphooctonate aldolase; 4-hydroxy-2-ketovalerate aldolase; alpha-dehydro-beta-deoxy-D-glucarate aldolase; citrate lyase / citrate aldolase; deoxyribose-phosphate aldolase; dihydroneopterin aldolase; fructose 6-phosphate aldolase 1; fructose 6-phosphate aldolase 2; fructose bisphosphate aldolase class I; fructose bisphosphate aldolase class II; fructose bisphosphate aldolase monomer (polypeptide, two entries); glycine hydroxymethyltransferase / serine aldolase / L-allo-threonine aldolase; L-fuculose-phosphate aldolase; N-acetylneuraminate lyase / N-acetylneuraminic acid aldolase; protein with similarity to HHED aldolases. Callout: select this enzyme (fructose bisphosphate aldolase class I).] Workshop on Bacterial Genomics, Module 5: Genome Resources. [Screenshot: Microsoft Internet Explorer open at http://biocyc.org:1555/ECOLI/NEW-IMAGE?type=ENZYME&object=FRUCBISALD-CLASSI, showing the page "E. coli K-12 Enzyme: fructose bisphosphate aldolase class I".]
80. [Screenshot: EBI MPsrch submission form.] MPsrch is a biological sequence comparison tool that implements the true Smith and Waterman algorithm. It runs a search on an HP/COMPAQ cluster using single and parallelised versions of the software. It allows a rigorous search in a reasonable computational time. MPsrch utilises an exhaustive algorithm, which is recognised as the most sensitive sequence comparison method available, whereas BLAST and FASTA utilise a heuristic one. As a consequence MPsrch is capable of identifying hits in cases where BLAST and FASTA fail, and it also reports fewer false positive hits. [Form fields as read: YOUR EMAIL; SEARCH TITLE; RESULTS: interactive; DATABASE: UniProt; PROGRAM: MPsrch_pp; TABLE; GAPOPEN; GAPEXTEND; ANNOTATION: no; SORT: score; SUMMARY & ALIGNMENTS: 20. Under "Enter or Paste a Protein Sequence in any format" a protein query sequence has been pasted; it is not reliably legible in the scan.]
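To make the contrast between an exhaustive and a heuristic search concrete, here is a minimal sketch of Smith-Waterman local alignment scoring. It is not MPsrch's implementation: it assumes a simple match/mismatch score and a linear gap penalty instead of a PAM matrix with separate gap-open and gap-extend penalties, and the function name and parameter values are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Every cell of the dynamic-programming matrix is filled, which is what
    makes the method exhaustive. Only two rows are kept in memory.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = previous[j] + gap        # gap in b
            left = current[j - 1] + gap   # gap in a
            # A local alignment may restart anywhere, so scores never drop below 0.
            current[j] = max(0, diagonal, up, left)
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The run time grows with the product of the two sequence lengths, which is why the manual notes that MPsrch relies on a cluster to search a whole database in reasonable time.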
81. [Annotated screenshot: GeneDB gene page for CDS BPP0417 (B. parapertussis), reached from a results list of autotransporters.] Navigation bar pull-down menus: You can navigate between different organism datasets and search tools using pull-down menus. Gene name and product information: The description lines are standardized and indexed so that features sharing the same description lines can be retrieved. Access to the nucleotide and amino acid sequences of the feature is also provided. Basic location information and context map: Clicking on the Graphical Display in Artemis will open up an Artemis applet, which will be discussed further in exercise 2. Via the applet the feature can be viewed in the context of the sequence and additional annotation. [Page fields as read: Gene Synonyms: sphB1; Product: autotransporter subtilisin-like protease; Type: CDS; Location: Chromosome 1, complement 434671..437466; Length: 2796 bp; Exons: 434671..437466; Spliced length: 2796 bp. The Context Map spans roughly 425000 to 445000 and shows neighbouring BPP genes.]
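The location and length fields on the gene page can be checked against each other: coordinates are inclusive, so a feature spanning 434671 to 437466 is 437466 - 434671 + 1 = 2796 bp long. The sketch below does the same arithmetic for a location string, assuming it is written in EMBL feature-table style (for example complement(434671..437466)); the function name is hypothetical, and the punctuation of the location is an assumption because the scan has lost it.

```python
import re


def feature_length(location):
    """Sum the inclusive lengths of every start..end range in a location string."""
    ranges = re.findall(r"(\d+)\.\.(\d+)", location)
    if not ranges:
        raise ValueError(f"no start..end range found in: {location!r}")
    return sum(int(end) - int(start) + 1 for start, end in ranges)


# Coordinates read from the BPP0417 gene page.
print(feature_length("complement(434671..437466)"))

# A made-up two-exon location: the spliced length is the sum over exons.
print(feature_length("join(1..10,21..30)"))
```

For a single-exon gene such as this one, the length and the spliced length agree, as they do on the page.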
82. [Screenshot: GeneDB omniBLAST Server, The Wellcome Trust Sanger Institute, Pathogen Sequencing Unit.] OmniBLAST will perform a BLAST search on a set of protein databases (BLASTP or BLASTX, depending on the query sequence) or nucleotide databases (BLASTN and TBLASTX, or TBLASTN) available in GeneDB and return a list of the best five HSPs for each database. If there are any HSPs you can click on Full Search to see the complete BLAST output. To search individual databases with different parameters, go to single organism BLAST. Choose query data: Paste your sequence here (FASTA format or just plain text will do). [The query shown is a putative transporter homologue from Theileria annulata.] Determine sequence type automatically, or set sequence type to protein or nucleotide. Note: OmniBLAST searches may take several minutes depending on the number of selected databases. Please check the databases chosen below are correct. Database options: Search only the BLAST databases selected below. Jump down page: Protozoa, Bacteria, Parasite Vectors. [Database list begins with A. fumigatus: finished sequences, predicted proteins, predicted genes (coding sequences), ends, whole genome shotgun reads.] Note th
83. [Feature list, continued: Similar to Bacteriophage P2 tail complet[...]; CDS 79322..79747, Similar to Bacteriophage P2 protein LysE; CDS 79747..80124, no significant database hits; CDS 80129..80599, Similar to Serratia marcescens putative [...]; CDS 80619..80834, Similar to Serratia marcescens extracel[...]. Button: Close.] The genes listed in window 6 are only those fitting your selection criterion. They can be copied or moved into a new entry so we can view them in isolation from the rest of the information within spi7.tab. Firstly, in window 6 select all of the CDSs shown by clicking on the Select menu and then selecting All. All the features listed in window 6 should now be highlighted. To copy them to another entry (file) click Edit, then Move Selected Features, then no name. Close the two smaller Feature Selector windows and return to the SPI-7 Artemis window. You could rename the no name entry as you did before. Temporarily remove the features contained in the spi7.tab file by left-clicking on the entry button on the grey entry line. Only the phage genes should remain. Workshop on Bacterial Genomics, Module 1: Artemis. Additional methods of selecting and extracting features using the Feature Selector: It is worth noting that the Feature Selector can be used in many other ways to select and extract subsets of features from the genome. If you have a closer look at the Feature Selector you will also see that you can use search t
84. Entry As and then New File. Another menu will ask you to choose one of the entries listed. At this point they will both be called no name. Left-click on the top entry in the list. A window will appear asking you to give this file a name. Save this file as spi7.dna. Do the same again for the other unnamed entry and save it as spi7.tab. [Screenshot: Artemis File menu. Legible items: Read An Entry; Read An Entry From Database; Read Entry Into; Save An Entry As, with the submenu New File, EMBL Format, GENBANK Format, GFF Format, EMBL Submission Format; Save All Entries; Clone This Window; Close. The feature display and sequence text behind the menu are not recoverable.]
85. in a later exercise in this module. Exercise 1, Part III. Take your web browser back to the IUBMB search page and search using EC 2.7.7.60 as before. For some enzymes you can also get pathway information from their IUBMB pages, e.g.: IUBMB Enzyme Nomenclature, EC 2.7.7.60. Common name: 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase. Reaction: CTP + 2-C-methyl-D-erythritol 4-phosphate = diphosphate + 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol. For diagram click here. [Callout: Click here to get a pathway diagram.] Other name(s): [...] cytidylyltransferase. Systematic name: CTP:2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase. Comments: The enzyme from Escherichia coli requires Mg2+ or Mn2+. [...] or UTP can replace CTP, but both are less effective. GTP [...] are not substrates. Forms part of an alternative nonmevalonate pathway for terpenoid biosynthesis (for diagram click here). Links to other databases: BRENDA, EXPASY, KEGG, WIT, CAS registry number. References: 1. Rohdich F, Wungsintaweekul J, Fellermeier M, Sagner S, Herz S, Kis K, Eisenreich W, Bacher A and Zenk MH. Cytidine 5'-triphosphate-dependent biosynthesis of isoprenoids: YgbP protein of Escherichia coli catalyzes the formation of 4-diphosphocytidyl-2-C-methyl-D-erythritol. Proc. Natl. Acad. Sci. USA 96 (1999) 11758-11763. [Medline UI: 10518523] 2. Kuzuyama T, Takagi M, Kaneda K, Dairi T and Seto H. Formation of
86. interface supports a wide range of queries on sequences and curated annotations stored in the relational database GUS. Searches can be combined with the boolean operators AND and OR. For example, users can select all proteins of a specified length range with a specified number of introns. Other query options include GO assignments, keywords, chromosome, protein domains and predicted protein sequence features. The queries in each session are tracked via a history page, allowing further refinement of searches and downloading of results as a nucleotide or amino acid FASTA file. This exercise will demonstrate how to combine and build up queries to retrieve a subset of predicted ABC transporters containing 8 transmembrane domains. [Screenshot: Trypanosoma brucei GeneDB home page.] WARNING, 5th September 2005: The Full Content Search is still pointing at the previous data run's data. This will be fixed later in the day. GeneDB contains release 4 of the T. brucei genome (strain TREU927/4 GUTat10.1) generated by the T. brucei projects at The Institute for Genomic Research (TIGR's T. brucei project) and The Wellcome Trust Sanger Institute (Sanger's T. brucei project). It also contains the sequence and annotation of 3 T. brucei strain 427 variant 221a bloodstream expression sites (PMID link on the page). Click here for more information on the T. brucei genome and proteome, and here to find out more about the individual chromosome assemblies, in particular with regards to additional unord
87. [List of web resources, continued; the first three site names were lost in extraction:]
(name lost): [...] jp form html
(name lost): http://ecocyc.org
(name lost): http://www.expasy.ch/enzyme
Kyoto Encyclopedia of Genes and Genomes (KEGG): http://www.genome.ad.jp/kegg
MetaCyc: http://ecocyc.org
Miscellaneous sites
NCBI BLAST website: http://www.ncbi.nlm.nih.gov/BLAST
The tmRNA website: http://www.indiana.edu/~tmrna
tRNAscan-SE Search Server: http://www.genetics.wustl.edu/eddy/tRNAscan-SE
Codon usage database: http://www.kazusa.or.jp/codon
RNAgenie (RNA gene prediction): http://rnagene.lbl.gov
GO (Gene Ontology Consortium): http://www.geneontology.org
Artemis homepage: http://www.sanger.ac.uk/Software/Artemis
ACT homepage: http://www.sanger.ac.uk/Software/ACT
Glimmer: http://www.tigr.org/software/glimmer
Orpheus: http://pedant.gsf.de/orpheus
Workshop on Bacterial Genomics, Appendices. Appendix VI: Prokaryotic Protein Classification Scheme used within the PSU. This scheme was adapted for in-house use from Monica Riley's protein classification (Riley lab page at genprotec.mbl.edu). More classes can be added depending on the microorganism that is being annotated (e.g. secondary metabolites, sigma factors (ECF or non-ECF), etc.).
0.0.0 Unknown function, no known homologs
0.0.1 Conserved in Escherichia coli
0.0.2 Conserved in organism other than Escherichia coli
1.0.0 Cell processes
1.1.1 Chemotaxis and mobility
1.2.1 Chromosome replication
1.3.1 Chaperones
1.4.0 Protection responses
1.4.1 Cell killing
1.4.2 Detoxification
1.4
88. not featured in the exercises in this manual. Like all the Modules in this workshop, the key is: if you don't understand, please ask. Artemis Exercise 1, Part I. 1. Starting up the Artemis software. Navigate your way into the correct directory for this module. Then type art & and press return. A small start-up window will appear (see below). Now follow the sequence of numbers to load up the Salmonella typhi chromosome sequence. Ask a demonstrator for help if you have any problems. [Screenshot: Artemis Release 5 start-up window with File and Options menus. Callouts: Click File, then Open. In the Options menu you can switch between prokaryotic and eukaryotic mode. For simplicity it is a good idea to open a new start-up window for each Artemis session and close down any sessions once you have finished an exercise.] [Screenshot: file selection dialog. Callouts: Single click to select the DNA file. DNA sequence files will have the suffix dna. Annotation files end with tab. Single click to open the file in Artemis, then wait.] 2. Loading annotation files (entries) into Artemis
89. numbers are designated above. You will need to click on more files to upload more than 2 sequences and the comparison files. Click on the button shown after you have uploaded all the files. [Screenshot: ACT file selection dialog. Callouts: Click here to load more files and select the appropriate file. Click on here to read all the files that you have selected. If a warning appears saying that a CDS can't have psu domain as a qualifier, ignore the error and continue.] Can you see any conserved gene order between A. fumigatus and A. nidulans in the qut gene cluster? Can you obtain a clearer picture of the ACT 4-way comparison figure by filtering out the low scoring segments using the blast score cut-off feature, which you have used previously? Zoom in and look at some of the genes encoded within these regions. View the details by clicking on the appropriate CDS feature and then selecting Edit selected feature from the Edit menu. By comparing the blast similarity matches, assign your own annotation (gene product) to the predicted gene models (the blue genes) on the P. anserina gene model file. Can you identify any gene NOT presen
90. [Screenshot: Artemis window showing S. typhi features in the range STY4492 to STY4686 (including pilN, tviA, insB and frdD) together with misc_feature and repeat_unit entries] Artemis Exercise 1, Part IV. [Screenshot: new Artemis window opened on the selected sub-sequence. Callouts: Note the bases have been renumbered from the first base you selected. Note the entry names have changed.]
91. [Screenshot: BLAST submission form. Legible content: a query box (your sequence here; fasta format or just plain text will do) containing a pertactin precursor (prn) protein sequence from Bordetella; options to determine the sequence type automatically or set the sequence type to DNA; a note that BLAST searches may take several minutes depending on the number of databases selected, and to please check that the databases chosen below are correct. DATABASE OPTIONS: search only the BLAST databases selected below. The nucleotide database list includes complete sequences and predicted genes (coding sequences) for B. bronchiseptica, B. parapertussis, B. pertussis, C. diphtheriae, B. pseudomallei, E. carotovora subsp. atroseptica, S. aureus MRSA and MSSA, S. coelicolor, S. typhi (chromosomal, pHCM1 and pHCM2 sequences) and G. morsitans morsitans clustered ESTs.]
92. [Screenshot, continued: protein database list, including predicted proteins for B. parapertussis, B. pertussis, E. carotovora, S. aureus MRSA and MSSA, and S. typhi (chromosomal, pHCM1 and pHCM2). Note: BLAST searches may take minutes depending on the number of selected databases.] OmniBLAST Server Submission. Retrieve result for id s2dPAI71A78Cm959999725 [retrieve]. Your BLAST query has been added to the queue of jobs. The majority of BLASTs are completed within two minutes. To retrieve your results, click the retrieve button above or use the URL given on the page (the getblast page under http://www.genedb.org/genedb2/blast, with your job id). Summary for B. parapertussis predicted proteins (wublastp) for query BPP0417. Full BLAST Search. Scores: 4845, 236, 226, 218, 197. Names: sphB1, BPP0452, BPP0822, BPP0449, sphB3. BLAST results are kept on our servers for three days following query submission. Results may be retrieved any number of times during this period. Blast Server Results: The Wellcome Trust Sanger Institute > Pathogen
93. s Bar
Friday 30th September
09:00-10:30 Comparative genomics (Francisco Silva and Amparo Latorre)
10:30-11:00 Coffee
11:00-12:30 Comparative genomics, cont'd (Francisco Silva and Amparo Latorre)
12:30-14:00 Phylogenomics (Fernando González and Rosario Gil)
14:00-15:00 Lunch
15:00-16:30 Phylogenomics, cont'd (Fernando González and Rosario Gil)
16:30-17:30 Genome flexibility (Alex Mira)
17:30-19:30 Annotation summary exercise, or own seq. Mop up
Glossary of Abbreviations and Terms
ACT: Artemis Comparison Tool
BLAST: Basic local alignment search tool
CDS: Coding sequence. Gene with no biological evidence for expression
CNRS: Centre National De La Recherche Scientifique
DDBJ: DNA data bank of Japan
EBI: European Bioinformatics Institute, Hinxton. An outstation of the European Molecular Biology Laboratory
EMBL: European Molecular Biology Laboratory; also the name of the European DNA database
EST: Expressed sequence tag
Fasta: Part of the Fast repertoire of global alignment search tools
Flatfile: A simple text file used as an alternative to a database for storing data
GENE IT: A company that collaborates with the EBI and others to discover the functions of genes through comparative genomics
HMM: Hidden Markov Model
INRA, InterPro, LINUX, mRNA, NCBI, PRINTS, PFAM, ProDom, Prosite, PSU, RFAM, SIB, SignalP, SMART, SWISS PROT, TIGR, TIGRfam, TMHMM, TrEMBL, UNIX
94. see how well you have done, turn back on the spi7 tab and have a look at the genes located at either side of your selection. Go to and look at the CDS 5. In reality this gene was disrupted by the insertion of this bacteriophage. If you look at the FASTA results for this CDS you may be able to track the bases between which this phage inserted. Your final task is to write out these files in EMBL format and create a merged annotation and sequence file in EMBL format. [Screenshot: Artemis File menu (Read Entry, Read An Entry From Database, Read Entry Into, Save Default Entry, Save An Entry, Save An Entry As, Save All Entries, Clone This Window) with the Save An Entry As submenu offering EMBL Format, GENBANK Format and GFF Format. Callouts: Click File, then Save An Entry As. Select a file to save. The new entry is named prophage.tab.]
95. simultaneously (see below). [Screenshot: Sanger Institute GeneDB boolean query page, showing the instructions for building more complex queries with the boolean operators AND and OR, the organism currently being searched, and a results table with columns for name/id, organism and description]
96. stable RNAs. 2. Atypical G+C contents. 3. Carry virulence related functions. 4. Often carry genes encoding transposase or integrase-like proteins. 5. Unstable and self mobilisable. 6. Of limited phylogenetic distribution. Have a look in and around this region and look for some of these features. Region 1: SPI-1. [Screenshot: Artemis window with entries S_typhi.dna and S_typhi.tab, showing three graphs (GC Content (%), GC Deviation (G-C)/(G+C), and Karlin Signature Difference) above the region from about 2834000 to 2912000. The SPI-1 misc_feature is marked, and the labelled features include sitB, sitD, iagB, hypC, prgK, spaP, spaQ, invG and STY2958 to STY3045, with many RBS features.]
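Atypical G+C content is one of the listed features that can be checked numerically as well as by eye. The sketch below computes a GC content percentage and a GC deviation, (G-C)/(G+C), over sliding windows, in the spirit of the two graphs named in the screenshot. It is an illustration only: the function name, the window and step parameters and the toy sequence are assumptions, and Artemis's own graph calculations may differ in detail.

```python
def gc_windows(sequence, window, step):
    """Return (start, gc_percent, gc_deviation) for each window.

    start is the 1-based position of the first base in the window.
    gc_percent is 100 * (G + C) / window.
    gc_deviation is (G - C) / (G + C), or 0.0 when the window has no G or C.
    """
    sequence = sequence.upper()
    results = []
    for offset in range(0, len(sequence) - window + 1, step):
        chunk = sequence[offset:offset + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_percent = 100.0 * (g + c) / window
        gc_deviation = (g - c) / (g + c) if (g + c) else 0.0
        results.append((offset + 1, gc_percent, gc_deviation))
    return results


# Toy sequence, for illustration only: a G-rich window followed by an AT-only one
for start, gc, dev in gc_windows("GGGCATAT", window=4, step=4):
    print(start, gc, dev)
# 1 100.0 0.5
# 5 0.0 0.0
```

On a real chromosome one would use a window of a few thousand bases and look for stretches whose GC content departs clearly from the genome average.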
97. the boolean operators AND and OR. For example, to construct a simple query, begin by clicking on the AND button. Then use the pulldown menus to select two different queries. Finally, click on proceed to next step to generate a page that will allow you to specify parameters for the two queries. Running the resulting boolean query will return only those objects that satisfy query 1 AND query 2. Please note: you must select a query from each pulldown menu before the proceed button will work. If you need to back up a step, use the browser's back button. You are currently searching T. brucei. Choose a different organism. [Query form: Protein containing one or more predicted transmembrane domains. Minimum number of transmembrane domains (an integer > 0). Maximum number of transmembrane domains (an integer > 0). Query options: Rows per page, 20. Run query.] [Screenshot: results table with columns for name/id, organism and description, listing T. brucei proteins such as conserved hypothetical proteins, a putative chaperone and a putative P-type ATPase, most marked as manual annotation]
98. the following meanings, in order: score, percent identity, match start in the query sequence, match end in the query sequence, query sequence name, subject sequence start, subject sequence end, subject sequence name. The columns should be separated by single spaces.
Appendix III: Feature Keys and Qualifiers, a brief explanation of what they are and a sample of the ones we use
1. Feature Keys. They describe features with DNA coordinates and, once marked, they all appear in the Artemis main window. The ones we use are:
CDS: Marks the extent of the coding sequence
RBS: Ribosomal binding site
misc_feature: Miscellaneous feature in the DNA
rRNA: Ribosomal RNA
repeat_region
repeat_unit
stem_loop
tRNA: Transfer RNA
2. Qualifiers. They describe features with protein coordinates. Once marked they appear in the lower part of the Artemis window. They describe the gene whose coordinates appear in the location part of the editing window. The ones we commonly use for annotation at the Sanger Institute are:
Class: Classification scheme we use in-house, developed from Monica Riley's MultiFun assignments (see Appendix VI)
Colour: Also used in-house in order to differentiate between different types of genes and other features
Gene: This qualifier either gives the gene a name or a systematic gene number
Label: Allows
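The eight space-separated columns listed at the start of the previous item can be read with a few lines of code. The following Python sketch parses one line of such a comparison file and applies a score cut-off similar in spirit to the one used in ACT. The Match type, the function names and the two example lines are assumptions made for illustration; they are not part of ACT.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """One comparison-file line, with fields in the documented column order."""
    score: float
    percent_id: float
    query_start: int
    query_end: int
    query_name: str
    subject_start: int
    subject_end: int
    subject_name: str


def parse_match(line):
    """Parse one space-separated line into a Match."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, found {len(fields)}")
    return Match(
        float(fields[0]), float(fields[1]),
        int(fields[2]), int(fields[3]), fields[4],
        int(fields[5]), int(fields[6]), fields[7],
    )


def filter_by_score(matches, cutoff):
    """Keep only the matches whose score is at least the cut-off."""
    return [m for m in matches if m.score >= cutoff]


# Invented example lines; the sequence names are placeholders
lines = [
    "57 37 315097 315381 seqA 80532 80815 seqB",
    "1200 92.5 100 2100 seqA 5000 7000 seqB",
]
matches = [parse_match(line) for line in lines]
strong = filter_by_score(matches, cutoff=100)
print(len(strong), strong[0].query_start, strong[0].subject_name)
# 1 100 seqB
```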
99. to look at a region of synteny between T. brucei and Leishmania. Aim: By looking at a comparison of the annotated sequences of T. brucei and L. major you will be able to analyse in detail those genes that are found in both organisms, as well as spot the differences. You will also see how ACT can be used to study the different chromosome architecture of these two parasite species. The files that you are going to need are:
Tbrucei.dna: T. brucei sequence
Tbrucei.embl: T. brucei annotation
Leish_vs_Tbrucei.tblastx: comparison file
Leish.dna: L. major sequence
Leish.embl: L. major annotation
First load up the sequence files for T. brucei and L. major and the comparison file in ACT. [Screenshot: ACT window with the two sequences and the comparison between them] [Screenshot callouts: Zoom out and switch off stop codons to clarify the view. A selected match (315381..315097 to 80532..80815, score 57, percent id 37) is shown. An hour glass shape indicates an inversion.]
100. [Screenshot: InterPro entry IPR001844 displayed in detailed mode with structural features on. The match view shows chaperonin Cpn60/TCP-1 signatures, including PTHR11353, TCOMPLEXTCP1, CHAPERONIN60 and GroEL ATPase (thermosome_arch), and the CATH domain 1a6dA1 (403..519). Browser bookmarks visible include Pfam, SRS7, GeneDB and Entrez PubMed.]
101. use of the Artemis applet. It will only be considered briefly since you have already covered the use of Artemis. It can be launched from within GeneDB and is a useful way of viewing the gene in the context of the genome. It is especially useful for visualising intergenic regions, promoters, 5' and 3' untranslated regions and intron/exon boundaries, as well as many other features. [Screenshot: GeneDB feature page for a putative hexose transporter CDS (sequence and annotation provided by TIGR). Location: complement(205889..207403). Exon: complement(205889..207403). Spliced length: 1515 bp. A context map is shown, and 12 probable transmembrane helices are predicted.] [Screenshot callout: This window allows you to specify the region that is opened by the viewer. The de
102. you to label a gene feature in the main view panel
Note: This qualifier allows for the inclusion of free text. This could be a description of the evidence supporting the functional prediction or other notable features or information which cannot be described using other qualifiers
Partial: When a region in the DNA hits a protein in the database but lacks start and/or stop codons, and the match does not include the whole length of the protein, it can be considered as a partial gene
Product: The assigned possible function for the protein goes here
Pseudo: Matches in different frames to consecutive segments of the same protein in the databases can be linked or joined as one and edited in one window. They are marked as pseudogenes. They are normally not functional and are considered to have been mutated
The list of keys and qualifiers accepted by EMBL in sequence annotation submission files is given at the following web page: http://www3.ebi.ac.uk/Services/WebFeat
Appendix IV: Schematic of workshop files and directories
Key: directories and subdirectories. Home directory (position at login): Module 1 Artemis; Module 2 Comparative genomics; Module 3 Generating ACT comparisons; Module 4 Jemboss; Module 5 Genome Resources; Module 6 Data mining; Own sequences
Appendix V: Useful Web addresses
Major Public Sequence Repositor
103. [BLAST alignment output, overlaid in the extraction with fragments of a protein feature page. First HSP: Query residues 258 to 423 aligned against Sbjct residues 182 to 411. Second HSP: Score 126 (49.4 bits), Expect 2.7e-06, P 2.7e-06, Identities 24/59 (40%), Positives 34/59; Query residues 8 to 65 aligned against Sbjct residues 6 to 64.]
104. [End of results table: p20-Arc, Tb927.2.2900, PF05856; p16-Arc, Tb10.406.0320, PF04699] During this exercise you will have become familiar with GeneDB, the way data are displayed on feature pages and the various ways data can be accessed. As you will have seen, you would not have been able to retrieve all the data by using a single approach to mine the genomes; instead you needed to employ multiple search strategies. You will also have seen how to compile lists of genes of interest and how to download them for further examination or experimentation. Lastly, with the increasing emphasis on comparative genomics, you hopefully saw how GeneDB allows you to easily retrieve genes from related organisms. Exercise 2: Use of the Artemis Applet. [Screenshot: GeneDB feature page]
105. [Screenshot: file listing of the downloaded BLAST directory (C:\blast\data) with file dates] Once downloaded, view the contents of the blast directory by clicking on the open folder button. blast-2.2.6-ia32-win32.exe is a compressed file that contains a host of other files. Included in the directory that has now been unpacked are several README files that describe the various programs in the BLAST software package. These files also provide descriptions of the command line options that you can set when you run the programs. To read these files, double click on the icon or view them in Notepad. The README.BLS file contains details of the main BLAST program and how to format DNA sequences prior to running BLAST. [Screenshot: Windows Explorer folder tree alongside the blast data directory]
106. 169N the ability to transport fructose is abolished but the ability to transport glucose is retained. This residue exists within the 5th predicted transmembrane helix. We'll use the Artemis applet to: look at the annotation for pfHT1 (systematic identifier 0210); view hydrophobicity/hydrophilicity plots for the protein; examine the amino acid sequence around position 169. [Screenshot: Artemis applet window opened on the MAL2 entry (organism malaria, CDS features) with nothing selected. Callout: Use these scroll bars to adjust the protein views.]
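The two tasks above (a hydrophobicity plot and a look at the residues around a given position) can also be sketched outside the applet. The Python example below averages Kyte-Doolittle hydropathy values over a sliding window and extracts the residues flanking a 1-based position. The choice of the Kyte-Doolittle scale, the default window of 7 and the function names are assumptions for this sketch; the applet's own plots may use a different scale or window, and the test peptide is invented.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of each full window along the protein.

    Raises KeyError for characters outside the 20 standard amino acids.
    Returns an empty list if the window does not fit in the sequence.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if window < 1 or window > len(scores):
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def residues_around(protein, position, flank=5):
    """Residues from position - flank to position + flank (1-based, clipped)."""
    start = max(0, position - 1 - flank)
    end = min(len(protein), position + flank)
    return protein[start:end]


# Invented peptide, for illustration only
peptide = "MKTLLVAGIF"
print(residues_around(peptide, position=5, flank=2))  # TLLVA
print(hydropathy_profile("IIIII", window=5))          # [4.5]
```

Long runs of windows with strongly positive mean values are the usual hint of a membrane-spanning helix, which is why such a plot is a natural companion to a transmembrane prediction.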
107. 18 and a community acquired MRSA strain, MW2 (BA000033). Module 3: Generating ACT comparison files using BLAST. Downloading the S. aureus genomic sequences. [Screenshot: Mozilla browser showing a genomes web page with tabs for Bacteria, Eukaryota, Organelle, Phage and Plasmid; the bacterial list begins with entries such as Agrobacterium tumefaciens (Cereon and U. Washington sequences, circular and linear chromosomes) and Bordetella bronchiseptica strain RB50] [Screenshot: Genomes Pages, Bacteria]
108. [Screenshot: GO browser tree expanded down to GO:0005200 structural constituent of cytoskeleton (21)] Last updated: 2005-08-28
structural constituent of cytoskeleton
Accession: GO:0005200
Aspect: molecular function
Synonyms: None
Definition: The action of a molecule that contributes to the structural integrity of a cytoskeletal structure.
Term Lineage: all: all (3754); GO:0003674 molecular function (3530); GO:0005198 structural molecule activity (193); GO:0005200 structural constituent of cytoskeleton (21)
External References: InterPro (3), Pfam (2), PRINTS (1), PROSITE (2), SP_KW (1)
Gene Product Associations: get ALL associations here. [Filter form: Datasource, Evidence Code, Species; Submit Query]
[Table with columns Gene Symbol, Type, Datasource, Evidence and Full Name, listing GeneDB_Tbrucei genes with evidence code IEA: actin, actin A and putative actin-like proteins, including Tb09.211.0630, Tb10.61.0500 and Tb10.70.5830]
109. [Screenshot: GO browser filter options (species such as A. aeolicus and A. fulgidus; datasources such as FlyBase and SGD; evidence codes All and Curator Approved)] Here's an expanded view of biological process and cellular component:
all: all (219618)
GO:0008150 biological process (145845)
  GO:0007610 behavior (4698)
  GO:0000004 biological process unknown (36389)
  GO:0009987 cellular process (91982)
  GO:0007275 development
  GO:0040007 growth (3622)
  GO:0007582 physiological process (97174)
  GO:0043473 pigmentation (174)
  GO:0050789 regulation of process (19190)
  GO:0000003 reproduction (5336)
  GO:0016032 viral life cycle (306)
GO:0005575 cellular component (131834)
  GO:0005623 cell (96315)
  GO:0008372 cellular component unknown (28690)
  GO:0031012 extracellular matrix (1062)
  GO:0005576 extracellular region (9945)
  GO:0043226 organelle (66478)
  GO:0043234 protein complex (13971)
  GO:0019012 virion (133)
GO:0003674 molecular function (152664)
obsolete biological process (68)
obsolete cellular component (27)
obsolete molecular function (773)
The numbers indicate the number of genes with this function.
110. 1.4.3 Drug/analog sensitivity
1.4.4 Radiation sensitivity
1.5.0 Transport/binding proteins
1.5.1 Amino acids and amines
1.5.2 Cations
1.5.3 Carbohydrates, organic acids and alcohols
1.5.4 Anions
1.5.5 Other
1.6.0 Adaptation
1.6.1 Adaptations, atypical conditions
1.6.2 Osmotic adaptation
1.6.3 Fe storage
1.7.1 Cell division
2.0.0 Macromolecule metabolism
2.1.0 Macromolecule degradation
2.1.1 Degradation of DNA
2.1.2 Degradation of RNA
2.1.3 Degradation of polysaccharides
2.1.4 Degradation of proteins, peptides, glycoproteins
2.2.0 Macromolecule synthesis, modification
2.2.01 Amino acyl tRNA synthesis; tRNA modification
2.2.02 Basic proteins: synthesis, modification
2.2.03 DNA: replication, repair, restriction/modification
2.2.04 Glycoprotein
2.2.05 Lipopolysaccharide
2.2.06 Lipoprotein
2.2.07 Phospholipids
2.2.08 Polysaccharides (cytoplasmic)
2.2.09 Protein modification
2.2.10 Proteins: translation and modification
2.2.11 RNA synthesis, modif., DNA transcrip.
2.2.12 tRNA
3.0.0 Metabolism of small molecules
3.1.0 Amino acid biosynthesis
3.1.01 Alanine
3.1.02 Arginine
3.1.03 Asparagine
3.1.04 Aspartate
3.1.05 Chorismate
3.1.06 Cysteine
3.1.07 Glutamate
3.1.08 Glutamine
3.1.09 Glycine
3.1.10 Histidine
3.1.11 Isoleucine
3.1.12 Leucine
3.1.13 Lysine
3.1.14 Methionine
3.1.15 Phenylalanine
3.1.16 Proline
3.1.17 Serine
3.1.18 Threonine
3.1.19 Tryptophan
3.1.20 Tyrosine
3.1.21 Valine
3.2.0 Biosynthesis of cofactors, carriers
3.2.01 Acyl carrier p
111. [Screenshot: Artemis window showing the six-frame translation and DNA sequence of the selected region, with the feature list below]
misc_feature 2188349..2199512 c: Base composition 37.8% G+C
CDS 2188394..2189107 c: Unknown function. Contains possible N-terminal signal sequence
CDS 2189209..2189652 c: Unknown function. Contains probable N-terminal signal sequence
CDS 2189768..2190217: Unknown function
CDS 2190285..2190764: Unknown function. Contains possible N-terminal signal sequence
RBS 2190771..2190775: possible RBS
CDS 2190874..2191476 c: Unknown function. Contains possible N-terminal signal sequence
CDS 2191545..2191823: Unknown function
CDS 2191793..2192488 c: Unknown function
CDS 2192559..2193059 c: Similar to Neisseria meningitidis hypothetical protein NMBO4R
Once you have found this region have a look at some of the information th
112. [Screenshot: Save dialog with a file listing dated 01/22/2004; buttons Save and Cancel] Save the file as N315.embl. Repeat for the S. aureus MW2 genome (BA000033). Be careful when choosing the genome to download, as there is another S. aureus genome entry for strain Mu50 (BA000017). Save as MW2.embl. Generate DNA files in FASTA format using Artemis for both the genome sequences, as previously done in exercise 1. Hint: in Artemis (each genome requires a separate Artemis window) select Write, Write Bases, FASTA format, and save the DNA sequences as N315.dna and MW2.dna for the respective genomes. Running Blast. In the previous exercise you used the blastall program to run BlastN on two plasmid sequences. As the genome sequences are larger (2.8 Mb), you are going to run megablast, another program from the NCBI Blast distribution that can generate comparison files in a format that ACT can read (see appendix II). For a detailed description of the uses and options in megablast, see the megablast README file in the Blast software directory (appendix X). Like Blast, megablast requires that one sequence is designated as a database sequence and the other as the query sequence. Therefore one of the sequences has to be formatted so that Blast recognises it as a database sequence. This can be done as before using formatdb.
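For readers who want to check what a FASTA file such as N315.dna should look like, the following Python sketch formats a sequence as a single FASTA record: a header line starting with > followed by the bases wrapped at a fixed width. It is only an illustration of the file layout, not what Artemis runs internally; the function name, the 60-column width and the toy sequence are assumptions.

```python
def format_fasta(name, sequence, width=60):
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    sequence = "".join(sequence.split())  # drop any whitespace or line breaks
    lines = [">" + name]
    for offset in range(0, len(sequence), width):
        lines.append(sequence[offset:offset + width])
    return "\n".join(lines) + "\n"


# Toy 80-base sequence, for illustration only
record = format_fasta("N315", "ACGT" * 20)
print(record, end="")

# Writing the record to disk would look like this (the path is a placeholder):
# with open("N315.dna", "w") as handle:
#     handle.write(record)
```

With the default width, the 80-base toy sequence is written as a header line followed by one line of 60 bases and one of 20.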
113. [Screenshot: GeneDB feature page; legible fragments include Gene Ontology annotation (including cellular component), an EC number (EC 2.7.2.3) with a reaction involving ATP and ADP, and database cross-references] The range download page allows you to define the range of sequence you'd like to download in either EMBL or FASTA format, or alternatively [Screenshot: GeneDB Range Download Page, General
114. 504 233826. Organism: brucei. Type: CDS. Contig length: 1064672 bp. [GeneDB Range Download Page. General Information. Range Options: Upstream 10000; Downstream 10000; From 222504; To 243825. Download Types: Download features and sequence (EMBL format); Download sequence in Fasta format; Artemis Applet. Please note the Java applet doesn't work on all browser/OS combinations; if it doesn't run for you, please install Artemis locally.] [Screenshot: Save dialog with a directory filter (chr3, chr5, chr6) and a file list including dribble 1I13 embl, dribble 27P2 embl, dribble 3K10 embl and dribble chr2v4 embl] [Screenshot: downloaded EMBL file, with GO qualifiers such as molecular function unknown and biological process unknown (GO:0000004, date 20020907) and SNP features]
115. [Screenshot: BLAST results for query CDS Tb09.160.3850, with a histogram of hit scores and the list of sequences producing High-scoring Segment Pairs]
Columns: hit; description; Score; smallest sum probability P(N); N
Tb09.160.3850; actin-like protein, Trypanosoma; 970; 5.3e-99; 1
Tb09.211.0630; actin, Trypanosoma brucei chr 9, Manual; 493; 1.9e-48; 1
Tb09.211.0620; actin, Trypanosoma brucei chr 9, Manual; 493; 1.9e-40; 1
Tb10.61.0500; actin-related protein 2 (ARP2), putative; 420; 1.0e-40; 1
Lower-scoring hits (scores 308 down to 68) include an actin predicted by automatic annotation, a probable actin-related protein, further putative actins on chr 9 and chr 4, Tb10.70.5830 (actin-related protein, putative) and hypothetical proteins; the last legible entry is a Chondroitinase AC precursor
116. [Screenshot: similarity search results listing UniProt entries for the 60 kDa chaperonin (Protein Cpn60, groEL protein), including CH60_SHIFL and CH60_ECOL6 (length 547; percentage columns 99.6 and 99.3; score 4473; E-value 0.0), followed by two E. coli GroEL fragments (length 548; scores 4471 and 4470)] [Screenshot: UniProt Knowledgebase, Basic UniProtKB Entry Viewer for protein Q548M0, with options New Query, Submit Annotation, Download Protein, Bookmark Protein, InterPro Search and CluSTr Search, and viewers Basic, Extended, Fasta, Flat File and XML] General information about the UniProtKB/TrEMBL entry. Entry name: ECOLI entry as shown. Primary accession n
117. [Screenshot: Artemis window showing the extracted sub-sequence and its feature list, starting with the possible truncated tRNA-Phe and the major Vi antigen pathogenicity island SPI-7.] Note that the two entries on the grey Entry line are now denoted "no name"; they represent the same information in the same order as the original Artemis window but simply have no assigned name. Because the sub-sequence is now viewed in a new Artemis session, this prevents the original files (i.e. S_typhi.dna and S_typhi.tab) from being overwritten. We will now save them as new files to avoid confusion. So click on the File menu then Save an
118. [Screenshot: GeneDB sequence display showing interleaved blocks of nucleotide and predicted protein sequence. The sequence text is not recoverable.]
119. [Screenshot: Artemis feature selector (amino acid motif, forward strand features, reverse strand features) above a feature list. The CDSs listed between about 65 kb and 79 kb are mostly similar to bacteriophage P2 and bacteriophage 186 genes, including major tail, tail and baseplate proteins.]
120. [Screenshot: list of T. brucei CDSs annotated as putative ARP2/3 complex subunits.] Exercise 1.3 Browsing the Pfam domain catalogue. If you haven't made a note of the Pfam domains then you could either go back to the S. pombe dataset using the navigation bar or alternatively use the Pfam site at http://www.sanger.ac.uk/Software/Pfam to retrieve the domain information by typing Arp2/3 into the search box. [Screenshot: Pfam home page (Protein families database of alignments and HMMs) and the results for the query, listing matches to documentation with links back to Pfam, beginning with P34-Arc, Arp2/3 complex 34 kD subunit p34-Arc.]
121. [Screenshot: Artemis Edit menu, with options including Extend to Next Stop Codon, Fix Stop Codons, Automatically Create Gene Names, Fix Gene Names and Reverse And Complement, above an Artemis window for an entry with no name.]
122. [Screenshot: IUBMB Enzyme Nomenclature home page, with links to access and search the database.] This page contains general information on enzyme nomenclature. It includes links to individual documents and the number of these will increase as more sections of the enzyme list are revised. It also provides advice on how to suggest new enzymes for listing or correction of existing entries. Historical Introduction: In Enzyme Nomenclature (1992) there was an historical introduction. This web version is slightly edited from that in the book. Enter search words, "All enzymes", Match 1 and return detailed results. Each enzyme is represented by a separate web page in IUBMB. IUBMB Enzyme Nomenclature EC 1.11.1.6: the most commonly used or official name is used first. Common name: catalase. Reaction: 2 H2O2 = O2 + 2 H2O. Other name(s): equilase; caperase; optidase; catalase-peroxidase; CAT. Systematic name: hydrogen-peroxide:hydrogen-peroxide oxidoreductase. Comments: A hemoprotein. This enzyme can also act as a peroxidase (EC 1.11.1.7, peroxidase) for which several organic substances, especially ethanol, can act as a hydrogen donor. A manganese protein containing MnIII in the resting state, which also belongs here, is often called pseudocatalase. Enzymes from some microorganisms such as Penicillium simplic
123. [Screenshot: Artemis window with annotated sequence variation features. Each note records which sequencing reads call which base at a variable position. The text is not recoverable.]
124. [Screenshot: "Search Objects in KEGG Pathways" at http://www.genome.jp/kegg/tool/search_pathway.html, searching against the reference pathway with EC numbers and compound identifiers as example objects. The Pathway Search Result lists map00010 Glycolysis / Gluconeogenesis (EC 2.7.1.1, hexokinase) and map00051 Fructose and mannose metabolism (including hexokinase, mannose-1-phosphate guanylyltransferase and EC 4.2.1.47, GDP-mannose 4,6-dehydratase).]
125. Exercise 1. [Screenshot: AmiGO query results for DNA helicase terms, with columns GO Term, GO ID, Ontology, Definition, Comment and Synonym. Entries include DNA helicase activity (GO:0003678), ATP-dependent DNA helicase activity, 3' to 5' DNA helicase activity (GO:0043138) and 5' to 3' DNA helicase activity (GO:0043139). The comments suggest also annotating to the molecular function terms DNA binding (GO:0003677) or ATP binding (GO:0005524).]
126. [Screenshot: GeneDB summary for T. brucei predicted proteins, wublastp for query SPAC630.03, listing the five best hits with scores 970, 493, 493, 420 and 308 and links to the full sequences and the full BLAST search.] Retrieve result for id 19200197 07065820000524. BLAST results are kept on our servers for three days following query submission. Results may be retrieved any number of times during this period. After this time queries must be resubmitted if further examination is required. Low complexity filtering disabled. Repeatmasker disabled. BLASTP 2.0MP-WashU [16-Sep-2002]. Copyright (C) 1996-2002 Washington University, Saint Louis, Missouri USA. All Rights Reserved. Reference: Gish, W. (1996-2002) http://blast.wustl.edu. Query: SPAC630.03 (427 letters). Database: TRYP1910 prots; 16,376 sequences; 5,322,143 total letters.
127. [Screenshot: Artemis window showing the extracted region and its feature list.] This will create two files, one with the sequence and the other with the annotation, in the directory within which you started Artemis. To create a complete EMBL file, use the UNIX commands you covered earlier and cat the files together. Artemis Exercise 2. This exercise will look at a section of the Malaria genome. You will need to close down the last Artemis exercise if you haven't already done so. Then start a new Artemis session as before, using the file Malaria.embl in the current directory. Unlike the Salmonella exercise, in this instance the annotation and sequence are contained within the same file, Malaria.embl.
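The concatenation step can also be sketched in Python for readers without a UNIX shell to hand. This is a minimal illustration, not part of the course material: the function name is made up here, and placing the annotation file before the sequence file is an assumption based on the EMBL layout, where feature-table lines precede the sequence block.

```python
from pathlib import Path


def concatenate_files(parts, destination):
    """Write the contents of each file in `parts`, in order, to `destination`.

    Equivalent in spirit to `cat part1 part2 > destination`.
    """
    with open(destination, "wb") as out:
        for part in parts:
            # Copy bytes unchanged so line endings and spacing are preserved.
            out.write(Path(part).read_bytes())
    return destination
```

For example, `concatenate_files(["region.tab", "region.dna"], "region.embl")` would join an annotation file and a sequence file; those three file names are hypothetical, so substitute the names you chose when saving.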
128. [Screenshot: GeneDB gene page for B. bronchiseptica BB0916 (product: autotransporter; type: CDS; location: complement 984031..988248; length 4218 bp), with a context map of the neighbouring genes BB0910 to BB0925.] Identifying monosaccharide transporters within the genomes of P. falciparum, P. berghei and P. chabaudi. The following exercises aim to introduce you to the features that allow quick and convenient data mining from GeneDB and will equip you with the tools to use the database to facilitate your own research. Hopefully they will also make you aware of its strengths and limitations and highlight the advantage of using several search strategies. The aim is to identify monosaccharide transporters in Plasmodium falciparum, Plasmodium berghei and Plasmodium chabaudi. Glucose transporters are promising drug targets as asexual stage parasites depend heavily upon glucose for energy (Joet et al
129. [Screenshot: Artemis display pop-up menu, with a callout marking the menu item for de-selecting stop codons; no stop codons are shown on the frame lines.] You will also need to temporarily remove all of the annotated features from the Artemis display window. In fact, if you leave them on, which you can, they would be too small to see when you zoomed out to display the entire genome. To remove the annotation, click on the S_typhi.tab entry button on the grey entry line of the Artemis window shown above. Your Artemis window should now look similar to the one shown below. [Screenshot: zoomed-out Artemis window with callouts labelling the graph scaling menu (Maximum Window Size, Plot Options) and the slider used for scaling and zooming out.]
130. Use one of the methods you have already used to take you to the second region of interest that you noted down. Region two acts as a cautionary note when looking at anomalous regions within a genome. Have a look at the CDSs within this region. Does this region have any of the characteristics of a pathogenicity island? Are the genes within this region essential or dispensable? Is it possible that the atypical base composition of this region is not a consequence of having originated from a foreign host? The base composition may actually be reflective of the tight sequence constraints under which this region has been maintained, in contrast to the background level of sequence variation in the rest of the genome. Region 3. [Screenshot: Artemis overview of the third region of interest.]
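The idea of "atypical base composition" can be made concrete with a sliding-window G+C calculation, which is the kind of quantity a GC content plot displays. The following is a minimal sketch under stated assumptions: the function names are invented for illustration, the default window of 120 bases is just a convenient value, and Artemis's own plotting algorithm may differ in detail.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in `seq` (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=120, step=None):
    """Return (1-based window start, GC fraction) for successive full windows.

    `step` defaults to `window`, which gives non-overlapping windows.
    """
    step = step or window
    return [
        (i + 1, gc_fraction(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]
```

Windows whose GC fraction sits well above or below the genome-wide average are candidates for closer inspection, bearing in mind the caution above that an unusual composition does not by itself prove a foreign origin.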
131. [Screenshot: continuation of the Artemis variation annotation, with notes recording which reads call each base, repeat units, a gene predicted by Glimmer and an "unlikely gene". The text is not recoverable.] 6. Directory. Systematic id Tb927.1.700, Location complement 232
132. [Screenshot: GeneDB T. brucei home page, with links to example genes, FTP downloads from Sanger and TIGR, the T. brucei, T. congolense, T. vivax and T. b. gambiense project pages and the T. brucei Genome Network biological resources. News items: 15th July 2005, Tritryp genomes published in Science; 22nd June 2005, non-coding annotation revised and protein feature and domain predictions updated.] [Screenshot: GeneDB Pfam list, an alphabetical index (A to Z) that takes you to the first entry starting with the selected letter and shows the next 100 alphabetical entries.]
133. [Screenshot: KEGG pathway map for fructose and mannose metabolism, showing compounds such as D-mannitol, D-fructose, D-mannose, D-sorbitol, L-sorbose, L-fucose and L-rhamnose with the EC numbers of the connecting enzymes.] Hint: it joins the two pathways. Gene Ontology (GO), Section 2. The official browser for GO annotations is AmiGO. Web address: http://www.godatabase.org [Screenshot: AmiGO graphical view of the three top-level ontologies (biological process, cellular component and molecular function) with their annotation counts, plus the obsolete term branches.]
134. [Screenshot: AmiGO tree view options, including a GO flat file download and "Get a bookmarkable url of this tree".] Section 3: InterPro & UniProt. Web address: http://www.ebi.ac.uk/services [Screenshot: EBI Services page listing databases, database browsing and entry retrieval tools, similarity and homology searches (Blast2, Fasta, MPsrch), protein function analysis tools (CluSTr, InterProScan), submissions, downloads and help files.]
135. [Screenshot: BLASTP pairwise alignment of the query protein against a nearly identical subject sequence, shown in blocks of 60 residues (Query 1 to 420, Sbjct 1 to 417, continued on the next slide). The residue text is not reliably recoverable.]
136. [Screenshot: GeneDB gene pages for an autotransporter subtilisin-like protease and its orthologue, B. pertussis CDS BP0216. Each page shows Primary Annotation, Predicted Peptide Properties (mass, number of amino acids, isoelectric point, charge, signal peptide, transmembrane domains), a protein map, Domain Information (Pfam PF00082, Subtilase family) and Orthologues.]
137. [Screenshot: GeneDB query history table. Each row describes a query built from terms such as "Proteins predicted to have between 8 and 14 transmembrane domains", "Proteins predicted to have GO process transport", "Proteins with a product containing the keyword or phrase transporter", "Proteins containing a predicted signal peptide", "Proteins predicted to have GO component membrane" and "Genes for pberghei, pchabaudi, malaria", joined with intersect, together with the time it was run, the number of results and view and download links.] Follow one of the links in the table above to view a query result set, or select two of the result sets and use one of the following buttons to combine them. Union will create a set that contains all the genes in either of the selected sets. Intersect will create a result set that contains only the genes in both of the selected sets. Subtract will remove any genes appearing in the second set from the first.
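The three ways of combining result sets correspond directly to standard set operations. As a small illustration (the gene identifiers below are made up for the example and are not real GeneDB results), Python's built-in sets behave the same way:

```python
# Hypothetical result sets from two separate queries (illustrative IDs only).
transporter_hits = {"geneA", "geneB", "geneC"}
membrane_hits = {"geneB", "geneC", "geneD"}

# Union: genes present in either set.
union = transporter_hits | membrane_hits

# Intersect: genes present in both sets.
intersect = transporter_hits & membrane_hits

# Subtract: genes in the first set that do not appear in the second.
subtract = transporter_hits - membrane_hits
```

Note that Subtract is the only one of the three where the order of the two sets matters.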
138. [Screenshot: GeneDB gene page for a T. brucei putative ARP2/3 complex subunit (Tb927.2.2900), showing GO annotation, a literature search link, database cross-references and orthologues predicted by clustering, followed by a result list of T. brucei CDSs annotated as actin, actin-like or actin-related proteins. Details are not recoverable.]
139. [Screenshot: end of the BLASTP alignment (Query 421 to 480, Sbjct 418 to 477).] Bordetella pertussis GeneDB. This page provides access to the annotation and sequence of B. pertussis strain Tohama I. This is the result of a collaboration of the Sanger Institute with Duncan Maskell and Andrew Preston of the Centre for Veterinary Science, Dept. of Clinical Veterinary Medicine, The University of Cambridge. This data is described in Parkhill et al. (2003) Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nature Genetics, DOI 10.1038/ng1227. [Screenshot: database entry point searching by ID or description for CDS BP1054, with links to omniBLAST, BLAST, list downloads, cross-organism search and complex Boolean queries, and the General Information section of the resulting gene page (product: pertactin precursor).]
140. [Screenshot: B. parapertussis GeneDB page, with links to BLAST, Browse Catalogues (Riley, Products, Pfam), List Download, Cross Organism Search, Complex Boolean Query and Full Content Search, and example genes (BPP2408, rpsR, rplI).] [Screenshot: Riley classification catalogue with the number of genes in each class. The classes shown include inner membrane; murein sacculus, peptidoglycan; outer membrane constituents; surface polysaccharides and antigens; surface structures; ribosomal proteins, synthesis and modification; ribosomes, maturation and modification; colicin-related functions; phage-related functions and prophages; transposon-related functions; pathogenicity islands and determinants; global regulatory functions; and not classified (included putative assignments).] [Screenshot: report download listing B. pertussis CDSs: BP0216 autotransporter subtilisin-like protease (sphB1); BP0529 autotransporter; BP0758 cyclolysin-activating lysine-acyltransferase (cyaC, HlyC); BP0760 bifunctional hemolysin-adenylate cyclase precursor; BP0761 cyclolysin secretion ATP-binding protein; BP0762 cyclo
141. [Screenshot: Artemis overview of the S. typhi genome between about 4,225,000 and 4,296,500 bp, showing CDSs on both strands (STY4401 to STY4425 and neighbouring named genes), misc_feature and rRNA features, and the zoomed nucleotide view around position 4,257,500.]
142. [Screenshot: GeneDB download options for a gene (systematic id, organism, range upstream and downstream, download type and FASTA format) and the Artemis applet launched from the gene page.] The sequence will open up in an Artemis applet. By default, the sequence 10 kb upstream and downstream of your feature of interest will be selected. [Screenshot: Artemis applet window showing the feature of interest with its neighbouring CDSs.]
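The default selection described here is simply the feature's coordinates extended by a fixed flank on each side. A minimal sketch of that arithmetic follows; the function name is hypothetical, and clamping the range at the ends of the sequence is an assumption made for illustration rather than documented GeneDB behaviour.

```python
def flanking_range(start, end, sequence_length, flank=10_000):
    """Return the 1-based (start, end) of a feature extended by `flank` bases.

    The range is clamped so it never runs off either end of the sequence.
    """
    if not (1 <= start <= end <= sequence_length):
        raise ValueError("feature coordinates must lie within the sequence")
    return max(1, start - flank), min(sequence_length, end + flank)
```

A feature at 984031..988248 on a sequence of 5,000,000 bp (a made-up length) would give 974031..998248 with the default 10 kb flank.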
143. [Screenshot: GeneDB sequence pages for S. pombe CDS SPAC630.03 (arp3, ARP2/3 actin-organizing complex, chromosome 1, manual annotation), showing unspliced and spliced DNA in FASTA format. The sequence text is not recoverable.]
144. [Screenshot: remainder of the GeneDB nucleotide and protein sequence display.] GeneDB omniBLAST Server. omniBLAST will perform a BLAST search on a set of protein databases (BLASTP or BLASTX, depending on the query sequence) or nucleotide databases (BLASTN and TBLASTX, or TBLASTN) available in GeneDB and return a list of the best few HSPs for each database; you can click on Full Search to see the complete BLAST output. To search databases with different parameters, use single BLAST. Choose QUERY DATA: Paste
145. [Screenshot: Artemis main window with numbered callouts, showing the feature list for the start of the S. typhi genome.] Drop-down menus: there's lots in there, so don't worry about them right now. The entry line shows what entries are currently loaded (bottom line) and gives details regarding the feature selected in the window below, in this case gene STY0003 (top line). This is the main sequence view panel. The central 2 grey lines represent the forward (top) and reverse (bottom) DNA strands. Above and below those are the 3 forward and 3 reverse reading frames. Stop codons are marked as black vertical bars. Genes and other features (e.g. Pfam and Prosite matches) are displayed as coloured boxes. We will refer to genes as coding sequences or CDSs from now on. This panel has a similar layout to the main panel but is zoomed in to show nucleotides and amino acids. Double click on a gene in the main view to see the zoomed view of the start of that gene. Note that both this and the main panel can be scrolled left and right (7, below) and zoomed in and out (6, below). This panel lists the var
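The six frame lines and their stop-codon bars can be reproduced with a few lines of code, which may help make clear what the display is showing. This is a rough sketch assuming the standard stop codons TAA, TAG and TGA; the function names and frame labels are invented here, and positions on the reverse frames are offsets within the reverse-complemented sequence rather than Artemis coordinates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def stop_positions(seq, frame):
    """Return 0-based offsets of stop codons read in frame 0, 1 or 2 of `seq`."""
    seq = seq.upper()
    return [
        i for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def six_frame_stops(seq):
    """Map each of the six frames ('+1'..'+3', '-1'..'-3') to its stop offsets."""
    rc = reverse_complement(seq)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = stop_positions(seq, f)
        # Offsets for '-' frames are measured along the reverse complement.
        frames[f"-{f + 1}"] = stop_positions(rc, f)
    return frames
```

A long stretch of a frame without any stop offsets is an open reading frame, which is why gaps between the black bars are the first thing to look for when hunting for genes.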
146. Can you see conserved gene order between the 2 species? Can you see any region where similarity is broken up? Zoom in and look at some of the genes encoded within these regions. What are the predicted products of the genes assigned to these locations? View the details by selecting the appropriate CDS feature and then choosing Edit selected feature from the Edit menu. Can you identify any genes in one organism that don't appear to be predicted in the other? If so, add these to your annotation. [Screenshot: ACT window comparing Tbrucei dna vs Leish dna.] Exercise 4 Introduction: The quinic acid gene cluster (the qut cluster) is present among many filamentous fungi, including Aspergillus fumigatus, Neurospo
147. a scrollable window. You could also view these results in your Netscape browser window (see the previous exercise). How does your predicted gene model for this CDS compare with proteins pulled out of the public databases? Is it possible that there are additional exons not featured in the current model? If you think that there are additional exons that should have been included in the gene model, you should add them to it. Using GC content and results from your database search as guides, roughly draw in where you think the additional exon(s) lie. To create additional exons: select the region you think represents the exon by holding down the left mouse button and dragging the cursor over the region of interest. Then click the Create menu and select Create feature from base range. A new blue CDS feature will appear on the appropriate frame line. See below. [Screenshot: Artemis Edit menu, listing Undo (Ctrl U), Edit Selected Features, Edit Subsequence and Features, Edit Header Of Default Entry, Change Qualifiers Of Selected, Remove Qualifier Of Selected, Duplicate Selected Features (Ctrl D), Merge Selected Features, Unmerge Selected Feature.]
148. across many genomes, including those which are not annotated and curated for GeneDB, and is accessible via GeneDB. It can be a powerful way to search for genes with similar function across several organisms; in our case, the search for transporters of glucose and hexose. The example below shows how to set up this query. Once you've tried it and have become familiar with it, try some of the other suggested searches, or perhaps one that would be of interest to your own research. [Screenshot: Plasmodium falciparum GeneDB home page.]
149. Species: Escherichia coli K-12. Clicking on the Gene Reaction Schematic thumbnail map links to more detail. Internet Exercise 3: Here are six EC numbers for proteins in the malaria genome annotation that have been assigned based on protein similarities. Are they involved in a common pathway? If so, can you use KEGG to piece together the pathway and predict which gene is missing, and therefore could remain unidentified, in the malaria genome? The first EC number is for a fructose-bisphosphate aldolase and you already have it (see previous exercise). The remaining known EC numbers are listed below: 2 LL 5 4 2 8 25 2409 4 2 1 47 2 7 1 90 [Screenshot: KEGG (Kyoto Encyclopedia of Genes and Genomes) home page in Mozilla, announcing the KEGG Anniversary Symposium, December 15-16, 2005, Kyoto, Japan.] KEGG: Kyoto Encyclopedia of Genes and Genomes. A grand challenge in the post-genomic era is a complete computer representation of the cell, the organism, and the biosphere, which will enable computational prediction of higher-level complexity of cellular processes and organism
150. at is available to you. Information to view. Annotation: If you click on a particular feature you can view the annotation attached to it: select a CDS feature (or any other feature), click on the Edit menu and select Edit Selected Feature. A window will appear containing all the annotation that is associated with that CDS. The format for this information is constrained by that which can be submitted to the EMBL database, as seen in Module 1. Viewing amino acid or protein sequence: Click on the View menu and you will see various options for viewing the bases or amino acids of the feature you have selected, in two formats, i.e. EMBL or FASTA. This can be very useful when using other programs that are not integrated into Artemis, e.g. those available on the Web that require you to cut and paste sequence into them. Plots/Graphs: Feature plots can be displayed by selecting a CDS feature, then clicking View and Show Feature Plots. The window which appears shows plots predicting hydrophobicity, hydrophilicity and coiled-coil regions for the protein product of the selected CDS. Load additional files: The results from Prosite searches run on the translation of each CDS should already be on display as pale green boxes on the grey DNA lines. The results from the Pfam protein motif searches are not shown, but can be viewed by loading the appropriate file. Click on File, then Read an Entry, and select the file PF.tab. Each Pfam mat
151. at OmniBLAST can be used to search on the basis of DNA sequence also. Sequence can also be pasted into the query box (FASTA or plain text) and searched. Click on the Retrieve button (red arrow). The omniBLAST search may take a while, depending on the number/size of the search. Once completed, the omniBLAST results are presented in a summarised format, as shown below, because BLAST output files are large and detailed. The top five hits in each search are summarised. Does this search detect PFB0210c (PfHT)? Are these results consistent with what you thought were the orthologues of PfHT in Plasmodium berghei and Plasmodium chabaudi from your previous searches? Once you have looked at the search results, try clicking on the various options in the results page. [Screenshot: OmniBLAST Server submission page, "Retrieve result for id s2bA7905LO8QeE96d7270".] Your BLAST query has been added to the queue of jobs. The majority of BLASTs are completed within two minutes. To retrieve your results, click the retrieve button above or use the following URI: http www genedb org zenedb2 blast zetblast id s2bA 7908LOSOCE 9647270 [Screenshot: Blast Server Results page.]
152. ata from the organisms of choice. Once query results have been retrieved, users can combine results files in one of three ways: adding files together (union), identifying common results between files (intersect), or identifying unique results between files (subtraction). Please note you must have cookies enabled in your browser in order for this page to work correctly. Result files can either be viewed or downloaded as a FASTA file of DNA or protein sequences. Query history table (columns: Query, Start time, Response time, Result, Download, Size): "Genes for ryp intersect Proteins that contain Pfam domain transporter", 3 14 40, < 1 second, view, Download, 22; "Genes for ryp intersect Proteins predicted to have between 7 and 8 transmembrane domains", 3 15 03, second, view, Download, 71. Follow one of the links in the table above to view a query result set, or select two of the result sets and use one of the following buttons to combine them. Union will create a result set that contains all the genes in either of the selected sets. Intersect will create a result set that contains only the genes in both of the selected sets. Subtract will remove any genes in the second set (i.e. the one appearing lower in the list) from the first. After you UNION, INTERSECT or SUBTRACT the selected query results, a new entry will appear at the end of the list. [Screenshot: GeneDB Query History page.]
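The three ways of combining result sets described above map directly onto set operations. The sketch below illustrates that logic in Python; it is not GeneDB's implementation, and the gene identifiers and the `combine` helper are made up for the example.

```python
# Minimal sketch of union / intersect / subtract on two query result sets.
# Gene IDs and the helper name are hypothetical, for illustration only.


def combine(first: set[str], second: set[str], mode: str) -> set[str]:
    """Combine two result sets the way the Query History buttons are described."""
    if mode == "union":      # all genes in either set
        return first | second
    if mode == "intersect":  # only genes present in both sets
        return first & second
    if mode == "subtract":   # remove genes of the second set from the first
        return first - second
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    pfam_hits = {"gene_a", "gene_b", "gene_c"}          # hypothetical query 1
    transmembrane_hits = {"gene_b", "gene_c", "gene_d"}  # hypothetical query 2
    for mode in ("union", "intersect", "subtract"):
        print(mode, sorted(combine(pfam_hits, transmembrane_hits, mode)))
```

Note that subtract is the only operation where the order of the two sets matters, which is why the page specifies which set is removed from which.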
153. ated annotation that can be searched, sorted and downloaded using a single web-based resource. The current release stores 33 datasets (see Table 1), of which 12 are curated and maintained by biologists who review and incorporate information from the scientific literature, public databases and the respective research communities. Sequence and annotation of the following organisms is currently represented within GeneDB (status September 2005). Prokaryotic: Bacteroides fragilis, Bordetella spp., Burkholderia pseudomallei, Chlamydophila abortus, Corynebacterium diphtheriae, Erwinia carotovora, Salmonella typhi, Streptomyces coelicolor, S. aureus (MRSA), S. aureus (MSSA). Eukaryotic: Emiliania huxleyi, Aspergillus fumigatus, Schizosaccharomyces pombe, Saccharomyces cerevisiae, Dictyostelium discoideum, Entamoeba histolytica, Schistosoma mansoni. Kinetoplastida: Leishmania infantum, Leishmania major, Trypanosoma brucei, Trypanosoma congolense, Trypanosoma cruzi, Trypanosoma gambiense, Trypanosoma vivax. Apicomplexan: Eimeria tenella, Plasmodium berghei, Plasmodium chabaudi, Plasmodium falciparum, Plasmodium knowlesi, Theileria annulata. Vector: Glossina morsitans. Table key: Project Type (Whole genome, Partial genome, EST-based); Project Status (Complete, In progress, Sequenced in multiple seq. centres); Curated (Manually curated); Genome not sequenced by the PSU.
154. ating systems are different, the command lines used to run the programs are the same. One of the main differences between the two operating systems is that in Windows the Blast program command line is run in the DOS Command Prompt window, whereas in Linux it is run from an Xterminal window. Aims: The aim of this module is to demonstrate how you can generate your own comparison files for ACT from a stand-alone version of the Blast software. In this module you will use Blast to generate comparison files for sequences that you have downloaded from the EBI genomes web resource. A copy of the Blast software has been installed locally. You will run Blast from the command line, using two different programs from the NCBI Blast distribution, to generate ACT-readable comparison files for two small sequences (plasmids) and for two large sequences (whole genomes). Exercise 1: In this exercise you are going to download two plasmid sequences in EMBL format from the EBI genomes web page. You are then going to use Artemis to write out the DNA sequences of both plasmids in FASTA format. These two FASTA format sequences will then be compared using BlastN to identify regions of DNA-DNA similarity and write out an ACT-readable comparison file. The plasmids chosen for this comparison are the multiple drug resistance incH1 plasmid pHCM1, from the sequenced strain of Salmonella typhi CT18, originally isolated in 1993, and R27, another incH1 plasmid, first isolated from S. typh
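For readers unfamiliar with the FASTA format that Artemis writes out in this exercise, the sketch below shows its shape: a header line starting with ">" followed by the sequence wrapped over several lines. It is an illustration only; the function names, the 60-character line width and the record name are assumptions for this example, and in the exercise itself Artemis does the writing.

```python
# Minimal sketch of writing and reading FASTA text.
# Function names and the default line width are illustrative assumptions.


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    cleaned = "".join(sequence.split()).upper()  # drop whitespace, upper-case
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a mapping of record name to sequence."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            current = header.split()[0] if header else ""
            records[current] = []
        elif line.strip() and current is not None:
            records[current].append(line.strip())
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    record = to_fasta("demo_plasmid", "acgt acgt acgt", width=8)
    print(record, end="")
    print(read_fasta(record))
```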
155. ation. Using a variety of tools and methods, some of which you will already have covered in earlier modules, identify putative members of this complex and complete the table at the end of this exercise. Start by identifying how many components have been annotated to this complex in S. pombe, which you will be using as a thoroughly annotated reference genome (exercise 1.1). Exercise 1.1: 1. Go to the GeneDB homepage, http://www.genedb.org [Screenshot: GeneDB website Version 2.1 home page (Sanger Institute Pathogen Sequencing Unit), showing the Database Entry Point search box, sequence searches (omniBLAST, multi-organism BLAST, single-organism BLAST) and links to the main search page, the complex querying page and AmiGO.] Select S. pombe from the pull-down menu. Funded as part of the Wellcome Trust Functional Genomics Development Initiative, the GeneDB project has a primary goal to develop and maintain curated database resources for three organisms: Schizosaccharom
156. [Screenshot: EBI InterPro search results page, with overview options for matching proteins (sorted by AC, sorted by name, of known structure, grouped by taxonomy, with splice variants).] Database, ID, Name, Proteins: PRINTS, PR00298, CHAPERONIN60, 1503; PROSITE pattern, PS00296, CHAPERONINS_CPN60, 845. IPR012723 chaperonin GroEL. The assembly of proteins has been thought to be the sole result of properties inherent in the primary sequence of polypeptides themselves. In some cases, however, structural information from other protein molecules is required for correct folding and subsequent assembly into oligomers [1]. These helper molecules are referred to as molecular chaperones, a subfamily of which are the chaperonins [2]. They are required for normal cell growth (as demonstrated by the fact that no temperature-sensitive mutants for the chaperonin genes can be found in the temperature range 20 to 43 degrees centigrade [1]) and are stress-induced, acting to stabilise or protect disassembled polypeptides under heat-shock conditions [2]. Type I chaperonins, present in eubacteria, mitochondria and chloroplasts, require the con
157. bases: Escherichia coli K-12; metabolic pathways and enzymes from 300 organisms; BioCyc Open Chemical Database (chemical compound database). Tier 2: Computationally Derived Databases Subject to Moderate Curation. 17 databases are available (list of tier 2 DBs). Tier 3: Computationally Derived Databases Subject to No Curation. 142 databases are available and ready for adoption by interested scientists for curation and updating (more). PGDBs in Tier 3 were produced as a collaboration between the groups of Peter D. Karp at SRI International and Christos Ouzounis at the European Bioinformatics Institute (list of tier 3 DBs). Other Pathway/Genome Databases on the Internet: AraCyc (Arabidopsis thaliana; S. Rhee, Department of Plant Biology, Carnegie Institution, USA); LacPlantCyc (Lactobacillus plantarum WCFS1; van Enckevort, The Netherlands); an entry from C. Ouzounis, European Bioinformatics Institute, UK; PseudoCyc (Pseudomonas aeruginosa; Pseudomonas Genome Project, Simon Fraser); Yeast Biochemical Pathways (SGD curators, Stanford U., USA). Contact us if you'd like your PGDB added to this list. Registry of Downloadable Pathway/Genome Databases. [Screenshot: BioCyc Query Page in Microsoft Internet Explorer.]
158. behaviors from genomic and chemical information. Towards this end we have been developing a bioinformatics resource named KEGG, Kyoto Encyclopedia of Genes and Genomes, as part of the research projects in the Kanehisa Laboratory of Kyoto University Bioinformatics Center. Main entry point to the KEGG web service: KEGG2 (KEGG Table of Contents). Four constituent databases of KEGG: PATHWAY (30,417 pathways generated from 242 reference pathways); GENES (1,099,188 genes in 31 eukaryotes, 233 bacteria and 23 archaea); LIGAND (12,849 compounds, 2,435 drugs, 11,152 glycans, 6,448 reactions); BRITE (7,351 KO (KEGG Orthology) groups). Selected KEGG organisms: hsa (human), mmu (mouse), rno (rat), dre (zebrafish), dme (fruit fly), cel (nematode), ath (thale cress), sce (budding yeast), spo (fission yeast), eco (E. coli), bsu (B. subtilis). KEGG Release 35 1 September 2005, plus daily updates. Copyright 1995-2005 Kanehisa Laboratories. [Screenshot: KEGG page in Mozilla.]
159. below. [Screenshot: ACT window.] Use the vertical sliders to zoom out. Drag or click the slider downwards from one of the genomes. The other genome will stay in synch. [Screenshot: ACT window titled "St dna vs Eck12 dna", zoomed out, with the two genomes LOCKED together.] Once zoomed out, your ACT window should look similar to the one shown above. If the genomes in view fall out of view to the right of the screen, use the horizontal sliders to scroll the image and bring the whole sequence into view, as shown below. You may have to play around with the level of zoom to get the whole genomes shown in the same screen. [Screenshot: ACT window titled "St dna vs Eck12 dna", showing both whole genomes.] Notice that when you scroll along with either slider both geno
160. [Screenshot residue: T. brucei query result list, entries 10 to 20, with descriptions such as "hypothetical protein, conserved" and "glucose transporter (fragment)", each flagged with its annotation type.] To view this result set later, or to combine it with others, visit the history page. Start a completely new complex query. These are the descriptions of the queries you have executed. Query History. Help: This page displays results of queries executed using the boolean query pages. The page will remain empty until searches have been run. Please note that a search combining boolean operators will return multiple results files. Chosen queries are first executed across all organisms within GeneDB and subsequently narrowed down to only return d
161. certed action of 2 proteins, chaperonin 60 (cpn60) and chaperonin 10 (cpn10). Type II chaperonins, found in eukaryotic cytosol and Archaebacteria, comprise only a cpn60 member. The 10 kDa chaperonin (cpn10, or groES in bacteria) exists as a ring-shaped oligomer of between 6 to 8 identical subunits, whereas the 60 kDa chaperonin (cpn60, or groEL in bacteria) forms a structure comprising 2 stacked rings, each ring containing 7 identical subunits [1]. These ring structures provide an environment for protein folding, whilst cpn10 binds to cpn60 and synchronizes the release of the folded protein in an Mg-ATP dependent manner [3, 2]. The binding of cpn10 to cpn60 inhibits the weak ATPase activity of cpn60. [Screenshot: InterPro "Protein matches" view for UniProt entries P48424 and P48425, with hits to InterPro, PRINTS, PROSITE, Pfam and SSF signatures.]
162. ch will appear as a coloured blue feature in the main display panel on the grey DNA lines. To see the details, click the feature, then click View then View Selection, or click Edit then Edit Selected Features. Please ask if you are unsure about Prosite and Pfam. Viewing the results of database searches: Click the View menu, then select Search Results and then Fasta results. The results of the database search will appear in a scrollable window. If you click on the button at the bottom of this window labelled view browser, the results will be posted into an internet browser window. Within this window there are many active links (coloured blue) to external sources of information, such as the original database entries for all those aligning to your sequence, as well as information stored in PubMed, PFAM and many others. Have a play. Further information on specific Prosite or Pfam entries can be found on the web at http://ca.expasy.org/prosite and http://www.sanger.ac.uk/software/Pfam/tsearch.shtml. In addition to looking at the fine detail of the annotated features, it is also possible to look at the characteristics of the DNA covering the region displayed. This can be done by adding in to the display various plots showing different characteristics of the DNA. This information is generated dynamically by Artemis and although this is a relatively
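As an illustration of the kind of DNA plot described above, the sketch below computes percentage GC content in a sliding window along a sequence. It is a plain re-statement of the idea rather than Artemis's own algorithm; the function name, the default window of 120 bases and the step size are assumptions for this example.

```python
# Minimal sketch of a sliding-window GC content calculation.
# Function name, window and step defaults are illustrative assumptions.


def gc_content_windows(seq: str, window: int = 120, step: int = 1) -> list[float]:
    """Return %GC for each window of `window` bases, moving `step` bases at a time."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / window)
    return values


if __name__ == "__main__":
    # Tiny made-up sequence: GC-rich at the start, AT-rich at the end.
    print(gc_content_windows("GGCCAATT", window=4, step=2))
```

A smaller window gives a noisier but more local curve; a larger one smooths the plot at the cost of blurring sharp boundaries.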
163. consensus sequence (real or made up). Note that degenerate base values can be used (Appendix VIII). Amino acid consensus sequences (real or made up): you can use X's. Note that it searches all six reading frames, regardless of whether the amino acids are encoded or not. What are Keys and Qualifiers? See Appendix III. Clearly there are many more features of Artemis which we will not have time to explain in detail. Before getting on with this next section it might be worth browsing the menus. Hopefully you will find most of them easy to understand. Artemis Exercise 1 Part II. [Screenshot: Artemis Entry Edit window for S. typhi, showing CDS, RBS and misc_feature tracks.]
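To make the idea of searching with degenerate base values concrete, the sketch below expands the standard IUPAC nucleotide codes into a regular expression and reports every match on the forward strand. It is not how Artemis implements its search; the function names are assumptions for this example, and the test sequence is made up.

```python
# Minimal sketch: search a DNA string for a consensus containing IUPAC
# degenerate codes. Forward strand only; names are illustrative assumptions.
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> str:
    """Translate a degenerate consensus into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_consensus(seq: str, consensus: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # The lookahead lets overlapping matches be reported.
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [match.start() for match in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(consensus_to_regex("TATAMT"))
    print(find_consensus("TATAATGTATACT", "TATAMT"))
```

To cover the reverse strand as well, the same search can be run on the reverse complement of the sequence.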
164. corrections and updates. Exercise 1.2: The use of keywords to search the available T. brucei annotation. [Screenshot: GeneDB website Version 2.1 home page, with T. brucei chosen in the Database Entry Point search box and the options "Include description in search" and "Add wildcards to search term".] Funded as part of the Wellcome Trust Functional Genomics Development Initiative, the GeneDB project has a primary goal to develop and maintain curated database resources for three organisms: Schizosaccharomyces pombe, for which a complete genome sequence has been obtained, and the kinetoplastid protozoa Leishmania major and Trypanosoma brucei, whose genome sequences have yet to be completed. It is envisaged that the generic database structure will subsequently be adopted to integrate datasets for other organisms, both
165. [Screenshot: Artemis Edit menu (continued), listing Delete Selected Features, Delete Selected Exons, Merge Features, Trim Selected Features To Met, Trim Selected Features To Any, Trim Selected Features To Next Met (Ctrl T), Trim Selected Features To Next Any (Ctrl Y), Extend to Previous Stop Codon, Extend to Next Stop Codon, Automatically Create Gene Names, Fix Gene Names, Delete Selected Bases, Add Bases At Selection and Add Bases From File, above the zoomed sequence view.] Select both t
166. ctivity; 5' to 3' DNA helicase activity. [AmiGO term table, flattened. GO IDs in the order listed: GO:0003678, GO:0004003, GO:0008722, GO:0017116, GO:0017117, GO:0043138, GO:0043139. Ontology: F.] Definitions: Catalysis of the hydrolysis of ATP to unwind the DNA helix at the replication fork, allowing the resulting single strands to be copied. Catalysis of the reaction ATP + H2O = ADP + phosphate, driving the unwinding of the DNA helix. Catalysis of the reaction ATP + H2O = ADP + phosphate in the presence of single-stranded DNA, driving the unwinding of a DNA helix. Catalysis of the unwinding of the DNA helix in the direction 3' to 5'. Catalysis of the unwinding of the DNA helix in the direction 5' to 3'. Comments: Consider also annotating to the molecular function term DNA binding (GO:0003677). Consider also annotating to the molecular function term ATP binding (GO:0005524). Consider also annotating to the molecular function term ATP binding (GO:0005524). Term tree: GO:0003673 Gene Ontology (120591); GO:0008150 biological process (72641); GO:0005575 cellular component (59242); GO:0003674 molecular function (94552); GO:0003824 catalytic activity (29832); GO:0004386 helicase activity (676); GO:0003678 DNA helicase activity (166). Get this tree as RDF XML. Get this data as a G
167. dicted GO component. [Screenshot: GO Component pull-down list, with entries such as 26S proteasome, 6-phosphofructokinase complex, acetyl-CoA carboxylase complex, actin capping protein complex, alpha DNA polymerase:primase complex, alpha-ketoglutarate dehydrogenase complex (sensu Eukarya), anaphase-promoting complex and apical complex; query options: rows per page 20.] The query should return 74 results. Examine the results to see which proteins could be hexose/glucose/monosaccharide transporters and whether PFB0210c is present. Then use the browser back button to go back and submit the second and third queries. Note that the results page tells you whether the protein has been manually or automatically annotated. We're now going to look at ways of treating the results sets that are obtained from our Boolean searches. We can add, subtract and intersect different results sets using the history page. We'll also look at downloading the results sets. [Screenshot: GeneDB query results page.] Query: Genes for pberghei, pchabaudi, malaria intersect Proteins with a product containing the keyword or phrase transporter intersect Proteins containing a predicted signal peptide. Result columns: name/id, organism, description. 1. PB000033 0 L0, P. berghei, transporter protein, putative, Automa
168. [Screenshot: European Bioinformatics Institute Plasmid Genomes page (www.ebi.ac.uk/genomes/plasmid.html) in a web browser, listing plasmid entries alphabetically with their sizes and EMBL/FASTA links, e.g. Acetobacter aceti, Acetobacter pasteurianus, Acidithiobacillus ferrooxidans, Acinetobacter sp. EB104, Actinobacillus pleuropneumoniae, Aeromonas salmonicida, Agrobacterium rhizogenes and Agrobacterium tumefaciens plasmids.]
169. e refine the gene models generated by automated predictions. We have generated automated gene models for the P. knowlesi contig using PHAT (Pretty Handy Annotation Tool), a gene finding algorithm (see Mol Biochem Parasitol 2001 Dec; 118(2): 167-74), and the automated annotation is saved in Pknowlesi_contig.embl. - Zoom into the P. falciparum gene labelled PFM1010w (shown below). Can you compare the 2 gene models and identify the conserved exon(s) between the 2 species? - Use the slider on the comparison view panel to include some shorter similarity hits. Can you now identify all the conserved exons of the PFM1010w orthologue in the P. knowlesi contig? For the time being, disregard the misc_feature for Phat4, coloured in red in Pknowlesi_contig.embl. - Open the GC Content window from the Graph menu for both the entries. Can you relate the exon-intron boundaries to GC content for the P. falciparum gene labelled PFM1010w? Is it also applicable to the gene model Phat4 in the P. knowlesi contig? - Example regions: Pfal_chr13.embl 789034..793351; Pknowlesi_contig.embl 15618..20618. [Screenshot: ACT comparison of the P. falciparum and P. knowlesi regions.]
170. [Screenshot: GeneDB gene page for Tb927.2.2900, T. brucei ARP2/3 complex subunit, putative; domain hits (InterPro, Pfam): ARP2/3 complex 20 kDa subunit.] Gene Ontology Annotation (columns: Term, Qualifier, Evidence, Other; browse in AmiGO). Biological Process: actin filament polymerization (ISS; TIGR_Tba1:Tb927.2.2900, TIGR_REF:GO_ref, with SWISS-PROT:O15509; 1 other); actin filament polymerization (IEA; GOC:interpro2go; 1 other). Cellular Component: Arp2/3 protein complex (ISS; TIGR_Tba1:Tb927.2.2900, with SWISS-PROT:O15509; 2 others); cytoskeleton (IEA; GOC:interpro2go; 126 others). Molecular Function: structural constituent of cytoskeleton (ISS; TIGR_Tba1:Tb927.2.2900, TIGR_REF:GO_ref, with SWISS-PROT:O15509; 20 others). Literature: Search for gene/protein in PubMed. Database Cross References: UniProt (Actin-related protein ARP2/3 complex subunit, putative); TGAD Tb927.2.2900 (TIGR's T. brucei Annotation Database). Orthologues: GeneDB_Lmajor LmjF02.0600; GeneDB_Tcruzi Tc00.1047053509127.104; GeneDB_Tcruzi Tc00.1047053508737.194 (each an ARP2/3 complex subunit, putative, predicted by jaccard cog clustering). Database Similarities. [GO tree: all (3754); biological process (2419); cellular component (3074); molecular function (3530); structural molecule activity.]
171. e database using a preset selection of query forms and the boolean operators AND and OR. For example, to construct a simple query, begin by clicking on the AND button. Then use the pulldown menus to select two different queries. Finally, click on "proceed to next step" to generate a page that will allow you to specify parameters for the two queries. Running the resulting boolean query will return only those objects that satisfy query 1 AND query 2. Please note you must select a query from each pulldown menu before the proceed button will work. If you need to back up a step, use the browser's back button. You are currently searching T. brucei (choose a different organism). Choose a boolean condition or select a query. Available queries: Proteins containing a specific Pfam domain; Proteins with a product containing a particular keyword or phrase; Proteins with annotation matching a particular keyphrase; Proteins with a predicted GO function; Proteins with a predicted GO process; Proteins with a predicted GO component; Proteins within a range of length in amino acids; Proteins within a particular range of molecular masses; Proteins with a Pfam domain; Proteins containing a predicted signal peptide; Proteins containing one or more predicted transmembrane domains. [Screenshot: GeneDB boolean query page.]
172. e length of the two sequences in this middle layer. If insertions were present in either of the genomes, they would show up as breaks between the solid red conserved regions. The data used to draw these red blocks and link conserved regions is generated by running pairwise BlastN or tBlastX comparisons of the genomes; details of how this is done are outlined in Appendix II and can also be obtained from the ACT user manual (http://www.sanger.ac.uk/Software/ACT/manual). Aims: The aim of this Module is for you to become familiar with the basic functioning of ACT by using a series of worked examples. Some of these examples will touch on exercises that were used in previous Modules; this is intentional. As well as introducing you to the basics of ACT, this Module should also show you how ACT can be used not only for looking at genome evolution but also to back up or question gene models, and so on. Module 2: Comparative Genomics. 1. Starting up the ACT software. Make sure you're in the correct directory (Comparative Genomics, Module 5). Then type act & and press return. A small start-up window will appear. Now let's load up an S. typhi versus Escherichia coli comparison. The files you will need for this exercise include S_typhi.dna and S_typhi.dna_vs_EcK12.dna. Screenshot: ACT start-up window (Artemis Comparison Tool, Release 2 beta, Prokaryotic mode), Copyright 1998 2002 Genome Re
173. e view panel. Screenshot: Artemis main window (menus File, Entries, Select, View, Goto, Edit, Create, Write, Run, Graph, Display) with one base selected on the forward strand (1004846), showing a region of the S. typhi sequence; the highlighted feature at 109620..110095 is annotated as similar to an Escherichia coli protein. To
174. Screenshot: list of sequence databases selectable for a BLAST search, each marked as nucleotide (N) or protein (P). Legible entries include T. brucei predicted proteins and reads for chromosomes 9, 10 and 11; African trypanosome EMBL data and EST clusters; T. congolense reads; T. cruzi and T. vivax predicted proteins; complete sequences and predicted proteins for B. bronchiseptica, B. parapertussis and B. pertussis; S. typhi chromosomal sequence and proteins; pHCM1 sequence and proteins; and clustered ESTs. RESULTS
175. enerating ACT comparison files using BLAST. We will treat N315.dna as the database sequence and MW2.dna as the query sequence. In a terminal window, format the database: formatdb -i N315.dna -p F. Now we can run megablast on the two MRSA genome sequences. The default output format is one line per entry, which ACT can read, so there is no need to add an additional flag to the command line (see Appendix II): megablast -d N315.dna -i MW2.dna -o N315_vs_MW2. The N315_vs_MW2 comparison file can now be read into ACT along with the N315.embl and MW2.embl (or N315.dna and MW2.dna) sequence files. Figure: a comparison of the N315 and MW2 genomes in ACT. The megablast comparison reveals a high level of synteny (conserved gene order). This is perhaps not surprising, as both genomes belong to strains of the same species. Using the results of comparisons like these, it is possible to identify genomic differences that may contribute to the biology of the bacteria and also to investigate mechanisms of evolution. Both N315 a
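The two BLAST steps above (format the database, then run megablast) can be scripted. The sketch below is only an illustration in Python: it builds the two command lines as argument lists and prints them without running anything. The flags are the usual legacy NCBI BLAST options (formatdb -i and -p, megablast -d, -i and -o); because the punctuation of the original command lines did not survive in this copy, treat the exact flags and the output file name N315_vs_MW2 as assumptions and check them against Appendix II. The function name is hypothetical.

```python
import shlex


def comparison_commands(database_fasta, query_fasta, output_file):
    """Return the formatdb and megablast command lines as argument lists."""
    # formatdb: -i names the input FASTA file, -p F marks it as nucleotide.
    format_cmd = ["formatdb", "-i", database_fasta, "-p", "F"]
    # megablast: -d database, -i query, -o output file (default output format).
    blast_cmd = ["megablast", "-d", database_fasta,
                 "-i", query_fasta, "-o", output_file]
    return format_cmd, blast_cmd


format_cmd, blast_cmd = comparison_commands("N315.dna", "MW2.dna", "N315_vs_MW2")
for cmd in (format_cmd, blast_cmd):
    print(shlex.join(cmd))
```

On a machine where the legacy BLAST tools are installed, each list could be passed to subprocess.run(cmd, check=True) to execute the step.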
176. ered contigs from chrTX XI as well as 3 BACs from homologous regions of chrV VII and. Select the link to the boolean querying interface by clicking on the "complex querying pages" button on the T. brucei homepage. Screenshot: T. brucei GeneDB homepage, with Searches/Analysis (omniBLAST, Motif Search, EMOWSE, List Download, Cross Organism Search Page, Genome Browser, Contig/Chromosome Maps, Full Content Search, Complex Boolean Query, RNAit primer design), Browse Catalogues (Products, SWISS-PROT Keywords, Pfam), Data (Example genes, Data releases, FTP download Sanger, FTP download TIGR), Help, Feedback (Curator, Technical), project links (Sanger T. brucei project, TIGR T. brucei project, T. congolense project, T. vivax project, T. b. gambiense project), the T. brucei Genome Network, Biological resources, and a news archive: Trypanosomatids genomes and biology CD; 15th July 2005, Tritryp genomes published in Science; 22nd June 2005, non-coding RNA annotation revised and protein feature/domain predictions updated; 2nd February 2005, T. congolense and T. b. gambiense data available via GeneDB. Start off by querying the Pfam domain distribution. This page allows you to build more complex queries against th
177. erms to select a class, or all those features with a particular amino acid motif. Screenshot: Feature Selector window, with the options Select by Key, Common Keys, Qualifier, a space for a search term or amino acid motif (Containing this text; Amino acid motif), Ignore Case, Allow Partial Match, Forward Strand Features and Reverse Strand Features, and the buttons Select, View and Close. Defining the extent of the prophage. Even from this very cursory analysis, the selection makes it clear that the prophage occupies a fairly discrete region within SPI-7 (see below). It is often useful to create a DNA feature to define the limits of this type of genome landmark. To do this, use the left mouse button to click and drag over the region that you think defines the prophage. Click on the Create menu and select "Create feature from base range". A feature edit window will appear. The default key value given by Artemis when creating a new feature is CDS. With this key, the newly created feature would automatically be put on the translation line. However, if we change it to misc_feature (an option in the Key menu in the top left-hand corner of the edit window), Artemis will place the feature on the DNA line. This is perhaps more appropriate and is easier to visualise. If you also add a qualifier such as label, add text following the label and then click OK, that text will be used as a feature label to be displayed in the main sequenc
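A feature created this way ends up as a few lines in the EMBL-style feature table of the entry. The sketch below shows, in Python, roughly what a misc_feature with a label qualifier looks like in that format. It is a minimal illustration only: the coordinates and the label text are hypothetical placeholders (not the real prophage limits), the helper name is invented, and the exact text Artemis writes may differ.

```python
def embl_feature(key, start, end, qualifiers=None):
    """Format one forward-strand feature as EMBL feature-table lines.

    Layout: 'FT', three spaces, the key padded to 16 characters,
    then the location; qualifiers go on their own 'FT' lines,
    aligned with the location column.
    """
    lines = [f"FT   {key:<16}{start}..{end}"]
    for name, value in (qualifiers or {}).items():
        # Single-word values are written bare; free text containing
        # spaces would normally be wrapped in double quotes.
        lines.append("FT" + " " * 19 + f"/{name}={value}")
    return "\n".join(lines)


# Hypothetical coordinates and label, for illustration only.
print(embl_feature("misc_feature", 1000, 45000, {"label": "prophage"}))
```

Changing the key argument from misc_feature to CDS is the text-level equivalent of the key change made in the feature edit window.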
178. fault is 10 Kb upstream and 10 Kb downstream of the gene selected. This can be modified by the user. Screenshot: GeneDB Range Download Page. General Information: Systematic id PFB0210c; location complement(205839..207403); Organism P. falciparum; Type CDS; Contig MAL; Length 947102 bp. Range Options: you can also select the region that you want to view using coordinates (From/To; From 195889). Download Types: Download features and sequence in EMBL format; Download sequence in Fasta format; Artemis Applet; Help on Artemis. The Artemis Applet retains nearly all of the functions that it has if it is run locally. Refer to your notes from the module on Artemis if necessary. There are far too many functions to describe them all here, so we're going to look at a few which are relevant to our investigation. The hexose transporter that we are looking at, PfHT (systematic identifier PFB0210c), has been characterised biochemically. It is able to transport glucose and fructose down a chemiosmotic gradient as a classic uniporter. Some residues that define substrate specificity have been identified by mutagenesis experiments (Woodrow et al., 2000). If the Glutamine residue at position 169 is changed to Asparagine, a mutation denoted by Q
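The default download range described above (10 Kb either side of the selected gene) is simple coordinate arithmetic. A minimal Python sketch follows, using the PFB0210c coordinates and contig length shown on the page. How GeneDB itself rounds or clamps the range is an assumption here, so the From/To values displayed on the site may not match exactly; the function name is hypothetical.

```python
def download_window(start, end, contig_length, flank=10_000):
    """Extend a gene's coordinates by `flank` bases on each side.

    The window is clamped so that it never runs off either end
    of the contig (positions are 1-based and inclusive).
    """
    window_start = max(1, start - flank)
    window_end = min(contig_length, end + flank)
    return window_start, window_end


# PFB0210c lies at 205839..207403 on a contig of 947102 bp.
print(download_window(205839, 207403, 947102))  # (195839, 217403)
```

Passing a different flank value mirrors the user changing the range on the download page.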
179. Screenshot: GeneDB home page search box ("Search for ... Go To"), with a pulldown menu of organisms including P. berghei, P. chabaudi, P. falciparum, P. knowlesi, T. annulata, T. brucei, T. congolense and T. cruzi, the options "Include description in search" and "Add wildcards to search term", and links to multi-organism and single-organism BLAST, the main search page, the complex querying page, AmiGO, ID List Download, the Guide to GeneDB and PSU Sequencing Projects. Note: clicking on the GeneDB icon will take you back to the GeneDB home page. Screenshot: Plasmodium falciparum GeneDB page. Curated annotation of the Plasmodium falciparum 3D7 genome is available from this database. The page offers a Database Entry Point (search by ID or description), Searches/Analysis (omniBLAST, BLAST, Motif Search, EMOWSE, AmiGO, InterPro, List Download, Genome Browser, Cross Organism Search Page, Contig/Chromosome Maps, Full Content Search, Complex Boolean Query), Browse Catalogues (Products), FTP download from Sanger, the WTSI Plasmodium falciparum project page, and news dated 2nd May 2005 on a genome-wide review of gene prediction: genome statistics overview, and gene models modified, added and removed since publication in Oct 2002.
180. gene by selecting the exon(s) and then choosing "Delete Selected Exons" from the Edit menu. Similarly, you can add an exon to a particular gene by co-selecting the exon and the gene (CDS features), followed by selecting "Merge Selected Features" from the Edit menu. Example regions (Pfal_chr13.embl): 789034..793351, 657638..660023, 672361..673753. Screenshot: Artemis window for entry PF_MAL13_1Mb.embl with a GC Content (%) graph (window size 120), showing the region from about 787200 to 796800; 3 bases are selected on the forward strand (796563..796565). T hyp
181. ges and replaced by manual GO. For the full complement of gene products annotated to specific terms, please use the links to AmiGO in the left-hand column of the Gene Ontology annotation. Gene name and product information: the product lines are standardized and indexed so that features sharing the same product lines can be retrieved. Access to the nucleotide and amino acid sequences of the feature is also provided. Basic location information and context map: clicking on "Graphical Display in Artemis" will open up an Artemis applet, which will be discussed further in exercise 2. Via the applet, the feature can be viewed in the context of the sequence and additional annotation such as UTRs. Screenshot: gene page General Information. Name arc4; Status role inferred from homology; Product ARP2/3 actin-organizing complex subunit Arc4; Type CDS; Sequence DNA and Protein; Location Chromosome 1, complement(3228809..3229681); Unspliced length 873 bp; Spliced length 507; followed by the Artemis context map spanning roughly 3220000 to 3240000.
182. Screenshot: GeneDB gene pages for the T. brucei gene Tb927.2.2900 and its L. major orthologue LmjF02.0600, showing the GO annotation (evidence with SWISS-PROT O15509), Literature, Database Cross References (TGAD, TIGR's T. brucei Annotation Database) and Orthologues sections, together with the notice about feedback and GNC submissions lost over the Christmas period. For LmjF02.0600: Status role inferred from homology; Products ARP2/3 complex subunit, putative; Type CDS; Sequence DNA and Protein; Location Chromosome 2, 313087..313803; Length 717 bp; Context map.
183. he descriptions/products for these organisms. Screenshot: Cross Organism Search Page, which offers browsing of products, SWISS-PROT keywords, Pfam domains, InterPro assignments and other catalogues, with organisms grouped as Fungi, Protozoa, Bacteria and Parasite Vectors. A results list will then be displayed as below (note that the whole list is not shown). We are now g
184. he original gene model and the new CDS feature which is to be merged with it to form a new exon. Tip: to select more than one feature of any type, you must hold the shift key down. The new CDS feature can then be merged with the original gene model as shown above. A small window will appear asking whether you are sure you want to merge these features. Another window will then ask whether you want to delete the old features. If you click yes, the CDS features you have just merged will disappear, leaving the single merged CDS. If you select no, all three CDS features (the two CDSs that you started with plus the merged feature) will be retained. Screenshot: Artemis start-up window with the Options menu open (Reread Options, Enable Direct Editing, Eukaryotic Mode, Highlight Active Entry, Black Belt, Show Log Window, Hide Log Window); click Enable Direct Editing to enable direct editing. You may have noticed, after you performed the merge, that one of the exons has jumped into another reading frame. Artemis automatically splices the CDS, so if the exon boundaries include an additional partial codon, any following exon will be pushed into another reading frame to account for this. To correct this, you can edit the exon boundaries directly by turning on manual editing in the Options menu of the Artemis start
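The frame jump described above follows from counting coding bases: if the exons before a given exon do not add up to a multiple of three, that exon starts part-way through a codon and is read in a different frame. The Python sketch below illustrates only this arithmetic for forward-strand exons. It is not how Artemis is implemented, the exon coordinates are hypothetical, and the function name is invented.

```python
def exon_frames(exons):
    """For forward-strand exons [(start, end), ...] return (phase, frame) per exon.

    phase: bases at the start of the exon that complete a codon
           begun in the previous exon (0, 1 or 2).
    frame: forward reading frame (1, 2 or 3) of the first complete
           codon in the exon, counting base 1 as frame 1.
    """
    results = []
    coding_bases = 0
    for start, end in exons:
        phase = (3 - coding_bases % 3) % 3
        frame = (start + phase - 1) % 3 + 1
        results.append((phase, frame))
        coding_bases += end - start + 1
    return results


# Hypothetical two-exon gene: a first exon of 100 bases leaves a partial codon.
print(exon_frames([(1, 100), (201, 300)]))  # [(0, 1), (2, 2)]
# Trimming one base from the first exon moves the second exon to another frame.
print(exon_frames([(1, 99), (201, 300)]))   # [(0, 1), (0, 3)]
```

Moving an exon boundary by a single base changes the frame of every exon that follows, which is why boundary edits are the way to correct a merged gene model.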
185. hei proteins (wublastp) for query TA02485. Screenshot: omniBLAST result summaries for query TA02485 against P. berghei, P. falciparum and P. chabaudi proteins (wublastp). Each summary lists the best hits with their name, score and P(N) value, and provides Full BLAST Search and Full Sequence links; clicking on Full Sequence will show the full sequence of that protein. The top P. berghei hit is PB000562.01.0 (score 687) and the top P. falciparum hit has a score of 704. This approach has demonstrated how an omniBLAST search can identify the gene of interest in your organism when a well-annotated orthologue exists in another organism. So this is a useful alternative strategy to searching on keywords alone, which, as we have seen, can in some cases be mislead
186. i the 1960s. Module 3: Generating ACT comparison files using BLAST. Downloading the S. typhi plasmid sequences. Screenshot: the Genomes Pages at the EBI (EMBL-EBI, European Bioinformatics Institute), "Access to Completed Genomes", viewed in Mozilla. The page notes that the genomes pages have recently been changed and that the old pages are still available. It explains that the first completed genomes were deposited in the EMBL Database in the early 1980s; since then, a shift in molecular biology towards obtaining the complete sequences of as many genomes as possible, combined with major developments in sequencing technology, has resulted in hundreds of complete genome sequences being added to the database, including Archaea, Bacteria and Eukaryota. These web pages give access to a large number of complete genomes, and help is available to describe the pages and layout. Whole Genome Shotgun Sequences (WGS): methods using whole genome shotgun data are used to gain a large amount of genome coverage for an organism. WGS data for a growing number of organisms are being submitted to DDBJ/EMBL/GenBank and are made availab
187. Screenshot: NCBI FTP directory listing of BLAST 2.2.6 executables (blast-2.2.6 and netblast-2.2.6 archives for platforms including FreeBSD, Linux, Solaris, Win32, IRIX, AIX, Mac OS 9, Mac OS X and OSF1). blast-2.2.6-ia32-win32.exe is the BLAST executable file for Windows. You now need to save the blast-2.2.6-ia32-win32.exe file in a new directory, blast, on the hard drive of your PC. Screenshot: File Download warning dialog ("Some files can harm your computer. If the file information below looks suspicious, or you do not fully trust the source, do not open or save this file."). File name: blast-2.2.6-ia32-win32.exe; File type: Application; From: ftp.ncbi.nih.gov. This type of file could harm your computer if it contains malicious code. Would you like to open the fi
188. ies
DNA Data Bank of Japan (DDBJ): http://www.ddbj.nig.ac.jp
EMBL Nucleotide Sequence Database: http://www.ebi.ac.uk/embl.html
Genomes at the EBI: http://www.ebi.ac.uk/genomes
GenBank: http://www.ncbi.nlm.nih.gov
Microbial Genome Databases/Resources
Sanger Microbial Genomes: http://www.sanger.ac.uk/Projects/Microbes
TIGR Microbial Database: http://www.tigr.org/tdb/mdb/mdbcomplete.html
Institut Pasteur GenoList databases (including SubtiList, Colibri, TubercuList, Leproma, PyloriGene, MypuList, ListiList, CandidaDB): http://genolist.pasteur.fr
Pseudomonas Genome Database: http://www.pseudomonas.com
Clusters of Orthologous Groups of proteins (COGs): http://www.ncbi.nlm.nih.gov/COG
ScoDB II, S. coelicolor database: http www 11016 jic bbsrc ac uk S coelicolor
Protein Motif Databases
Prosite: http://www.expasy.ch/prosite
Pfam: http://www.sanger.ac.uk/Software/Pfam/index.shtml
BLOCKS: http://blocks.fhcrc.org
InterPro: http://www.ebi.ac.uk/interpro
PRINTS: http://www.bioinf.man.ac.uk/dbbrowser/PRINTS
SMART: http://smart.embl-heidelberg.de
InterPro: http://www.ebi.ac.uk/interpro/index.html
Protein feature prediction tools
TMHMM (prediction of transmembrane helices in proteins): http://www.cbs.dtu.dk/services/TMHMM-2.0
SignalP Prediction Server: http://www.cbs.dtu.dk/services/SignalP
PSORT protein prediction: http://psort.ims.u-tokyo.ac
Metabolic Pathways and Cellular Regulation
EcoCyc
ENZYME
189. Screenshot: B. pertussis GeneDB gene page for pertactin. Location 1098091..1100823; Length 2733 bp; Exons 1098091..1100823; Spliced length 2733 bp; Graphical Display in Artemis; Context Map. Miscellaneous links: example genes; the B. pertussis project page at the Sanger Institute; the B. pertussis genome by FTP from the Sanger Institute. Primary Annotation: identical to the previously sequenced Bordetella pertussis pertactin precursor (910 aa) and similar to the Bordetella bronchiseptica pertactin precursor (911 aa). Predicted Peptide Properties: Mass 93.4 kDa; Amino acids 910; Isoelectric point pH 10.0; Charge 17.0. Bordetella parapertussis GeneDB. This page provides access to the annotation and sequence of B. parapertussis strain 12822. This is the result of a collaboration of the Sanger Institute with Duncan Maskell and Andrew Preston of the Centre for Veterinary Science, Dept. of Clinical Veterinary Medicine, the University of Cambridge. This data is described in Parkhill et al. (2003), Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica, Nature Genetics, DOI 10.1038/ng1227 (PDF version of this article). Database Entry Point: search by ID or description (Include description; Add wildcards). Searches/Analysis: omniBLA
190. in earlier. You can simply navigate between gene pages by filling in the "Search for" box in the navigation bar. Add them to the gene basket by clicking on the Add to Basket icon at the top of each of the gene pages. Screenshot: GeneDB gene page for T. brucei CDS Tb09.160.3850. Please note: over the Christmas period some feedback and GNC submissions were lost due to technical problems. We have used our log files to trace these and curated a list of them with appropriate dates. If feedback you have sent is on this list, could you please resubmit it, should it still be appropriate. General Information: Systematic Name Tb09.160.3850 (Warning: temporary systematic id); Gene Synonyms 28G16.235; Prev. Systematic Id chr9contig160.tmp0298; Status role inferred from homology; Products actin related protein 5, putative; Type CDS; Sequence DNA and Protein; Location Chromosome 9, Contig Location 853771..855021; Length 1251 bp; Graphical Display in Artemis; Context map covering about 845000 to 855000, with neighbouring genes Tb09.160.3770 to Tb09.160.3870 (including a conserved hypothetical protein and MSH3). Primary Annotation: likely role in F actin
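The Length value on a gene page can be checked from the location coordinates, which are 1-based and inclusive. A short Python sketch follows, using the coordinates quoted above for Tb09.160.3850; the multi-exon example uses hypothetical exon coordinates, and the function names are invented for illustration.

```python
def feature_length(start, end):
    """Length in bases of a feature given inclusive 1-based coordinates."""
    return end - start + 1


def spliced_length(exons):
    """Total coding length of a gene given its exons as (start, end) pairs."""
    return sum(feature_length(start, end) for start, end in exons)


# Tb09.160.3850 is reported at 853771..855021 with a length of 1251 bp.
print(feature_length(853771, 855021))            # 1251

# Hypothetical two-exon gene: the spliced length excludes the intron,
# while the unspliced length runs from the first base to the last.
exons = [(100, 199), (300, 349)]
print(spliced_length(exons))                     # 150
print(feature_length(exons[0][0], exons[-1][1])) # 250
```

For a single-exon gene the spliced and unspliced lengths are the same; they differ only when introns are present.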
191. in the form GeneDB_Spombe:SPAC1002.09, or you can just use the systematic ids (e.g. Tb927.1.710), but in this case you must also set the default organism. Screenshot: List Download form. Default Organism: T. brucei. Output options: Descriptions; DNA (unspliced sequence or cDNA); DNA (spliced sequence); Protein sequence; Intergenic Sequence 3' (number of bases 20); Intergenic Sequence range (3' distance, 5' distance); Orthologues. Screenshot: orthologue output, with one line per T. brucei gene (Tb09.160.3850, Tb927.8.4410, Tb10.70.2680, Tb927.2.2900, Tb10.406.0320, Tb10.389.0270, Tb10.61.0500) followed by its GeneDB_Lmajor and GeneDB_Tcruzi orthologues. Table of Arp2/3 complex subunits across species (columns include T. brucei, L. infantum, T. vivax, T. congolense and T. b. gambiense); the T. brucei column reads: Arp2 Tb10.61.0500; Arp3 Tb09.160.3850; p41 Tb10.389.0270; p34-Arc Tb927.8.4410 (PF04045); p21-Arc Tb10.70.2 PF
192. ing. It also shows that the site-wide full text search is a powerful way of searching the annotation of all the genomes in GeneDB for possible orthologues. An alternative way of identifying potential orthologues is the presence of a protein domain that is associated with that function. This approach also makes use of the Cross Organism Search Page, which allows browsing of Pfam and InterPro assignments across several genomes concurrently. Let us assume that our previous searches had uncovered that the Pfam domain "Sugar (and other) transporter" (PF00083) is a Pfam domain associated with hexose transporters. Note that in PF00083, PF stands for Pfam. To view more details about this protein domain, go to the gene page (see under box 6 for reference) and click on PF00083, which is in a red box in the figure under box 6. We want to search for proteins in Plasmodium falciparum, Plasmodium chabaudi and Plasmodium berghei that also have this domain. Screenshot: GeneDB Cross Organism Search Page, with a "Search for" box (include description, add wildcards) and organisms grouped as Fungi, Protozoa and Bacteria.
193. ing that could apply to the whole of GeneDB, rather than the annotation of a particular gene or organism, then it may be better addressed to technical feedback. There are links at the bottom of each page in GeneDB (see image below). Screenshot: GeneDB page footer, crediting The Wellcome Trust Sanger Institute, The Institute for Genome Research and Stanford University, listing experimental resources (Malaria Res. Center, MR4), "Hosted by the Sanger Institute", and the Curator feedback and Technical feedback links.
References
Longden, I. and Bleasby, A. (2000) Trends in Genetics 16(6): 276-277. EMBOSS: The European Molecular Biology Open Software Suite.
Carver, T.J. and Mullan, L.J. (2002) Comparative and Functional Genomics 3(1): 75-78. A new graphical user interface to EMBOSS.
Rutherford et al. (2000) Bioinformatics 16(10): 944-945. Artemis: sequence visualization and annotation.
Carver, T.J., Rutherford, K.M., Berriman, M., Rajandream, M.A., Barrell, B.G. and Parkhill, J. (2005) Bioinformatics 21(16): 3422-3423. ACT: the Artemis comparison tool.
Hacker, J., Blum-Oehler, G., Muhldorfer, I. and Tschape (1997) Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol Microbiol 23: 1089-97.
Appendices
Appendix I: Artemis minimum hardware and software requirements. Artemis and ACT wil
194. integrated documentation resource for protein families, domains and sites. Aims: The aim of this module will be to explore these controlled vocabularies using a series of worked examples. Module 5: Genome Resources, Section 1. Exercise 1, Part I. 1. What do equilase and 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase do? What kinds of pathways are they involved in? You probably won't have a very clear idea of what these enzymes are, even if you're a biochemist. Use their EC numbers (EC 1.11.1.6 and EC 2.7.7.60, respectively) to find out more from the official Enzyme Nomenclature website. Go to this web address in your web browser window: http://www.chem.qmw.ac.uk/iubmb/enzyme. Screenshot: Enzyme Nomenclature home page in Microsoft Internet Explorer. Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB), in consultation with the Commission on Biochemical Nomenclature. Enzyme Nomenclature: Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzyme-Catalysed Reactions (http://www.chem.qmul.ac.uk/iubmb/enzyme). World Wide Web version prepared by G. P. Moss, Department of Chemistry, Queen Mary University of London, Mile End Road, London. Click here to
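An EC number is a four-level hierarchy, so its first digit alone already says what broad kind of reaction an enzyme catalyses. The Python sketch below splits the two EC numbers from the exercise into their levels and looks up the top-level class name. The class names are the standard top-level EC classes; the function name and the returned dictionary layout are invented for this illustration.

```python
# Top-level EC classes (first number of an EC identifier).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec_number):
    """Split an EC number such as 'EC 1.11.1.6' into its four levels."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    levels = tuple(int(part) for part in text.split("."))
    if len(levels) != 4:
        raise ValueError(f"expected four levels in {ec_number!r}")
    return {"levels": levels, "class_name": EC_CLASSES[levels[0]]}


for ec in ("EC 1.11.1.6", "EC 2.7.7.60"):
    parsed = parse_ec(ec)
    print(ec, parsed["levels"], parsed["class_name"])
```

So before visiting the website you can already tell that EC 1.11.1.6 is an oxidoreductase and EC 2.7.7.60 is a transferase; the remaining three levels narrow this down to the specific reaction.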
195. Screenshot: T. brucei GeneDB homepage, showing Searches/Analysis (omniBLAST, BLAST, Motif Search, EMOWSE, AmiGO, List Download, Cross Organism Search Page, Complex Boolean Query, RNAit primer design, Full Content Search, Genome Browser, Contig/Chromosome Maps), Data (Example genes, Data releases, FTP download Sanger, FTP download TIGR), Feedback (Curator, Technical), project links (Sanger T. brucei project, TIGR T. brucei project, T. congolense project, T. vivax project), Help and Biological resources. The homepages also provide up-to-date information about sequencing progress, data updates, nomenclature and other community resources. News archive: Trypanosomatids genomes and biology CD; 15th July 2005, Tritryp genomes published in Science; 22nd June 2005, non-coding RNA annotation revised, protein feature/domain predictions updated, systematic identifiers for chr 3-8 assigned by TIGR; 2nd February 2005, updated T. vivax data in GeneDB, T. congolense and T. b. gambiense data available via GeneDB. Screenshot: search results list (Previous, Next, Report, Download), including T. brucei CDS Tb10.70.2680 (ARP2/3 complex subunit, putative), T. brucei CDS Tb927.8.4410 (ARP2/3 complex subunit, putative), T. brucei CDS Tb10.389.0270 (actin related protein 2/3 complex, putative) and T. brucei
196. ious features in the order that they occur on the DNA, with the selected gene highlighted. The list can be scrolled (8, below). Figure labels: sliders for zooming view panels; sliders for scrolling along the DNA; slider for scrolling feature list. Module 1: Artemis. 4. Getting around in Artemis. The three main ways of getting to where you want to be in Artemis are the Goto drop-down menu, the Navigator and the Feature Selector. The best method depends on what you're trying to do, and knowing which one to use comes with practice. 4.1 The Goto menu. The functions on this menu (ignore the Navigator for now) are shortcuts for getting to locations within a selected feature or for jumping to the start or end of the DNA sequence. This one's really intuitive, so give it a try. Screenshot: Artemis Entry Edit window for S_typhi.dna with the Goto menu open, listing Navigator (Ctrl+G), Start of Selection (Ctrl+Left), End of Selection (Ctrl+Right), Start of Sequence (Ctrl+Up), End of Sequence (Ctrl+Down), Feature Base Position and Feature Amino Acid.
197. issimum, which exhibit both catalase and peroxidase activity, have sometimes been referred to as catalase-peroxidase. Links to other databases: BRENDA, EXPASY, KEGG, WIT. CAS registry number: 9001-05-2. Exercise 1, Part II. The BRENDA database contains similar information to the IUBMB site. Screenshot: BRENDA (The Comprehensive Enzyme Information System), entry of catalase (EC Number 1.11.1.6), at http://www.brenda.uni-koeln.de/php/result_flat.php4?ecno=1.11.1.6. The page lists record sections including EC Number, Recommended Name, Systematic Name, Synonyms, CAS Registry Number, Reaction, Reaction Type, Enzyme-Ligand Interactions, Substrates/Products, Natural Substrates and Cofactor. It also lets you select one or more organisms in this record (for example Acinetobacter sp. strain ADP1, Ajellomyces capsulata, Anopheles gambiae, Arabidopsis thaliana) or mark a special word or phrase in this record, and points to the BRENDA discussion groups for questions.
198. Screenshot: Artemis feature list for the start of the S. typhi chromosome, with CDS entries annotated as orthologues of E. coli thrB (KHSE_ECOLI), thrC (THRC_ECOLI; 3734..5020) and YAAA (5114..5887), a CDS similar to a Bacillus subtilis amino acid carrier protein (5966..7396), a CDS with a Fasta hit to TALA_ECOLI (7665..8618), and misc_feature entries for PROSITE motifs (for example an aspartokinase signature, a serine/threonine dehydratases pyridoxal-phosphate attachment site at 4022..4066, and PS00873, sodium:alanine symporter family signature, at 7091..7138). Suggestions of where to go: Think of a number between 1 and 4809037 and go to that base (notice how the cursors on the horizontal sliders move with you). Your favourite gene name (it may not be there, so you could try fts). Use "Goto Feature With This Qualifier value" to search the contents of all qualifiers for a particular term. For example, using the word pseudogene will take you to the next feature with the word pseudogene in any of its qualifiers. Note how repeated clicking of the Goto button takes you through the pseudogenes as they occur on the chromosome. tRNA genes: type tRNA in the "Goto Feature With This Key". Regulator binding DNA
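The "Goto Feature With This Qualifier value" behaviour described above (jump to the next feature whose qualifiers mention a term, and step onward with each click) can be pictured as a simple search over the feature list. The Python sketch below is a conceptual illustration only and does not reflect how Artemis is implemented. The feature records, their coordinates and the function name are all hypothetical.

```python
def next_feature_with_term(features, term, after=0):
    """Return the first feature starting beyond `after` whose qualifiers mention `term`.

    The match is case-insensitive and looks inside every qualifier value.
    Returns None when no further feature matches.
    """
    term = term.lower()
    for feature in sorted(features, key=lambda f: f["start"]):
        if feature["start"] <= after:
            continue
        values = feature["qualifiers"].values()
        if any(term in str(value).lower() for value in values):
            return feature
    return None


# Hypothetical features, for illustration only.
features = [
    {"start": 120, "qualifiers": {"note": "hypothetical protein"}},
    {"start": 900, "qualifiers": {"note": "Pseudogene, contains a frameshift"}},
    {"start": 2500, "qualifiers": {"product": "transposase (pseudogene)"}},
]

# Each loop iteration plays the role of one click on the Goto button.
position, visited = 0, []
while True:
    hit = next_feature_with_term(features, "pseudogene", after=position)
    if hit is None:
        break
    visited.append(hit["start"])
    position = hit["start"]
print(visited)  # [900, 2500]
```

Because the search always resumes from the current position, repeated calls walk through the matching features in chromosome order, just as repeated clicks do.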
199. iting. Go through the gene model of Phat4 and glance through the exon-intron boundaries. Can you suggest any alternative gene model after consulting the table provided in Appendix IX, which contains several examples of experimentally verified splice site sequences for P. falciparum? Example modifications: have a look at the misc_feature coloured in red (location 15618..20618). Can you spot any difference in the red gene model of Phat4 at the exon-intron boundaries? Select the red feature, click on the Edit menu and select 'Edit Selected Features'. In the new window that pops up, change the key from misc_feature to CDS and click the OK button to close the window. Now you can compare the automatically created blue gene model and the curated red gene model at the protein level and predict any alternative splicing pattern. [Screenshot: entry window showing the blue and red gene models.]
200. ium tumefaciens [Screenshots: web pages listing plasmid sequences with FASTA and SRS links, including Agrobacterium tumefaciens str. C58 (Cereon) plasmids AT and Ti, Salmonella enterica subsp. enterica serovar Typhi CT18 plasmids pHCM1 and pHCM2, Salmonella enteritidis plasmids, Salmonella typhi plasmid R27, Salmonella typhimurium LT2 and Selenomonas ruminantium plasmids. Module 3: Generating ACT comparison files using BLAST.]
201. k Create then New Entry. Another entry will appear on the entry line called, you guessed it, name. We will eventually copy all our phage-related genes into here. Click Select, then Feature Selector. In the Feature Selector window: set Key to CDS and Qualifier to colour; type the search term; make sure the buttons are down; click to select features containing the search term; click to view the selected features. In the list that opens, double click to bring a feature into the main view window. [Screenshot: Artemis menus and the Feature Selector window, with Key set to CDS, Qualifier set to colour, and the Ignore Case and Partial Match options.]
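The Feature Selector steps above amount to a filter on feature key and qualifier value. The following is only a minimal sketch of that idea, not Artemis code: the `Feature` class, the `select_features` function and the colour values are invented for illustration, and the search term 12 assumes the colour code used for phage-related genes in this exercise.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A simplified annotation feature (hypothetical, for illustration only)."""
    key: str
    start: int
    end: int
    qualifiers: dict = field(default_factory=dict)


def select_features(features, key, qualifier, term,
                    ignore_case=True, partial_match=False):
    """Return features with the given key whose qualifier matches the term."""
    selected = []
    for feature in features:
        if feature.key != key or qualifier not in feature.qualifiers:
            continue
        value = str(feature.qualifiers[qualifier])
        needle = term
        if ignore_case:
            value, needle = value.lower(), needle.lower()
        matched = needle in value if partial_match else needle == value
        if matched:
            selected.append(feature)
    return selected


# Invented colour values; a real entry would be read from the EMBL file.
features = [
    Feature("CDS", 761, 1935, {"colour": "12"}),
    Feature("CDS", 1792, 3156, {"colour": "12"}),
    Feature("misc_feature", 2022, 2445, {"colour": "2"}),
    Feature("CDS", 3149, 4948, {"colour": "7"}),
]

phage_genes = select_features(features, "CDS", "colour", "12")
print([(f.start, f.end) for f in phage_genes])
```

With partial matching switched on, a shorter term such as 1 would also match colour 12, which is one reason to check which options are active before selecting.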
202. k Save as type: Text Documents. [Screenshot: 'Save As' dialog.] Running BLAST. The BLAST software does not run in Windows itself but in DOS, an operating system that Windows runs in. When you want to run BLAST you will need a DOS window, a.k.a. Command Prompt. [Screenshot: Windows Start menu with Command Prompt highlighted.] Type 'cd blast' and press Return. This changes the directory to the blast folder, which you have just downloaded and unpacked (blast-2.2.6-ia32-win32.exe). [Screenshot: Command Prompt window after typing 'cd blast'.] Now that you are in the blast directory you can start to run BLAST from the command line. There are several programs in the BLAST package that you have now downloaded that can be used for sequence comparison. For a detailed description of the use
203. ks in the table above to view a query result set, or select two of the result sets and use one of the following buttons to combine them. Union will create a result set that contains all the genes in either of the selected sets. Intersect will create a result set that contains only the genes in both of the selected sets. Subtract will remove any genes in the second set (i.e. the one appearing lower in the list) from the first. When you UNION, INTERSECT or SUBTRACT the selected query results, a new entry will appear at the end of the list. [Screenshot: GeneDB query history in which the organism gene set is intersected with 'Proteins that contain Pfam domain transporter' and with 'Proteins predicted to have between 7 and 8 transmembrane domains', giving 2 results, both T. brucei with manual annotation: PGPA, multidrug resistance protein E (P-glycoprotein), and an ABC transporter, putative.] To view this result set later or to download this data, visit the history page. You would have been able to retrieve the same result by combining the two queries from the outset using AND. By clicking on AND on the initial T. brucei query page you will get the option of executing multiple queries
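The three combining buttons described above behave like ordinary set operations. As a rough sketch (the gene identifiers and the `combine` helper are invented for illustration and are not part of GeneDB), the same logic in Python looks like this:

```python
def combine(first, second, operation):
    """Combine two result sets the way the Union/Intersect/Subtract buttons do."""
    if operation == "union":
        return first | second       # genes in either set
    if operation == "intersect":
        return first & second       # genes in both sets
    if operation == "subtract":
        return first - second       # genes in the first set but not the second
    raise ValueError(f"unknown operation: {operation}")


# Hypothetical result sets; real ones would come from three separate queries.
organism_genes = {"gene_01", "gene_02", "gene_03", "gene_04"}
pfam_transporter = {"gene_02", "gene_03", "gene_05"}
seven_to_eight_tm = {"gene_03", "gene_04", "gene_05"}

step_one = combine(organism_genes, pfam_transporter, "intersect")
final = combine(step_one, seven_to_eight_tm, "intersect")
print(sorted(final))

# Subtract depends on the order of the two sets; union and intersect do not.
print(sorted(combine(pfam_transporter, seven_to_eight_tm, "subtract")))
print(sorted(combine(seven_to_eight_tm, pfam_transporter, "subtract")))
```

Chaining intersections one at a time gives the same result as combining all the queries with AND in a single step, which is the point made in the text.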
204. l in general work well on any standard modern machine and with most common operating systems. It is currently used on many different varieties of UNIX and Linux systems, as well as Apple Macintosh and Microsoft Windows systems. Note that the ability to run external programs such as BLAST and FASTA from within Artemis and ACT is available only on UNIX and Linux systems. Minimum memory requirements for people working on whole genomes are approximately 128 megabytes for Artemis and 128 megabytes per genome for ACT. Analysis of cosmid-sized sequences can comfortably be achieved with less memory.
Appendix II: ACT comparison files. ACT supports three different comparison file formats:
1) BLAST version 2.2.2 output. The blastall command must be run with the -m 8 flag, which generates one line of information per HSP.
2) MEGABLAST output. ACT can also read the output of MEGABLAST, which is part of the NCBI blast distribution.
3) MSPcrunch output. MSPcrunch is a program for UNIX and GNU/Linux systems which can post-process BLAST version 1 output into an easier to read format. ACT can only read MSPcrunch output with the -d flag.
Here is an example of an ACT-readable comparison file generated by MSPcrunch -d:
    1399 97.00 940 2539 sequence1.dna 1 1596 AF140550.seq
    1033 93.00 9041 10501 sequence1.dna 9420 10880 AF140550.seq
    828 95.00 6823 7890 sequence1.dna 7211 8276 AF140550.seq
    773 94.00 2837 3841 sequence1.dna 2338 3342 AF140550.seq
The columns have
205. le catalogues (exercise 4.4) and using the boolean querying tool (exercise 4.5). Once you've identified the autotransporters across the three species, we're going to examine the genomic locus of one of these transporters a little closer using ACT (exercise 4.6). Table 2. Autotransporters encoded in the genomes of B. pertussis, B. parapertussis and B. bronchiseptica (Parkhill et al., Nature Genetics 2003, 35:32-40). Pseudogenes are underlined. Products: SphB1, Novel, Novel, Novel, Novel, Serum resistance protein, Pertactin, BapA, AidB, Vag8, BapC, Phg, Novel, SphB3 (serine protease), Novel, SphB2, Novel, Novel, Novel. Identifiers: 0419, 0450, 0452, 0821, 0916, 0961, BB1366, BB1649, BB1864, BB2033, BB2246, BB2270, 2301, BB2324, 2741, 2830, 2941, 3110, BB3111, 3291, BB3292. [Screenshot: Bordetella parapertussis GeneDB home page, giving access to the annotation and sequence of B. parapertussis strain 12822 and citing Parkhill et al. (2003), Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica, Nature Genetics, DOI 10.1038/ng1227. Database entry point: search for gene by ID/description.]
206. le or save it to your computer. [Screenshots: Windows file download and 'Save As' dialogs saving the file 'blastz' (type: Application) to the local disk, followed by a Windows Explorer listing of the unpacked BLAST folder with its applications and the FORMATDB file.]
207. le via Sequence Retrieval System (SRS) at and the server at. Human Draft Genome: The completion of the human draft genome sequence was announced and published in February 2001 (Nature and Science). Since the beginning of the Human Genome Project, the International Human Genome Sequencing Consortium has been submitting human draft sequence data to the International Nucleotide Sequence Database (DDBJ/EMBL/GenBank). High-throughput human sequences have been made available to the public immediately via the EMBL Database high throughput genome division, while finished sequences have been included in the Human division (HUM). The Genome Monitoring Table (Genome MOT) provides unfinished and finished human genome data sorted by chromosome. Additionally, the Genome MOT presents the status of a number of large eukaryotic genome sequencing projects on the World Wide Web. The tables are updated daily and also provide access to EMBL database entries. Genome Annotation and Proteome Analysis: The Ensembl Genome Browser provides the best possible automatic annotation, graphical views and web-searchable datasets for a number of eukaryotic genomes including human, mouse, drosophila, anopheles and zebrafish, with others to follow. Proteome Analysis information on a large number of organisms is available from Genomes Pages. Plasmid
208. lp. ID List. Please enter your gene/RNA ids either as database cross references, e.g. in the form GeneDB_Spombe:SPAC1002.09c, or just use the systematic ids, e.g. Tb927.1.710, but in this case you must also set the default organism. Note that you can also download this list, including the sequences, by clicking on the Report/Download button. Default organism: B. pertussis. IDs entered: GeneDB_Bpertussis:BP0216, BP0529, BP0758, BP0760, BP0761, BP0762, BP0763, BP0874, BP1054, BP1110, BP1112, BP1200, BP1201, BP1251, BP1344, BP1610, BP1660, BP1767, BP1793, BP1884 (each entered with the GeneDB_Bpertussis prefix). Information required for each RNA/CDS: descriptions; DNA (unspliced sequence or cDNA); DNA (spliced sequence); protein sequence; intergenic sequence (3'); intergenic sequence (5'); sequence range (3' distance 0, 5' distance 0); orthologues. Buttons: Submit Query, Reset. Table 2. Autotransporters encoded in the genomes of B. pertussis, B. parapertussis and B. bronchiseptica. Columns: B. pertussis, B. parapertussis, B. bronchiseptica. SphB1: BP0216, BPP0417
209. lysin secretion protein, cyaD. Remaining B. pertussis CDS entries in the list: BP0763, cyclolysin secretion protein; BP0874, vir-repressed protein, vir-18; BP1054, pertactin precursor, prn; BP1110, serine protease, sphB3; BP1112, putative outer membrane ligand binding protein, bipA; BP1200, autotransporter (pseudogene), bapB; BP1201, tracheal colonization factor precursor; BP1251, putative toxin; BP1344, autotransporter; BP1610, putative autotransporter (pseudogene); BP1660, autotransporter, sphB2; BP1767, autotransporter, phg; BP1793, autotransporter (pseudogene); BP1884, hemolysin activator-like protein, fhaC; BP2224, putative autotransporter, bapA; BP2315, autotransporter, vag8; BP2468, virulence protein, vrg-6; BP2627, autotransporter (pseudogene); BP2667, adhesin, fhaS; BP2738, autotransporter (pseudogene), bapC. Out of interest, have a look at the Pfam product browsable catalogues. The Pfam domain of interest is 'Autotransporter beta-domain' (PF03797). Can you identify autotransporters across the 3 genomes that way? [Screenshot: GeneDB gene list information page.]
210. m genome resource [Screenshot: GeneDB page with links to Gene Ontology (GO) summaries for biological process, cellular component and molecular function.] The results should be similar to those below. How do the results compare to your previous searches? Is this search successful in identifying the putative orthologues of PfHT in P. chabaudi and P. berghei? Clicking on the gene name (red arrow 1) will display the exact GO annotation for that gene. To go to the GeneDB gene page, click on the name of the database, e.g. GeneDB_Pfalciparum (red arrow 2). The Term Lineage section shows that the carbohydrate transporter term is a subset, or child, of transporter activity. Clicking on GO:0005215 transporter activity (arrow 3) will display all genes with transporter activity within P. falciparum, P. berghei and P. chabaudi. The Associated Genes section can be used to apply the same search to other organism databases (arrow 4) and als
211. m is [Screenshot: Artemis graph windows above the main DNA view.] Click with the left mouse button in a graph window. A line and a number will appear. The number is the relative position within the genome (bps). Click and drag to highlight a region on the main DNA line. Notice that the boundaries of this region should be marked in the graph windows that you previously clicked. Artemis Exercise 1 Part III. Ato
212. m plasmid [Screenshots: remainder of the plasmid listing; a file chooser with the Module 3 folder open and the file name pHCM1.embl entered; an Artemis entry edit window for pHCM1.embl (218160 bases) showing CDS, repeat unit and RBS features such as HCM1.01 (possible membrane protein, 185 aa, unknown function) and HCM1.02c (hypothetical protein); and the 'Select an output file name' prompt. Module 3: Generating ACT comparison files using BLAST.]
213. ma [Screenshot: ACT comparison between orthologous genes in P. falciparum (Pfal_chr13.embl) and P. knowlesi (Pknowlesi_contig.embl).] Exercise 2 Part IV. Gene models for multi-exon genes in P. falciparum. Use the File menu to select the entry Pfal_chr13.embl and select 'Edit In Artemis' to bring up an Artemis window. In the Artemis window, use the Graph menu and switch on the GC Content window. Use the Goto menu to select the Navigator window and, within the Navigator window, select 'Goto Feature With This Qualifier Value', type PFM1010w, click Goto and then close the dialogue box. Go through the annotated gene model for PFM1010w, have a look at the exon-intron boundaries and compare them with the splice site sequences from P. falciparum given in Appendix IX. Also glance through a few other gene models for multi-exon genes and have a look at the intron sequences as well. Can you find any common pattern in the putative intron sequences? Hint: look at the complexity of the sequence. You can delete exon(s) of any
214. mbe [Screenshot: GeneDB result list of S. pombe CDS entries for ARP2/3 actin-organizing complex subunits (arc2, arc4, arp2, sop2, arc16, arc3), with Previous, Next, Help and Report/Download links.] 5. Navigation bar pull-down menus. You can navigate between different organism datasets and search tools using pull-down menus. WARNING (20th July 2005): there is currently a bug with the downloading of intergenic sequences from the List Report/Download if partial CDSs are included. This feature will work correctly if the partial CDS sequences (SPAC977.01, SPAC750.21 7 10) are not included in your download list. Biological process, cellular component and molecular function annotation is being removed from the description part gene pa
215. mes move together. This is because they are locked together. Right click over the middle comparison view panel. A small menu will appear; select 'Unlock Sequences' and then scroll one of the horizontal sliders. Notice that LOCKED has disappeared from the comparison view panel and the genomes will now move independently. [Screenshot: ACT window with the right-click menu of the comparison view: View Selected Matches, Flip Subject Sequence, Flip Query Sequence, Lock Sequences, Unlock Sequences, Set Score Cutoffs, Set Percent ID Cutoffs, Offer To RevComp.] You can optimise your image either by removing low-scoring or low percentage ID hits from view, as shown below (1-3), or by using the slider on the comparison view panel (4). The slider allows you to filter the regions of similarity based on the length of sequence over which the similarity occurs, sometimes described as the 'footprint'. [Screenshot: ACT window; right button click in the Comparison View panel.]
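The three filters described above (score cutoff, percent ID cutoff and the length slider) can be thought of as simple thresholds applied to each match. The sketch below is illustrative only and is not how ACT is implemented: the `Match` type and `visible_matches` function are invented names, the sample values follow the MSPcrunch example in Appendix II, and taking the footprint to be the length of the match on the query sequence is an assumption.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """One region of similarity between the two sequences (hypothetical type)."""
    score: float
    percent_id: float
    q_start: int
    q_end: int
    s_start: int
    s_end: int

    @property
    def footprint(self):
        # Assumption: footprint = number of query bases covered by the match.
        return abs(self.q_end - self.q_start) + 1


def visible_matches(matches, min_score=0, min_percent_id=0, min_length=0):
    """Keep only matches that pass all three cutoffs."""
    return [
        m for m in matches
        if m.score >= min_score
        and m.percent_id >= min_percent_id
        and m.footprint >= min_length
    ]


matches = [
    Match(1399, 97.00, 940, 2539, 1, 1596),
    Match(1033, 93.00, 9041, 10501, 9420, 10880),
    Match(828, 95.00, 6823, 7890, 7211, 8276),
    Match(773, 94.00, 2837, 3841, 2338, 3342),
]

print(len(visible_matches(matches, min_score=800)))
print(len(visible_matches(matches, min_percent_id=95)))
print(len(visible_matches(matches, min_length=1200)))
```

Raising any one threshold can only hide matches, never add them, which is why tightening the cutoffs step by step is a safe way to declutter a busy comparison view.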
216. mis [Screenshot: Artemis entry window for the S. typhi sequence with the GC content and Karlin Signature Difference graphs switched on (window size 20000). Three peaks are labelled: first region to investigate, second region to investigate, third region to investigate.]
217. nce can be searched against any genome in GeneDB using omniBLAST. In our case we would want to search the three Plasmodium genomes. 2. Another powerful approach makes use of the fact that many protein domains that are diagnostic of a particular function have already been characterised and assigned to many genes within the database. Thus, if we know that our gene of interest has a particular Pfam or InterPro domain, then we can browse through the Pfam or InterPro catalogue for genes which have this domain. This can be done concurrently for several organisms using the Cross Organism search page. [Screenshot: GeneDB search page with a name/description search (wildcards can be added), organism lists for fungi, protozoa, bacteria and vectors, and a full text search in which quotation marks may be used to group words into a phrase, e.g. 'ribosomal protein'.] Browse through t
218. nd MW2 are MRSA; however, N315 is associated with disease in hospitals, whereas MW2 causes disease in the community and is more invasive. Scroll rightward in both genomes to find the first large region of difference. Examine the annotation for the genes in these regions. What are the encoded functions associated with these regions? What significance does this have for the evolution of methicillin resistance in these two S. aureus strains from clinically distinct origins? Module 4: Jemboss. Module 5: Internet Genome Resources. Introduction. The preceding modules are concerned with predicting genes and then trying to evaluate what they do. This module will deal firstly with some of the main ways that gene products are described using controlled vocabularies, and secondly with how you can use these descriptions to quickly access genes from databases. The module is split into three sections. Section 1: EC numbers, a very widely used system for describing enzymes. EC numbers can be used to find out additional information for an enzyme, such as possible orthologues or the biochemical pathway that it is involved in, or can be used to identify new enzymes. Section 2: Gene Ontology, a way to find genes based on descriptions of the molecular function, biological process or cellular component of their products. Section 3: InterPro & UniProt. An
219. nd to check the present working directory, the cd command to change directories, and the ls command will list the contents of the present working directory. You will treat pHCM1.dna as the database sequence and R27.dna as the query sequence. At the terminal prompt, type: formatdb -i pHCM1.dna -p F. Now you can run BLAST on the two plasmid sequences. The program that you are going to use is blastall. In addition to the standard command line inputs, we have to add an additional flag, -m 8, to the command line so that the BLAST output can be read by ACT. This specifies that the output of BLAST is in one line per entry format (see Appendix II). tblastx could be substituted here if a translated DNA versus translated DNA comparison was required. At the terminal prompt, type: blastall -p blastn -m 8 -d pHCM1.dna -i R27.dna -o pHCM1_vs_R27. The pHCM1_vs_R27 comparison file can now be read into ACT along with the pHCM1.embl and R27.embl (or pHCM1.dna and R27.dna) sequence files. [Screenshot: ACT window showing pHCM1.embl versus R27.embl.]
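To make the one line per entry format concrete, here is a small sketch of how a line of blastall -m 8 output could be read. It assumes the standard twelve tab-separated columns of that format (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score); the `Hsp` class, the function names and the two sample lines are invented for illustration and are not real BLAST results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hsp:
    """One high-scoring segment pair from a tabular (-m 8) BLAST report."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_m8_line(line):
    """Parse one tab-separated line of -m 8 output into an Hsp."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, found {len(fields)}")
    return Hsp(
        fields[0], fields[1], float(fields[2]),
        int(fields[3]), int(fields[4]), int(fields[5]),
        int(fields[6]), int(fields[7]), int(fields[8]), int(fields[9]),
        float(fields[10]), float(fields[11]),
    )


def read_m8(lines):
    """Parse every non-empty, non-comment line."""
    return [parse_m8_line(l) for l in lines if l.strip() and not l.startswith("#")]


# Made-up sample lines; a real file would be opened and passed to read_m8.
sample = [
    "R27\tpHCM1\t98.50\t2000\t30\t0\t1001\t3000\t5001\t7000\t0.0\t3800\n",
    "R27\tpHCM1\t91.20\t500\t44\t0\t4000\t4499\t9500\t9001\t1e-150\t540\n",
]

for hsp in read_m8(sample):
    strand = "reverse" if hsp.s_start > hsp.s_end else "forward"
    print(hsp.query_id, hsp.subject_id, hsp.percent_identity, strand)
```

In this sketch a subject start greater than the subject end is treated as a match on the opposite strand, which is how inverted regions typically show up in the tabular output.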
220. npublished [Screenshot: GeneDB gene page for S. pombe arc4, with the following sections and callouts.] GO annotation, Cellular Component: actin cortical patch; Arp2/3 protein complex. Published expression profiles: Gene Expression Viewer (cell cycle, meiosis, environmental stress). GO associations: links will take you to the descriptions of the terms as well as other proteins annotated to the same ontology node. Phenotype. Literature: search for arc4 in PubMed. Domain information: view the Pfam domain structure for this gene product; view the SCOP superfamily. Pfam PF05856: ARP2/3 complex 20 kDa subunit (ARPC4). InterPro 008384: ARP2/3 complex 20 kDa subunit. Database cross references to literature, phenotype and protein motif/domain resources, as well as sequence databases: UniProt (probable ARP2/3 complex 20 kDa subunit, p20-ARC), EMBL (S. pombe chromosome I cosmid c6G9; Schizosaccharomyces pombe mRNA for 20 kDa actin-related protein complex, partial cds), GermOnline, PIR, PombePD. SWISS-PROT annotation for this protein. Similarity: belongs to the ARPC4 family. Function: part of a complex implicated in the control of actin polymerization in cells (by similarity). Subunit: belongs to a complex composed of ARP2, ARP3, P41-ARC, P34-ARC, P21-ARC, P20-ARC and P16-ARC (by similari
221. nson files using BLAST: Exercise 1, Exercise 2, Exercise 3, Exercise 4. Module 4: Jemboss: Exercise 1, Exercise 2. Module 5: Genome Resources: Section 1 (Exercise 1, Exercise 2, Exercise 3), Section 2 (Exercise 1, Exercise 2), Section 3. Module 6: Data Mining using GeneDB. References. Appendices. Module 1: Artemis. Introduction. Artemis (Rutherford et al., 2000) is a DNA viewer program written by Kim Rutherford and used for both prokaryotic and eukaryotic annotation. It allows the user to get away from the relatively faceless EMBL and GenBank style database files and view the sequence in a graphical and highly interactive format. Artemis is designed to present multiple lines of information within a single context. This manifests itself as being able to zoom in to look for fine DNA motifs, as well as being able to zoom out and bring into view operons, several kilobases of a genome, or in fact an entire genome in one screen. It is also possible to perform quite sophisticated analyses and store the output within the Artemis environment to be accessed later. Aims. The aim of this module is for you to become familiar with the basic functioning of Artemis by using a series of worked examples. These examples are designed to take you through the most immediately useful functions. However, there will be time and encouragement for you to explore other menus, nooks and crannies of Artemis that are
222. nt 3' to 5' DNA helicase activity (GO:0043140, F). Definition: catalysis of the reaction ATP + H2O = ADP + phosphate, driving the unwinding of the DNA helix in the direction 3' to 5'. Comment: consider also annotating to the molecular function term 'ATP binding ; GO:0005524'. ATP-dependent 5' to 3' DNA helicase activity (GO:0043141, F). Definition: catalysis of the reaction ATP + H2O = ADP + phosphate, driving the unwinding of the DNA helix in the direction 5' to 3'. Comment: consider also annotating to the molecular function term 'ATP binding ; GO:0005524'. [Screenshot: AmiGO results page with 'Show checked items in tree', 'Check/Uncheck All' and 'Submit Query' controls; last updated 2005-09-06, copyright The Gene Ontology Consortium.] [Screenshot: 'AmiGO! Your friend in the Gene Ontology' search page with the term search box set to 'DNA helicase'.] Query summary. Your query: DNA helicase. Exact match: no. Target: terms. Fields: name and synonyms. Results: 9, including ATP-dependent DNA helicase activity, DNA helicase activity, DNA-dependent DNA helicase activity, single-stranded ATP-dependent, 3' to 5' DNA helicase a
223. nucleation as part of complex with ARP2 (curated by B. Wickstead, Univ. of Oxford). Predicted peptide properties: mass 47.3 kDa; amino acids 416; isoelectric point pH 6.4; charge 1.0; signal peptide: not found. [Screenshot: GeneDB Gene Basket, all results shown, with a Report/Download link. T. brucei CDS entries listed: Tb10.70.2680, ARP2/3 complex subunit, putative; Tb927.2.2900, ARP2/3 complex subunit, putative; Tb927.8.4410, ARP2/3 complex subunit, putative; Tb09.160.3850, actin-like protein 3, putative; Tb10.389.0270, actin-related protein 2/3 complex, putative; Tb10.61.0500, actin-like protein 2, putative; Tb10.406.0320, ARP2/3 complex 16 kDa subunit, putative.] ID List. Please enter your gene/RNA ids either as database cross references, eg
224. o to filter the results by evidence code (arrow 5). The evidence code provides information on the type of data that was used to apply a particular GO term to that gene. ISS is 'Inferred from Sequence or Structural similarity' and is used when the annotation rests on similarities such as BLAST hits, the presence of protein domains or other features based on sequence or structural similarity. IEA is 'Inferred from Electronic Annotation' and is used when annotations have been transferred from automated annotation and have not been reviewed by a curator. For a more detailed description of evidence codes, click on the evidence link (arrow 6). If the evidence code has a link, this will provide more information about the evidence for the GO term (arrow 7). [Screenshot: AmiGO term page for carbohydrate transporter activity. Accession: GO:0015144. Synonym: sugar transporter. Definition: enables the directed movement of carbohydrate into, out of, within or between cells. Term lineage: Gene Ontology, molecular function, transporter activity, carbohydrate transporter activity. The Associated Genes section offers filters by database and by evidence for association, and lists GO term, gene symbol, datasource, evidence and full name.] monosaccharide transp
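Filtering by evidence code, as described above, comes down to keeping only those gene associations whose code is in an allowed list. The following is a minimal sketch with invented gene names and a hypothetical `Association` type; it is not AmiGO or GeneDB code.

```python
from typing import NamedTuple


class Association(NamedTuple):
    """A gene-to-GO-term association (hypothetical type for illustration)."""
    gene: str
    database: str
    evidence: str


def filter_by_evidence(associations, allowed_codes):
    """Keep associations whose evidence code is one of the allowed codes."""
    allowed = {code.upper() for code in allowed_codes}
    return [a for a in associations if a.evidence.upper() in allowed]


# Placeholder gene names; the database names follow the GeneDB style in the text.
associations = [
    Association("gene_a", "GeneDB_Pfalciparum", "ISS"),
    Association("gene_b", "GeneDB_Pberghei", "IEA"),
    Association("gene_c", "GeneDB_Pchabaudi", "IEA"),
]

curator_assigned = filter_by_evidence(associations, ["ISS"])
print([a.gene for a in curator_assigned])
```

Keeping IEA annotations in a separate list rather than discarding them can be useful, since they may point to candidate orthologues that simply have not been reviewed yet.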
225. oing to take the protein sequence of one of these genes and search for similar protein sequences in the genomes of P. falciparum, P. berghei and P. chabaudi using omniBLAST. This uses the wublastp algorithm. The example below will take you through this process for one protein, but if you have time try one or two others. To start with, use the sequence of a protein from T. annulata, TA02485, as this has been annotated as a hexose transporter 1 homologue, which is the name of the gene in P. falciparum as it appears in the literature, and so it is likely that it is an orthologue of the protein in P. falciparum. [Screenshot: GeneDB gene results list, all of 37 results shown, including hexose transporter entries from T. brucei, S. cerevisiae (YDR536W, STL1) and T. annulata (TA02480, TA16160), followed by the GeneDB gene page for CDS TA02485.]
226. on motive force; 3.5.5 Glycolysis; 3.5.6 Oxidative branch, pentose pathway; 3.5.7 Pyruvate dehydrogenase; 3.5.8 TCA cycle; 3.6.1 Fatty acid and phosphatidic acid biosynthesis; 3.7.0 Nucleotide biosynthesis; 3.7.1 Purine ribonucleotide biosynthesis; 3.7.2 Pyrimidine ribonucleotide biosynthesis; 4.0.0 Cell envelope; 4.1.0 Periplasmic/exported/lipoproteins; 4.1.1 Inner membrane; 4.1.2 Murein sacculus, peptidoglycan; 4.1.3 Outer membrane constituents; 4.1.4 Surface polysaccharides & antigens; 4.1.5 Surface structures; 4.2.0 Ribosome constituents; 4.2.1 Ribosomal and stable RNAs; 4.2.2 Ribosomal proteins: synthesis, modification; 4.2.3 Ribosomes: maturation and modification; 5.0.0 Extrachromosomal; 5.1.0 Laterally acquired elements; 5.1.1 Colicin-related functions; 5.1.2 Phage-related functions and prophages; 5.1.3 Plasmid-related functions; 5.1.4 Transposon-related functions; 6.0.0 Global functions; 6.1.1 Global regulatory functions; 7.0.0 Not classified (included putative assignments); 7.1.1 DNA sites, no gene product; 7.2.1 Cryptic genes. Appendix VII: List of colour codes. 0 white: Pathogenicity/Adaptation/Chaperones; 1 dark grey: energy metabolism (glycolysis, electron transport etc.); 2 red: Information transfer (transcription/translation + DNA/RNA modification); 3 dark green: Surface (IM, OM, secreted, surface structures); 4 dark blue: Stable RNA; 5 sky blue: Deg
227. orter activity [Table residue from the GO associated-genes listing. For the terms monosaccharide transporter activity, GO:0005459 UDP-galactose transporter activity, sugar porter activity and GO:0008524 glucose-6-phosphate:phosphate antiporter activity, the table lists P. falciparum genes (datasource GeneDB_Pfalciparum, evidence ISS, for example PFE1455w, sugar transporter, putative) alongside P. berghei genes (GeneDB_Pberghei, evidence IEA, unpublished, full name not available) and P. chabaudi genes (GeneDB_Pchabaudi, unpublished, full name not available). Controls: Previous Page, Next Page, Check All, Uncheck All, Get Detailed View.] Hopefully these exercises have familiarised you with several strategies for data mining in GeneDB and given you ideas of how GeneDB could be applied to your own research area. If you have any further questions, please ask a demonstrator or, after the course, please address your queries to the GeneDB team, who will be happy to help you. See box 60 and the figure below for details of email links. Exercise 5: Use of the Artemis Applet. We are now going to look at the
228. othetical [Screenshot: Artemis feature list for P. falciparum chromosome 13 (Pfal chr13 embl), showing CDS features between roughly 815823 and 861052, including a putative erythrocyte binding protein, hypothetical and conserved hypothetical proteins, a small GTPase and a putative succinyltransferase. Example location: 789034..793351.] Exercise 2 Part V: Curation of gene models in P. knowlesi. We are now going to edit the gene model for P. knowlesi. Use the File menu of the ACT window displaying P. falciparum and P. knowlesi to select the entry Pknowlesi_contig embl, and select Edit In Artemis to bring up an Artemis window. Within the Artemis window, use the Graph menu to toggle the GC Content (%) window. Use the Goto menu to select the Navigator window and, within the Navigator window, select Goto Feature With This Text and type Phat4. Go to the first ACT window and use the Options menu to select Enable Direct Ed
229. ou should see the above. The small white boxes are the regions of atypical DNA covering regions that we looked at in the first Artemis exercise. It is apparent that there is a backbone sequence shared with E. coli K12. Into this, various chunks of DNA specific to S. typhi with respect to E. coli K12 have been inserted. 5. More things to try out in ACT. Double click red boxes to centralise them. Zoom right in to view the base pairs and amino acids of each sequence. Load annotation files into the sequence view panels: you could load in the appropriate tab files for each genome (the S. typhi and E. coli K12 tab files) and view the annotation of a particular region. Also try using some of the other Artemis features, e.g. graphs. Find an inversion in one genome relative to the other, then flip one of the sequences. 6. Exercise 2 Part I: Plasmodium falciparum and Plasmodium knowlesi Genome Comparison. Introduction. The parasite P. falciparum is responsible for hundreds of millions of cases of malaria and causes over 1 million deaths every year. Treatment and control have become difficult with the spread of drug-resistant malaria strains across the endemic countries of the world, and there has been a major emphasis on research as part of our search for new drugs and vaccine candidates to fight malaria. The analysis of the whole genome of P. falciparum has been completed and
230. our searches: a) identify keywords that describe your topic; b) identify any synonyms for your keywords; c) be aware of spelling variations and/or plurals; d) decide the scope of your search; e) be aware that using the same search method on different databases may affect your results; f) try different search methods to identify candidate genes; g) be aware of the use of wildcards. Exercise 1: Data mining the T. brucei genome for the Arp2/3 complex. Exercise 2: Using the Artemis applet to retrieve sequence and annotated features. Exercise 3: Demonstration of the Boolean querying tool. Exercise 4: Data mining of Plasmodium genomes for monosaccharide transporters. Exercise 5: Data mining three Bordetella genomes for autotransporter genes. Exercise 1: Data mining of the T. brucei genome for the Arp2/3 complex. Can you identify the components of the Arp2/3 complex in the kinetoplastid organism Trypanosoma brucei, causative agent of sleeping sickness in sub-Saharan Africa? The Arp2/3 complex is involved in actin assembly and function in the eukaryotic cytoskeleton. So far this complex has not been investigated in kinetoplastids, but it has been well characterised in other organisms such as the fission yeast Schizosaccharomyces pombe. Unlike the S. pombe genome, which is complete and contains extensive curated annotations, the genomes of the trypanosomatids are in various stages of completion and annot
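Points b), c) and g) of the search checklist above (synonyms, spelling variations, wildcards) can be illustrated with a small keyword matcher. This is a sketch under stated assumptions: it uses Python's shell-style wildcards (`*` for any run of characters, `?` for a single character), which may not match the wildcard syntax GeneDB itself accepts, and the product descriptions are illustrative examples.

```python
from fnmatch import fnmatchcase


def search_products(products, patterns):
    """Return the product descriptions that match any wildcard pattern.

    Matching is case-insensitive: both the description and the pattern
    are lower-cased before comparison.
    """
    hits = []
    for product in products:
        text = product.lower()
        if any(fnmatchcase(text, pattern.lower()) for pattern in patterns):
            hits.append(product)
    return hits


# Illustrative product descriptions, not a real database dump.
products = [
    "actin-related protein 2",
    "Actin related protein 3, putative",
    "ARP2/3 complex 20 kDa subunit",
    "tubulin beta chain",
]

# Two synonyms for the same topic; `?` absorbs the hyphen/space variation.
patterns = ["*arp2/3*", "*actin?related protein*"]

for hit in search_products(products, patterns):
    print(hit)
```

Using several patterns at once mirrors the advice to search with synonyms, and the single-character wildcard covers the "actin-related" versus "actin related" spelling variation.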
231. prokaryotic and eukaryotic that have been sequenced by the Sanger Institute Pathogen Sequencing Unit. Note: as this site is under continual development, please be patient if things do not appear to be working; let us know and it will be fixed. Please also note that the URLs (web page addresses) may change, so be cautious about creating links to them. If you contact the developers we will try and keep you informed of any changes. If you have any suggestions about the site please contact the developers. Trypanosoma brucei GeneDB. WARNING 5th September 2005: The Full Content Search is still pointing at the previous data run's data. This will be fixed later in the day. GeneDB contains release 4 of the T. brucei genome, strain TREU927/4 GUTat10.1, generated by the T. brucei projects at The Institute for Genomic Research (TIGR's T. brucei project) and The Wellcome Trust Sanger Institute (Sanger's T. brucei project). It also contains the sequence and annotation of 3 T. brucei strain 427 variant 221a bloodstream expression sites (PMID). Click here for more information on the T. brucei genome/proteome and here to find out more about the individual chromosome assemblies, in particular with regards to additional unordered contigs from chrIX-XI as well as 3 BACs from homologous regions of chrV-VII and [Page elements: Browse Catalogues (Products, SWISS-PROT Keywords, Pfam, InterPro); Database Entry Point.] Search for gene by ID descript
232. r INTERSECT or SUBTRACT the selected query results; a new entry will appear at the end of the list. In the initial download window, the description of the genes in the results list will appear. The lower part of the window (boxed) is used to select what type of information you want to download. These options include the DNA sequence with introns (unspliced) or without introns (spliced), the protein sequence, and either 5' or 3' regions flanking the gene to a chosen number of bases. This would be very useful when examining regulatory elements such as promoter regions or UTRs (untranslated regions). [Screenshot: GeneDB Id List Information page.] Feb 11th 2004: Please note the ID list download and the shopping basket are brand new. They were introduced early because of a hardware failure and haven't had the usual levels of testing, so apologies for any usability rough edges or error messages. The data that comes out has been tested and is accurate. ID List: Please enter your gene ids either as database cross-references, e.g. in the form GeneDB_Spombe:SPAC1002.09, or you can just use the systematic ids, e.g. Tb927.1.710, but in this case you must also set the default organism. Default Organism: None. Pfalciparum PFB0210c; Pchabaudi PC000722.01.0; Pfalciparum
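The download option for 5' and 3' flanking regions described above amounts to slicing a chosen number of bases either side of a gene. A minimal sketch, assuming 1-based inclusive gene coordinates and a gene on the forward strand; the function name and the toy sequence are hypothetical, and a reverse-strand gene would additionally need the flanks swapped and reverse-complemented.

```python
def flanking_regions(chromosome, start, end, flank):
    """Return (five_prime, three_prime) flanks of a forward-strand gene.

    `start` and `end` are 1-based inclusive coordinates of the gene.
    Flanks are clipped at the ends of the chromosome sequence.
    """
    if not 1 <= start <= end <= len(chromosome):
        raise ValueError("gene coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence the offsets.
    five_prime = chromosome[max(0, start - 1 - flank):start - 1]
    three_prime = chromosome[end:end + flank]
    return five_prime, three_prime


# Toy sequence: the "gene" occupies positions 4..9 (CCCGGG).
chromosome = "AAACCCGGGTTT"
upstream, downstream = flanking_regions(chromosome, 4, 9, 2)
print(upstream, downstream)
```

Requesting more flanking bases than are available simply returns the shorter region that exists, which is one reasonable design choice; a real tool might instead report an error.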
233. ra crassa, Aspergillus nidulans and Podospora anserina. Although these fungi belong to the same fungal taxonomic group (Ascomycetes), they vary greatly in their biological characteristics. In this exercise you will be studying and comparing the organisation of the qut gene cluster among these 4 fungi using ACT. Aim: By looking at a comparison of the annotated sequences of N. crassa, A. fumigatus and A. nidulans, you will be able first to add annotations to qut cluster genes in the P. anserina sequence and second to compare those genes that are found in all 4 organisms, as well as spot the differences and study the synteny. The files that you are going to need are: 1. N crassa qut embl: sequence & annotation file for N. crassa. 2. A fum qut embl: sequence & annotation file for A. fumigatus. 3. A nid qut embl: sequence & annotation file for A. nidulans (artificially joined contig). 4. P anserina qut embl: sequence & gene model file for P. anserina, without annotation. 5. A fum N crassa comp: tblastx comparison file of A. fumigatus & N. crassa. 6. A fum A nid comp: tblastx comparison file of A. fumigatus & A. nidulans. 7. A nid P anserina comp: tblastx comparison file of A. nidulans & P. anserina. 8. P anserina N crassa comp: tblastx comparison file of P. anserina & N. crassa. First open an ACT window and then open the annotation and the appropriate comparison files in the order 1, 5, 2, 6, 3, 7, 4, 8, 1 the
234. radation of large molecules; 6 dark pink: Degradation of small molecules; 7 yellow: Central/intermediary/miscellaneous metabolism; 8 light green: Unknown; 9 light blue: Regulators; 10 orange: Conserved hypo; 11 brown: Pseudogenes and partial genes (remnants); 12 light pink: Phage/IS elements; 13 light grey: Some misc. information, e.g. Prosite, but no function. Appendix VIII: List of degenerate nucleotide value/IUB Base Codes. R = A or G; Y = C or T; S = G or C; W = A or T; K = G or T; M = A or C; B = C, G or T; D = A, G or T; H = A, C or T; V = A, C or G; N = A, C, G or T. Appendix IX: Downloading and installing BLAST on a Windows PC. The following pages describe downloading BLAST onto a computer running Windows XP. Downloading onto computers with other versions of Windows should be essentially the same, but the windows will look different to the screen shots used here. [Screenshot: NCBI FTP site page in a web browser, address http://www.ncbi.nlm.nih.gov/Ftp/index.html, listing the major resources available by ftp at ftp.ncbi.nih.gov.]
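The degenerate nucleotide codes in Appendix VIII can be expanded programmatically, which is handy when checking what a degenerate primer or motif actually matches. A minimal sketch: the dictionary follows the standard IUB/IUPAC table, while the function name is an assumption made for illustration.

```python
from itertools import product

# Standard IUB/IUPAC nucleotide codes: each symbol maps to the bases it covers.
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(sequence):
    """Return every unambiguous sequence covered by a degenerate sequence."""
    choices = [IUB_CODES[base] for base in sequence.upper()]
    return ["".join(combination) for combination in product(*choices)]


print(expand_degenerate("ARY"))
# ['AAC', 'AAT', 'AGC', 'AGT']
```

The number of expansions is the product of the number of bases behind each symbol, so sequences with many N positions grow quickly (4 possibilities per N).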
235. [Screenshot: browser window showing the ExPASy NiceZyme view of an ENZYME database entry.] ENZYME: EC 1.11.1.6, Catalase. Reaction catalysed: 2 H2O2 <=> O2 + 2 H2O. Cofactor(s): Heme; Manganese. Comments: This enzyme can also act as a peroxidase (EC 1.11.1.7) for which several organic substances, especially ethanol, can act as a hydrogen donor. A manganese protein containing Mn(III) in the resting state, which also belongs here, is often called pseudocatalase. Enzymes from some microorganisms, such as Penicillium simplicissimum, which exhibit both catalase and peroxidase activity, have sometimes been referred to as catalase-peroxidase. Human genetic disease(s): Acatalasia, MIM 115500. Cross-references: Biochemical Pathways map number(s) K5 (this link takes you to a digital version of the Roche Applied Science Biochemical Pathways wall chart); PROSITE; PUMA2 1.11.1.6; PRIAM enzyme-specific profiles 1.11.1.6; Kyoto University LIGAND chemical database 1.11.1.6; IUBMB Enzyme Nomenclature 1.11.1.6; IntEnz; MEDLINE (find literature relating to 1.11.1.6). Swiss-Prot entries listed include P83657 (CAT1_COMTR), CAT1_PENJA and several N. crassa (NEUCR) catalases.
236. [Screenshot: Pfam catalogue entries: P21-ARC (ARP2/3 complex 21 kDa subunit); ARPC4 (ARP2/3 complex 20 kDa subunit); P16-Arc (ARP2/3 complex 16 kDa subunit); AARP2CN (NUC121) domain; Tropomodulin; Actin.] [Screenshot: Trypanosoma brucei GeneDB home page, carrying the 5th September 2005 Full Content Search warning and the release 4 genome description. Database Entry Point: search for gene by ID or description, with options to include the description and add wildcards. Searches and analysis: BLAST, Motif Search, Genome Browser, Cross Organism Search Page, Full Content Search, Complex Boolean Query, RNAit primer design. Browse Catalogues.]
237. [Screenshot: GeneDB Boolean search page.] Help: This page allows you to build more complex queries against the database using a preset selection of query forms and the boolean operators AND and OR. For example, to construct a simple query, begin by clicking on the AND button. Then use the pulldown menus to select two different queries. Finally, click on "proceed to next step" to generate a page that will allow you to set parameters for the two queries. Running the resulting boolean query will return only those objects that satisfy query 1 AND query 2. Please note you must select a query from each pulldown menu before the proceed button will work. If you need to back up a step, use the browser's back button. You are currently searching P. falciparum. Choose a boolean condition or select a query: None. Proceed to next step. Choose a different organism. [Screenshot: GeneDB Boolean Search Page, "Please choose which organism(s) you wish to search", with organisms grouped as Fungi, Protozoa and Bacteria.]
238. To view this result set later or to combine it with others, visit the history page. Start a completely new complex query. [Screenshot: GeneDB Boolean Search Page, "Please choose which organism(s) you wish to search", with checkboxes grouped as Fungi (including A. fumigatus, S. cerevisiae and S. pombe), Protozoa (including D. discoideum, L. major, L. infantium, P. berghei, P. chabaudi, P. falciparum, P. knowlesi, T. annulata, T. brucei, T. congolense, T. cruzi, T. gambiense and T. vivax), Parasite Helminths (S. mansoni), Bacteria (including B. bronchiseptica, B. parapertussis, B. pertussis, C. diphtheriae, E. carotovora, S. aureus MRSA, S. aureus MSSA and S. coelicolor), Parasite Vectors (G. morsitans) and Viruses (E. huxleyi virus 86).] This page allows you to build more complex queries against the database using a preset selection of query forms and the boolean operators AND and OR. For example, to construct a simple query, begin by clicking on the AND b
239. reductase; EC 5.4.2.8 phosphomannomutase (mannose phosphomutase; phosphomannose mutase; D-mannose 1,6-phosphomutase). map00052 Galactose metabolism: EC 2.7.1.1 hexokinase (hexokinase type IV glucokinase; hexokinase D; hexokinase type IV; hexokinase (phosphorylating); ATP dependent). map00500 Starch and sucrose metabolism: EC 2.7.1.1 hexokinase (same synonyms). map00521 Streptomycin biosynthesis: EC 2.7.1.1 hexokinase (same synonyms). map00530 Aminosugars metabolism: EC 2.7.1.1 hexokinase (same synonyms). Module 5 Genome Resources. 2. [Screenshot: KEGG "Fructose and mannose metabolism - Reference pathway" page (map00051) in a Mozilla browser window, with the Pathway menu, Ortholog table and pathway selector above the pathway diagram.]
240. rotein (ACP); 3.2.02 Biotin; 3.2.03 Cobalamin; 3.2.04 Enterochelin; 3.2.05 Folic acid; 3.2.06 Heme, porphyrin; 3.2.07 Lipoate; 3.2.08 Menaquinone, ubiquinone; 3.2.09 Molybdopterin; 3.2.10 Pantothenate; 3.2.11 Pyridine nucleotide; 3.2.12 Pyridoxine; 3.2.13 Riboflavin; 3.2.14 Thiamin; 3.2.15 Thioredoxin, glutaredoxin, glutathione; 3.2.16 Biotin carboxyl carrier protein (BCCP). Appendix VI (cont.). 3.3.0 Central intermediary metabolism: 3.3.01 2'-Deoxyribonucleotide metabolism; 3.3.02 Amino sugars; 3.3.03 Entner-Doudoroff; 3.3.04 Gluconeogenesis; 3.3.05 Glyoxylate bypass; 3.3.06 Incorporation metal ions; 3.3.07 Misc. glucose metabolism; 3.3.08 Misc. glycerol metabolism; 3.3.09 Non-oxidative branch, pentose pathway; 3.3.10 Nucleotide hydrolysis; 3.3.11 Nucleotide interconversions; 3.3.12 Oligosaccharides; 3.3.13 Phosphorus compounds; 3.3.14 Polyamine biosynthesis; 3.3.15 Pool, multipurpose conversions of intermed. metabol'm; 3.3.16 S-adenosyl methionine; 3.3.17 Salvage of nucleosides and nucleotides; 3.3.18 Sugar-nucleotide biosynthesis, conversions; 3.3.19 Sulfur metabolism; 3.3.20 Amino acids; 3.3.00 other. 3.4.0 Degradation of small molecules: 3.4.1 Amines; 3.4.2 Amino acids; 3.4.3 Carbon compounds; 3.4.4 Fatty acids; 3.4.5 Other. 3.5.0 Energy metabolism, carbon: 3.5.1 Aerobic respiration; 3.5.2 Anaerobic respiration; 3.5.3 Electron transport; 3.5.4 Fermentation. 3.6.0 Fatty acid biosynthesis. 3 4 0 ATP prot
241. s and options see the appropriate README file.
242. s exercises. However, a faster way would be to make use of orthologue cross-links provided by GeneDB. [Screenshot: GeneDB page for T. brucei CDS Tb927.2.2900, an ARP2/3 complex subunit, carrying the 5th September 2005 Full Content Search warning. Sections shown: General Information (systematic name, product), Location (chromosome 2), Predicted Peptide Properties (mass, amino acids, isoelectric point, charge, signal peptide, GPI anchor), Domain Information (Pfam: ARP2/3 complex 20 kDa subunit) and Gene Ontology annotation with ISS evidence.]
243. s it tends to be comprehensive but give many hits. 8. We're now going to move on to complex querying, which allows searching of several genomes concurrently if desired and allows a diverse range of queries to be used. Exercise 2: Searching of multiple genomes using more complex keywords; manipulation and downloading of results sets. [Screenshot: Plasmodium falciparum GeneDB home page. Each gene has a page containing information such as sequences, predicted physical properties, domain information, database links, GO annotation and orthologues. The Database Entry Point offers BLAST and wildcard options, with News, Help and Feedback links and project information.] 10. [...] link above. This will bring you through to the page below. Since you are going to apply the search to all three Plasmodium genomes in GeneDB, you have to select a complex sea
244. [Screenshot: Pfam catalogue list. Entries under A include: Arginyl tRNA synthetase N-terminal domain (PF03485); Armadillo/beta-catenin-like repeat (PF00514); ARP2/3 complex 16 kDa subunit, p16-Arc (PF04699); ARP2/3 complex 20 kDa subunit, ARPC4 (PF05856); Arp2/3 complex 34 kD subunit, p34-Arc (PF04045); Aspartate/ornithine carbamoyltransferase, Asp/Orn binding domain (PF00185); Aspartate/ornithine carbamoyltransferase, carbamoyl-P binding domain (PF02729). Key: A = automatic prediction. Selecting a letter in the index takes you to the first entry that starts with that letter and shows you the next 100 alphabetical entries. Entries under P include P21-ARC, ARP2/3 complex 21 kDa subunit (PF04062).] Exercise 1, 4: Using GO annotations. Gene Ontologies are structured vocabularies that are designed to describe biological processes in an accurate and consistent way (for more information see http://www.geneontology.org). GO is composed of three separate ontologies describing
245. [Screenshot: ACT file selection dialog. Sequence file 1: the S. typhi dna file; Comparison file 1: the S. typhi vs E. coli K12 comparison; Sequence file 2: the EcK12 dna file. Callouts: click and select each file with the Choose buttons; use "more files" for comparing more than two genomes; click Apply and wait. The file chooser for the comparative genomics folder lists EcK12 (dna, embl and tab files), Pfal chr13 embl, Pknowlesi_contig (embl and seq files), a Plasmodium comparison crunch file and the S. typhi cod and dna files.] Comparison files end with crunch. For more info on comparison files see Appendix II. 2. The basics of ACT. You should now have a window like this, so let's see what's there. [Screenshot: ACT window comparing S. typhi with E. coli K12, with the sequences LOCKED and the right-click menu open: View Selected Matches, Flip Subject Sequence, Flip Query Sequence, Lock Sequences, Unlock Sequences, Set Score Cutoffs, Set Percent ID Cutoffs, Offer To RevComp.]
246. sequence alignment searches. 1. DNA-DNA comparisons. If you are comparing large sequences such as whole genomes of several Mb, the blastall program is not suitable. The BLAST algorithms will struggle with large DNA sequences, and therefore the processing time to generate a comparison file will increase dramatically. Megablast uses a different algorithm to BLAST, which is not as stringent and which therefore makes the program faster. This means that it is possible to generate comparison files for genome sequences in a matter of seconds rather than minutes and hours. There are some drawbacks to using this program. Firstly, only DNA-DNA alignments (BlastN) can be performed using megablast, rather than translated DNA-DNA alignments (TBlastX) as can be done using blastall. Secondly, as the algorithm used is not as stringent, megablast is suited to comparing sequences with high levels of similarity, such as genomes from the same or very closely related species. In this exercise you are going to download two Staphylococcus aureus genome sequences from the EBI genomes web page and use Artemis to write out the FASTA format DNA sequences for both, as before in exercise 1. These two FASTA format sequences will then be compared using megablast to identify regions of DNA-DNA similarity and write out an ACT-readable comparison file. The genomes that have been chosen for this comparison are from a hospital-acquired methicillin-resistant S. aureus (MRSA) strain N315 (BA0000
247. speedy exercise for a small region of DNA; on a whole genome view (we will move onto this later) this may take a little time, so be patient. To view the graphs: click on the Graph menu to see all those available. Perhaps some of the most useful plots are the GC Content (1), GC Deviation (2) and Karlin signature (3) plots, as shown below. To adjust the smoothing of the graph, you change the window size over which the points on the graph are calculated, using the sliders shown below. If you are not familiar with any of these, please ask. [Screenshot: Artemis window with the three graphs displayed above the feature and sequence views; the sliders for smoothing are indicated at the right of each graph.]
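The effect of the window-size slider on the GC Content and GC Deviation graphs can be reproduced with a short sliding-window calculation. This is a sketch, not Artemis's own code: it assumes GC deviation is computed as (G - C) / (G + C) per window, and the function name and parameters are hypothetical.

```python
def gc_windows(sequence, window, step):
    """Compute GC content (%) and GC deviation for each sliding window.

    Returns a list of (start, gc_percent, gc_deviation) tuples, where
    start is the 0-based offset of the window and gc_deviation is
    (G - C) / (G + C), or 0.0 for a window with no G or C.
    """
    sequence = sequence.upper()
    results = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_percent = 100.0 * (g + c) / window
        gc_deviation = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_percent, gc_deviation))
    return results


# A larger window averages over more bases and gives a smoother curve.
toy = "GGCCAATTGGGAATTCCC"
for start, gc, dev in gc_windows(toy, window=6, step=6):
    print(start, round(gc, 1), round(dev, 2))
```

Increasing `window` plays the same role as moving the slider: each plotted point summarises more sequence, so local fluctuations are smoothed out at the cost of resolution.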
248. splays results of queries executed using the boolean query pages. The page will remain empty until searches have been run. Please note that a search combining boolean operators will return multiple results files. Chosen queries are first executed across all organisms within GeneDB and subsequently narrowed down to only return data from the organisms of choice. Once query results have been retrieved, users can combine results files in one of three ways: adding files together (union), identifying common results between files (intersect), or identifying unique results between files (subtraction). Please note you must have cookies enabled in your browser in order for this page to work correctly. The result files can either be viewed or downloaded as a FASTA file of DNA or protein sequences. [Screenshot: query history table with columns Start, Response, Query, Result, Download and Size. Queries shown: "Proteins with a product containing the keyword or phrase transporter" intersect "Proteins predicted to have GO component membrane" (response under 1 second, size 438); "Genes for pberghei, pchabaudi, malaria" intersect the previous query; "Proteins predicted to have between 8 and 14 transmembrane domains" intersect "Proteins predicted to have GO process transport" (response under 1 second, size 428); "Genes for pberghei, pchabaudi, malaria" intersect]
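The three ways of combining results files described above (union, intersect, subtraction) correspond directly to set operations on gene identifiers. A minimal sketch: the function name is an assumption, and the identifiers g1 to g5 are placeholders rather than real GeneDB IDs.

```python
def combine(first, second, operation):
    """Combine two result sets of gene IDs.

    operation is one of:
      "union"     - IDs found in either set
      "intersect" - IDs found in both sets
      "subtract"  - IDs in `first` that are not in `second`
    """
    if operation == "union":
        return first | second
    if operation == "intersect":
        return first & second
    if operation == "subtract":
        return first - second
    raise ValueError(f"unknown operation: {operation}")


# Placeholder result sets standing in for two query results.
product_transporter = {"g1", "g2", "g3", "g4"}
go_membrane = {"g3", "g4", "g5"}

print(sorted(combine(product_transporter, go_membrane, "intersect")))
# ['g3', 'g4']
```

An AND query behaves like "intersect" and an OR query like "union"; "subtract" is order-sensitive, so swapping the two sets gives a different answer.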
249. [Screenshot: Omniblast results page. The results of a submission may be retrieved any number of times during a limited period; after this, queries must be resubmitted if further examination is required. Low complexity filtering disabled; RepeatMasker disabled. Output from WU-BLAST 2.0 (Copyright (C) 1996-2002 Washington University, Saint Louis, Missouri, USA. All Rights Reserved. Reference: Gish, W. (1996-2002) http://blast.wustl.edu) for a query of 916 letters. The list of sequences producing high-scoring segment pairs is headed by a pertactin precursor and followed by several autotransporter and putative autotransporter genes, each shown with its genome coordinates and strand.]
250. t in the qut cluster of ALL four fungi. Note down the gene order and direction of transcription in each after you have completed annotation of the P. anserina genes in the qut cluster. Use the right click on your mouse to bring up the score cutoff window. Scroll along the bar to screen out low-scoring hits. [Screenshot: two ACT windows titled "N crassa qut embl vs A fum qut embl vs A nid qut embl vs P anserina qut embl vs N crassa qut embl", showing the four-way comparison with the sequences LOCKED and the score cutoff slider open.]
251. [Screenshot: GeneDB complex search page header, The Wellcome Trust Sanger Institute.] 13. Complex queries can be built up using this page. It uses a Boolean approach. Many different data types can be searched, and AND or OR can be used to enhance searching. Our objective is to find hexose transporters in the three Plasmodium genomes. Try the following complex search strategies and compare the results you obtain. The following pages show you how to set up the first query. Then try numbers 2 and 3 for yourself. 1. Proteins with a product containing a particular keyword or phrase (transporter) AND GO component (membrane). 2. Proteins containing one or more transmembrane domains (try between 8 and 14) AND Proteins with a particular GO process (transport). 3. Proteins with a product containing a particular keyword or phrase (transporter) AND Proteins with a signal peptide. Click on the AND button and a second pull-down menu should appear below the first. Select "Proteins with a product containing a particular keyword or phrase" by clicking on the first pull-down menu and selecting this option. [Screenshot: GeneDB complex search page with the help text and the query pull-down menus.]
252. these organisms. [Screenshot: GeneDB browse pages. "Browsing Riley Catalogue": browse through the Riley assignments for the bacterial organisms. "Pfam List": first 100 results shown; A = automatic prediction; selecting a letter in the index takes you to the first entry that starts with that letter and shows you the next 100 alphabetical entries.] 41. [Screenshot: GeneDB results list, all of 11 results shown, containing CDS entries from P. falciparum, P. chabaudi and P. berghei.]
253. tic Annotation; 2. PC000560 00 0, P. chabaudi, phosphate transporter, putative (Automatic Annotation); 3. PC000437 02 0, P. chabaudi, transporter protein, putative (Automatic Annotation); 4. PBO000168 03 0, P. berghei, phosphate transporter, putative (Automatic Annotation); 5. 00604 03 0, P. chabaudi, sugar transporter, putative (Automatic Annotation); 6. 0141, P. falciparum, UDP-galactose transporter, putative (Manual Annotation); 7. 450, P. falciparum, putative transporter protein (Manual Annotation); 8. PEO7 0065, P. falciparum, zinc transporter, putative (Manual Annotation); 9. PEO7 0070, P. falciparum, transporter permease protein, putative (Manual Annotation); 10. MAL L3P1 206, P. falciparum, phosphate transporter, putative (Manual Annotation); 11. 14 0133, P. falciparum, ATP-dependent transporter, putative (Manual Annotation); 12. 125, P. falciparum, ABC transporter, putative (Manual Annotation); 13. PECOS7Sw, P. falciparum, transporter, putative (Manual Annotation). To view this result set later or to download this data, visit the history page. The History page allows result sets to be viewed and downloaded. It is only active for Boolean searches, but it is a very useful way of tracking and manipulating result sets. It also allows result sets to be added together (union), the contents of one set to be removed from another (subtract), and the entries that appear in both sets to be identified (intersect).
254. tion and context map. Clicking on "Graphical display in Artemis" opens an Artemis applet, which will be discussed further. The applet allows the feature to be viewed in the context of the sequence and additional annotation such as UTRs. [Screenshot residue: GeneDB gene page showing peptide properties, predicted transmembrane helices, and Pfam and PRINTS domain hits.]
255. ty Keywords: Complete proteome (4969 others); Cytoskeleton (25 others). This SWISS-PROT entry is copyright. It is produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation, the European Bioinformatics Institute. There are no restrictions on its use by non-profit institutions as long as its content is in no way modified and this statement is not removed. Usage by and for commercial entities requires a license agreement (see http://www.isb-sib.ch/announce/ or send an email to license@isb-sib.ch). [Screenshot residue: GeneDB Gene Results List, "Results 1 to 6 of 6 results shown", listing S. pombe CDS entries annotated as ARP2/3 actin-organizing complex subunits, including Arc34 (arc34), Sop2 (sop2), Arc16 (arc16) and Arc21 (arc21).]
256. uencing Unit. This page displays results of queries executed using the boolean query pages. The page will remain empty until searches have been run. Please note that a search combining boolean operators will return multiple results files. Chosen queries are first executed across all organisms within GeneDB and subsequently narrowed down to return only data from the organisms of choice. Once query results have been retrieved, users can combine results files in one of three ways: adding files together (union), identifying common results between files (intersect), or identifying unique results between files (subtraction). Please note you must have cookies enabled in your browser in order for this page to work correctly. The result files can either be viewed or downloaded as a FASTA file of DNA or protein sequences. [Table residue: columns Query, Start time, Response time, Result, Download, Size; the rows list intersect queries combining "Proteins that contain Pfam domain transporter" with "Proteins predicted to have between 7 and 8 transmembrane domains".] Follow one of the lin
257. um [Screenshot residue: GeneDB search results listing P. falciparum, P. chabaudi and P. berghei CDS entries with products such as "sugar transporter, putative", "monosaccharide transporter, putative" and "conserved hypothetical protein".] Exercise 4: Searching using Gene Ontology annotation. Another search strategy is to search on the basis of Gene Ontology terms. Gene Ontologies are structured vocabularies that are designed to describe biological processes in an accurate and consistent way (for more information see http://www.geneontology.org). The ontology is composed of three terms: the molecular function, biological process and cellular component (location) of a protein. Where evidence exists from the literature, from sequence analysis or from other sources, Gene Ontology terms for function, process and component are attributed to that gene. AmiGO is a database of Gene Ontology associations that is designed and maintained by the Gene Ontology consortium. It allows searching and browsing of gene ontology annotation
258. umber 0548 0. Entered TrEMBL: Release 31, 13 SEP 2005. Sequence was last modified: Release 31, 13 SEP 2005. Annotations were last modified: Release 31, 13 SEP 2005. Protein description. Protein name: GroEL. Origin of the protein: from Escherichia coli (TaxID 562). Taxonomy: Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia. References: 1. NUCLEOTIDE SEQUENCE, STRAIN JM105. Dugourd D F Young Wright J A. "Molecular chaperones: chaperonin encoding stress genes groEL and groES and their use as antimicrobial targets." Submitted (DEC 2000) to the EMBL/GenBank/DDBJ databases. Comments: FUNCTION: Prevents misfolding and promotes the refolding and proper assembly of unfolded polypeptides generated under stress conditions (by similarity). SUBUNIT: Oligomer of 14 subunits composed of two stacked rings of 7 subunits (by similarity). Module 6: Data mining using GeneDB. Introduction. This module will demonstrate GeneDB (http://www.genedb.org), a genome database housing sequence and annotation of prokaryotic and eukaryotic organisms. The resource provides a portal through which data generated by the Pathogen Sequencing Unit and other collaborating sequencing centres can be made publicly available. It combines data from finished and ongoing genome and expressed sequence tag (EST) projects with cur
259. up window as shown above. This will now allow you to edit the start and end positions of the feature boxes by using the left mouse button. Click and hold down the cursor over the first or last base of any feature and then drag the mouse. The feature box should move as you drag it (see below). This can be a little tricky, so please ask. [Screenshot residue: Artemis window for the entry Malaria embl with feature PF13 0119 selected, a GC Content plot (window size 120) and the callout "Click to select exon to edit".]
260. use the sliders indicated below. However, be careful when zooming out quickly with all the features displayed, as this may temporarily lock up the computer. To make this process faster and clearer, switch off stop codons by clicking with the right mouse button in the main view panel; a menu will appear with an option to de-select stop codons (see below). If you have any problems, ask a demonstrator. [Screenshot residue: Artemis window with the callout "To de-select the annotation click here" and the Feature Viewer Menu, whose items include Raise Selected Features, Lower Selected Features, Smallest Features In Front, Zoom to Selection, Select Visible Range, Select Visible Features, Set Score Cutoffs, Feature Labels, Line Per Entry, Forward Frame Lines, Reverse Frame Lines, Start Codons and Stop Codons.]
261. utotransporte [Screenshot residue: GeneDB search form with the options "Include description" and "Add wildcards" and links to omniBLAST, BLAST, Motif Search, EMOWSE, List Download, Cross Organism Search Page, Full Content Search and Complex Boolean Query; Julian Parkhill (project manager), Mohammed Sebatha (curation); example genes BPP2409, rpsR, rpll.] Gene Results List: Results 1 to 17 of 17 results shown (B. parapertussis CDS): BPP2415, autotransporter, vag8; 0735, autotransporter; BPP0417, autotransporter subtilisin-like protease, sphB1; BPP2975, putative autotransporter; BPP2745, autotransporter (pseudogene), sphB2; BPP0452, autotransporter; BPP2591, putative autotransporter, bapC; BPP1815, autotransporter, bapB; BPP2251, putative autotransporter (pseudogene), bapA; BPP0822, autotransporter; BPP1256, autotransporter; BPP1618, autotransporter; BPP1617, autotransporter; 1998, autotransporter, phg; BPP0449, autotransporter; BPP2022
262. utton. Then use the pull-down menus to select two different queries. Finally, click on "proceed to next step" to generate a page that will allow you to specify parameters for the two queries. Running the resulting boolean query will return only those objects that satisfy query 1 AND query 2. Please note you must select a query from each pull-down menu before the proceed button will work. If you need to back up a step, use the browser's back button. You are currently searching T. brucei (Choose a different organism). Choose a boolean condition or select a query. Menu options: OR; None; Proteins with product containing particular keyword or phrase; Proteins with annotation matching particular keyphrase; Proteins with a predicted GO function; Proteins with a predicted GO process; Proteins with a predicted GO component; Proteins within a range of length in amino acids; Proteins within a particular range of molecular masses; Proteins containing a specific Pfam domain; Proteins with a Pfam domain; Proteins containing a predicted signal; Protein containing one or more predicted transmembrane domains. Send us your comments on GeneDB. This page allows you to build more complex queries against the database using a preset selection of query forms and
263. y Databases [Screenshot residue: EBI home page toolbox listing databases and tools, including ArrayExpress, MIAME, UniProt DAS, ASD, ATD, EMBL-Bank, EMBL CDS, ClustalW, Ensembl, GeneWise, PromoterWise, DaliLite, CSA, MaxSprout, GOA, MSD, IntAct, Expression Profiler, NEWT, QuickGO, IntEnz, InterPro, PANDIT, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL.] Exercise 1. [Screenshot residue: Module 5 Genome Resources; the InterProScan page at http://www.ebi.ac.uk/InterProScan shown in a Mozilla browser window.]
264. yces pombe, for which a complete genome sequence has been obtained, and the kinetoplastid protozoa Leishmania major and Trypanosoma brucei, whose genome sequences have yet to be completed. It is envisaged that the generic database structure will subsequently be adopted to integrate datasets for other organisms, both prokaryotic and eukaryotic, that have been sequenced by the Sanger Institute Pathogen Sequencing Unit. Note: as this site is under continual development, please be patient; if things do not appear to be working, let us know and it will be fixed. Please also note that the URLs (web page addresses) may change, so be cautious about creating links to them. If you contact the developers, we will try to keep you informed of any changes. If you have any suggestions about the site, please contact the developers. Schizosaccharomyces pombe GeneDB. 20th July 2005: There is currently a bug with the downloading of intergenic sequences from the List Report Download if partial CDS are included. This feature will work correctly if the partial CDS sequences SPAC977.01, SPAC750.08C or SPAPB21E7.10 are not included in your download list. This database contains all known and predicted protein-coding genes, pseudogenes, transposons, tRNAs, rRNAs, snRNAs, snoRNAs and other known and predicted non-coding RNAs. Curation of new and existing literature is ongoing and changes are incorporated weekly
265. zing complex|Schizosaccharomyces pombe|chr 1|Manual MASFNVPIIM DNGTGYSKLG TYAGNDAPSTV FPTVIATRSA GASSGPAVSS GHELSSKRATE DLDFFIGNDA LKXASAGYSL DYPIRNGOIE NWDHMERFUWQO QSLFKYLRCE PEDHYFLLTE PPLNPPENRE NTAEIBFESF NCAGLYIAVO AVLALAASVT SJEVTPRSLT GTVVDSGDGV THIIPVAEGT VIGJSIKTMP LAGRDVTYFV QSLLEDRNEP KEECCYVCPD IVKEFSEFDR EPDRYLKTAS ESITOHSTTI DVGFERFLAP SDFLTPLPEL VONVVOSSPI DVRKGLYNKNI VLSOGSTLFK NFGNELORDL SEMLSGAKSG CVDVNVISHE RQPNAVWFOG SLLAQTPEFG SYCHTKADYE 100 QIFGNSL KPSTNASKGS PIJSLKTAERI EIFFNPEIAS KRIVDERIMR EYGASIARRY [Screenshot residue: GeneDB omniBLAST Server page, which runs a BLAST search against a set of databases; the remaining form text and the pasted query sequence are not recoverable.]
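
The excerpts above describe how the GeneDB History page combines result sets by union, intersect and subtract, and how a Boolean AND query returns only the objects that satisfy both queries. The sketch below illustrates those three operations with ordinary Python sets. It is only an illustration under stated assumptions: the gene identifiers are invented placeholders, and the `combine` helper is hypothetical and is not part of GeneDB.

```python
# Hypothetical result sets. The identifiers are invented placeholders,
# not real GeneDB gene IDs.
transporter_hits = {"GENE_A", "GENE_B", "GENE_C"}  # e.g. product contains "transporter"
membrane_hits = {"GENE_B", "GENE_C", "GENE_D"}     # e.g. GO component "membrane"


def combine(first, second, operation):
    """Combine two result sets the way the History page describes.

    This helper is an assumption made for illustration only.
    """
    if operation == "union":      # add the two sets together (Boolean OR)
        return first | second
    if operation == "intersect":  # entries present in both sets (Boolean AND)
        return first & second
    if operation == "subtract":   # remove the second set's entries from the first
        return first - second
    raise ValueError(f"unknown operation: {operation}")


if __name__ == "__main__":
    for op in ("union", "intersect", "subtract"):
        print(op, sorted(combine(transporter_hits, membrane_hits, op)))
```

Note that subtract is the only one of the three operations where the order of the two sets matters.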
