
User Guide - ENDscript 2 - ESPript

1. [Tables: Risler matrix [17] and PAM250 matrix [18]. Two 20 x 20 amino acid substitution tables; values not recoverable.]
2. [Tables: BLOSUM62 matrix [19] and Identity matrix. Two 20 x 20 amino acid substitution tables; values not recoverable.]
3. E., Burkhardt K., Feng Z., Gilliland G.L., Iype L., Jain S., Fagan P., Marvin J., Padilla D., Ravichandran V., Schneider B., Thanki N., Weissig H., Westbrook J.D. and Zardecki C. (2002) Acta Cryst. D58, 899-907.
[16] Kyte J. and Doolittle R.F. (1982) J. Mol. Biol. 157(1), 105-132.
[17] Risler J.L., Delorme M.O., Delacroix H. and Henaut A. (1988) J. Mol. Biol. 204(4), 1019-1029.
[18] Dayhoff M. (1978) Atlas of protein sequences and structure. National Biomedical Research Foundation, Washington D.C.
[19] Henikoff J.G. and Henikoff S. (1996) Methods in Enzymology 266, 88-105.

User guide last revision: March 24, 2014. 2005-2014 The ENDscript authors, CNRS. Contact: espript@ibcp.fr. ENDscript is an SBGrid supported application.
4. is calculated.
o Only coordinates of protein residues are taken into account.
o Cysteine residues involved in disulphide bridges are identified.
• CNS
Main role: calculates inter- and intramolecular contacts.
o CNS calculates both crystallographic and non-crystallographic contacts between each protein molecule.
o Contacts between protein residues and hetero-compounds are also calculated if the latter have been automatically or manually kept.
o If available, cell parameters and space group are extracted for crystallographic structures.
o Hydrogen atoms are deleted and thus excluded from distance calculation.
o Main-chain atoms (N, Cα, C, O) can also be excluded from distance calculation by enabling the "Use side chains only" option in the first box of the form ("Query PDB file").
o The upper limit for calculation of inter- and intramolecular contacts is 3.7 Å by default and can be changed with the "Contacts up to" option. The shortest intermolecular distance is taken for each residue.
• ESPript
Main role: generates the first ENDscript flat figure.
o The protein sequence of each chainID contained in the PDB query is displayed.
o Secondary structure elements have been calculated by DSSP in the previous step: α-, 3₁₀- and π-helices are shown above the sequence as medium, small and large squiggles with α, η and π labels, respectively. β-strands are shown as arrows labeled β. Strict α- and β-turns are marked by TTT and TT letters, respec
5. [Tables, continued: BLOSUM62 and Identity matrices; values not recoverable.]

References
[1] Kabsch W. and Sander C. (1983) Biopolymers 22(12), 2577-2637.
[2] Joosten R.P., te Beek T.A., Krieger E., Hekkelman M.L., Hooft R.W., Schneider R., Sander C. and Vr
6. DSSP [1,2], to extract secondary structure elements, disulfide bridges and solvent accessibility per residue.
o CNS [3], to calculate non-crystallographic and crystallographic protein-ligand and protein-protein contacts.
o BLAST [4], to search for protein homologues using the sequence of the PDB query against a chosen sequence database.
o Clustal Omega [5], MAFFT [6], MSAProbs [7] or MultAlin [8], to perform multiple sequence alignments.
o ESPript [9-11], to render all this information with flat figures.
o ProFit [12], to superimpose all homologous proteins of known 3D structure on the PDB query.
o PyMOL [13], to generate scripts and session files that display sequence and structure conservation with 3D interactive representations.
o PhylodendronWeb, to build a phylogenetic tree.
o JalviewLite [14], for multiple sequence alignment editing, visualisation and analysis.
All these programs are launched sequentially in three succeeding phases:
• Phase 1
To run the first phase, ENDscript uses as query either a four-character PDB [15] identifier or a user-uploaded coordinate file in PDB format. In the first box of the ENDscript interface ("Query PDB file"), fill in the form by at least:
o clicking on the PDB icon and typing the PDB entry code (e.g. 2CAH) of your protein structure (NMR and crystallographic structures are supported),
o or uploading your own PDB file by clicking on the "Browse" button (or equivalent, depending on your browser language).
Cli
7. ENDscript 2 User Guide

Preamble
This user guide documents the ENDscript Web server, developed by Patrice GOUET and Xavier ROBERT in the "Biocrystallography and Structural Biology of Therapeutic Targets" research team of the "Structural and Molecular Basis of Infectious Systems" laboratory (UMR5086 CNRS, Lyon University). ENDscript is an SBGrid supported application. This documentation contains all the information you need to use the ENDscript Web server as a beginner or advanced user. The two following notation conventions are used to draw your attention to certain important pieces of information:
If the option "Display all known structures" is activated via the interface (default), an automatic search is performed to check whether a sequence name can be related to a known 3D structure.
o The program identifies α-helices (shown by medium squiggles), 3₁₀-helices (small squiggles), π-helices (large squiggles), β-strands (arrows), strict α-turns (TTT letters) and β-turns (TT letters) from the 3D structure.

Table of contents
1 Introduction
2 Overview of the ENDscript automated pipeline: Phase 1, Phase 2, Phase 3
3 Phase 1 in details: SPDB, DSSP, CNS, ESPript
4 Phase 2 in details: BLAST search, Multiple sequence alignment, ESPript
5 Phase 3 in details: ProFit, PyMOL ScriptMaker
6 Alignments output layout and file formats
7 Appendix
8 References

1 Introduction
ENDscript is a friendly Web server which extracts and
8. ble surface can be mapped with the same coloring code via the PyMOL control panel.
The second PyMOL representation is named "Sausage":
o It shows a variable tube representation of the Cα trace of the PDB query.
o For this drawing, all homologous protein structures were superposed onto the PDB query with ProFit, and the size of the tube is proportional to the mean r.m.s. deviation per residue between Cα pairs.
o The same white-to-red color ramping is used to visualize sequence conservation.
o Hence the user can identify areas of weak and strong structural conservation and correlate this result with sequence conservation.
If applicable, these two PyMOL representations can display an assortment of supplementary information compiled by ENDscript:
o Biological unit, in grey Cα trace representation.
o All NMR models, in light pink Cα trace representation.
o Disulfide bridges, in yellow stick representation.
o Side chains, in line representation colored as a function of the conservation score.
o Nucleic acids, in cartoon representation.
o Ligands, in ball-and-stick representation.
o Contacting residues, in pale green stick representation.
o Monatomic elements, in dotted sphere representation.
o Identical residues, in dark pink ball-and-stick representation and highlighted.
o PDB SITES markers, in blue mesh representation.
o Solvent-accessible surface, colored as a function of the conservation score.
o Sequence viewer.
These two representations can b
9. bove paragraph) are written in red if the distance is less than 3.2 Å and in black if the distance is in the range 3.2-5.0 Å.
o Main information is given by the written marks, which show intermolecular contacts:
  - A to Z, 0 to 9 or a to z means that the amino acid residue has a non-crystallographic contact with a residue of chain A to Z, 0 to 9 or a to z (i.e. this residue is involved in a non-crystallographic interface).
  - A to Z, 0 to 9 or a to z in italic means that the amino acid residue has a crystallographic contact with a residue of chain A to Z, 0 to 9 or a to z (i.e. this residue is involved in a crystallographic interface).
  - A specific mark identifies a contact between two amino acid residues having the same names and numbers (e.g. along a 2-fold symmetry axis).
  - <> means that the amino acid residue has a contact with a ligand, i.e. an automatically kept or a chosen hetero-compound (see above paragraph).
  - <> in italic means that the amino acid residue has a crystallographic contact with a ligand, i.e. an automatically kept or a chosen hetero-compound (see above paragraph).
o Further information is given with colors:
  - A yellow background identifies a non-crystallographic contact.
  - An orange background identifies an amino acid involved in both a crystallographic and a non-crystallographic contact.
  - A blue frame identifie
10. ck on "SUBMIT" in the buttons frame.
The PDB query is processed with SPDB and the amino acid sequence is extracted. An SPDB output file is generated and given to DSSP to extract secondary structure elements, disulfide bridges and solvent accessibility per residue. The same SPDB output file is then used by CNS to determine non-crystallographic and crystallographic protein-ligand and protein-protein contacts. At this point, an ESPript figure is generated, giving the following information on each monomeric sequence contained in your PDB query:
o Secondary structure elements and residues in alternate conformation are shown above the query sequence.
o Accessibility and hydropathy scales, intermolecular contacts and possible disulfide bridges are shown below.
• Phase 2
A BLAST search using the sequence of the PDB query is performed against a chosen sequence database (PDBAA by default) to detect protein homologues. The result is piped to a multiple sequence alignment program (Clustal Omega, MAFFT, MSAProbs or MultAlin). A second figure is then generated by ESPript:
o It shows the aligned sequences, colored according to their degree of similarity.
o In addition, each homologous sequence of known 3D structure is adorned with its secondary structure elements, extracted by DSSP.
o Further information is presented below the alignment, as in phase 1.
• Phase 3
Two PyMOL session files are generated. They can be downloaded and interactively examined with the
11. d hetero-compounds, if present. Several common hetero-compounds are automatically kept (see table below) and are subsequently depicted by given symbols on the flat figures. Users can also manually keep unrecognized hetero-compounds contained in their PDB query by typing their names in the "Keeping contacting hetero-compounds" form (up to 10 names of 2-3 characters per column, one name per line).

Hetero-compound type: names (each type is given its own symbol on the flat figures)
Nucleotides: ADE, GUA, CYT, THY, URE, A, G, C, T, U, DA, DG, DC, DT
Porphyrin groups: HEM, BCL, BPH, MQ7
Sugars: GLC, GAL, MAN, NAG, FUC, SIA, XYL
Miscellaneous: NAD, NAH, NDP, NAP, FMN
Modified amino acids: regardless of their names, as long as they contain main-chain atoms (N, Cα, C, O)

Contacts between protein residues and automatically or manually kept hetero-compounds are shown in the phase 1 flat figure. To this end, the symbols <> are used according to the user's assignment in the "Keeping contacting hetero-compounds" form. By default, this mark is shown in red if the distance of the protein/hetero-compound contact is less than 3.2 Å and in black if it is in the range 3.2-5.0 Å.
• DSSP
Main role: calculates secondary structure elements.
o The program identifies α-helices (shown by medium squiggles), 3₁₀-helices (small squiggles), π-helices (large squiggles), β-strands (arrows), strict α-turns (TTT letters) and β-turns (TT letters) from the 3D structure.
o Accessibility by residue
12. e downloaded and interactively examined with the molecular 3D visualization program PyMOL installed on the user's computer. Expert users can also download a zip archive containing the PyMOL (.pml) scripts and the associated files in order to edit them manually (please refer to the PyMOL documentation or the PyMOLWiki).

6 Alignments output layout and file formats
The following options control the layout of the ENDscript flat figures generated during phases 1 and 2. You can render these figures in a variety of output formats and sizes. These settings have no effect on the two PyMOL 3D interactive representations.
• Font size: font size in points (monospaced Courier font) for sequence names and residues (default: 6).
• Number of columns: number of residue columns per line (default: 140).
• Color scheme:
o Normal: standard color scheme (default).
o Flashy: flashy colors; similar residues are written with black bold characters and boxed in yellow.
o Thermal: colored, with all letters in bold; ideal for article figures.
o Slide: light cyan background; ideal for slides.
o B&W: a grey scale is used.
• Orientation: Portrait (default) or Landscape.
• Paper size: A4, A3 (default), A0, US letter or Tapestry (width 0.8 m x height 3.3 m).
Rendering PNG or TIFF images may take some time, especially if you use the 300 dpi or 600 dpi options. Hence high-dpi formats (> 150 dpi) are only recommended for publication-quality figures. For examining the ENDscript flat figur
13. es, PDF format is recommended. PostScript and PDF files can be edited with Adobe Illustrator. PDF files are viewable and printable from Adobe Reader.

7 Appendix
• Similarity scores
If Risler, BLOSUM62, PAM250 or Identity is selected, several scores are calculated:
o The in-Group Score (ISc) is a classical computation of a similarity score within each group. For a column made of 3 residues (ACD):
ISc = (AC + AD + CD) / 3
o The Cross-Group Score (XSc) is the average similarity score over every sequence pair where each sequence belongs to a different group. For a column made of 6 residues divided in 3 groups (ACD, DE, G):
XSc = [ (AD + AE + CD + CE + DD + DE) / 6 + (AG + CG + DG) / 3 + (DG + EG) / 2 ] / 3
o The Total Score (TSc) is the mean of the in-Group Score and the Cross-Group Score:
TSc = (ISc + XSc) / 2
The user specifies a threshold for the in-Group (ThIn) and Diff-Group (ThDiff) scores. Colours are chosen according to the following rule:
o Red box, white character: strict identity.
o Red character (or black bold character with color scheme Flashy): similarity in a group, ISc > ThIn.
o Blue frame (filled in yellow with color scheme Flashy): similarity across groups, TSc > ThIn.
o Green fluo box: differences between conserved groups, ISc - XSc/2 > ThDiff.
• Similarity scores matrices
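The in-Group, Cross-Group and Total scores defined in this appendix can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ENDscript's own code: the function names are hypothetical, the substitution matrix is passed in as a plain callable, and a toy identity scoring (1 for identical residues, 0 otherwise) stands in for the Risler, PAM250 or BLOSUM62 tables.

```python
from itertools import combinations, product
from statistics import mean


def identity_score(a, b):
    """Toy scoring function (assumption): 1 for identical residues, else 0."""
    return 1.0 if a == b else 0.0


def in_group_score(group, score):
    """ISc: mean score over all residue pairs inside one group.

    The group must contain at least two residues.
    """
    return mean(score(a, b) for a, b in combinations(group, 2))


def cross_group_score(groups, score):
    """XSc: for each pair of groups, average the scores of all cross-group
    residue pairs, then average those values over the group pairs."""
    per_group_pair = [
        mean(score(a, b) for a, b in product(g1, g2))
        for g1, g2 in combinations(groups, 2)
    ]
    return mean(per_group_pair)


def total_score(isc, xsc):
    """TSc: mean of the in-Group and Cross-Group scores."""
    return (isc + xsc) / 2


# Column of the guide's example: 6 residues in 3 groups (ACD, DE, G).
column = ["ACD", "DE", "G"]
isc = in_group_score(column[0], identity_score)   # (AC + AD + CD) / 3
xsc = cross_group_score(column, identity_score)
tsc = total_score(isc, xsc)
```

With the identity scoring, only the D/D pair between the first two groups contributes, so XSc reduces to (1/6) / 3 for this column.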
14. fied by DSSP in the previous step are shown by green pairs of digits (1 1, 2 2, ...) below the bar of hydropathy.
o Intermolecular contacts calculated by CNS in the previous step are displayed along with disulphide bridges below the bar of hydropathy. The shortest intermolecular distance is taken for each residue. Corresponding contact symbols (see above paragraph) are written in red if the distance is less than 3.2 Å and in black if the distance is in the range 3.2-5.0 Å.
o Main information is given by the written marks, which show intermolecular contacts:
  - A to Z, 0 to 9 or a to z means that the amino acid residue has a non-crystallographic contact with a residue of chain A to Z, 0 to 9 or a to z (i.e. this residue is involved in a non-crystallographic interface).
  - A to Z, 0 to 9 or a to z in italic means that the amino acid residue has a crystallographic contact with a residue of chain A to Z, 0 to 9 or a to z (i.e. this residue is involved in a crystallographic interface).
  - A specific mark identifies a contact between two amino acid residues having the same names and numbers (e.g. along a 2-fold symmetry axis).
  - <> means that the amino acid residue has a contact with a ligand, i.e. an automatically kept or a chosen hetero-compound (see above paragraph).
  - <> in italic means that the amino acid residue has a crystallographic c
15. gous sequence of known structure.
o Secondary structure elements have been calculated by DSSP in the previous step: α-, 3₁₀- and π-helices are shown above the sequence as medium, small and large squiggles with α, η and π labels, respectively. β-strands are shown as arrows labeled β. Strict α- and β-turns are marked by TTT and TT letters, respectively.
o Residues in an alternate conformation are highlighted by a grey star above the sequences.
o Relative accessibility, calculated by DSSP in the previous step, is shown by a blue-colored bar below the sequence. White is buried (A < 0.1), cyan is intermediate (0.1 < A < 0.4), blue is accessible (0.4 < A < 1) and blue with red borders is highly exposed (A > 1). A red box means that relative accessibility is not calculated for the residue because it is truncated.
Remark: only molecules located in the crystallographic asymmetric unit are taken into account by DSSP in its calculation of accessibility. Thus, you can find highly accessible residues that are involved in contacts with crystallographic neighbors according to the ESPript figure. These residues are in fact buried in the crystal lattice.
o Hydropathy is calculated from the sequence according to the algorithm of Kyte & Doolittle [16] with a window of 3. It is shown by a second bar below accessibility: pink is hydrophobic (H > 1.5), grey is intermediate (-1.5 < H < 1.5) and cyan is hydrophilic (H < -1.5).
o Disulphide bridges identi
16. iend G. (2012) Nucleic Acids Res. 39 (Database issue), D411-419.
[3] Brunger A.T., Adams P.D., Clore G.M., DeLano W.L., Gros P., Grosse-Kunstleve R.W., Jiang J.S., Kuszewski J., Nilges M., Pannu N.S., Read R.J., Rice L.M., Simonson T. and Warren G.L. (1998) Acta Cryst. D54, 905-921.
[4] Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K. and Madden T.L. (2009) BMC Bioinformatics 10, 421.
[5] Sievers F., Wilm A., Dineen D., Gibson T.J., Karplus K., Li W., Lopez R., McWilliam H., Remmert M., Söding J., Thompson J.D. and Higgins D.G. (2011) Molecular Systems Biology 7, 539.
[6] Katoh K. and Standley D.M. (2013) Mol. Biol. Evol. 30, 772-780.
[7] Yongchao L. and Bertil S. (2014) Methods Mol. Biol. 1079, 211-218.
[8] Corpet F. (1988) Nucleic Acids Res. 16(22), 10881-10890.
[9] Gouet P., Courcelle E., Stuart D.I. and Metoz F. (1999) Bioinformatics 15(4), 305-308.
[10] Gouet P. and Courcelle E. (2002) Bioinformatics 18(5), 767-768.
[11] Gouet P., Robert X. and Courcelle E. (2003) Nucleic Acids Res. 31(13), 3320-3323.
[12] Martin A.C.R. and Porter C.T. (2009) ProFit 3.1. Ed. Martin A.C.R., London.
[13] Schrödinger LLC (2013) The PyMOL Molecular Graphics System. www.pymol.org
[14] Waterhouse A.M., Procter J.B., Martin D.M.A., Clamp M. and Barton G.J. (2009) Bioinformatics 25(9), 1189-1191.
[15] Berman H.M., Battistuz T., Bhat T.N., Bluhm W.F., Bourne P.
17. ith the external PhylodendronWeb server.
You can examine the ENDscript results with the online JalviewLite viewer [14], available in the RESULTS pop-up window. This tool allows multiple sequence alignment editing, visualisation and analysis. A secondary structure consensus calculated by ENDscript is included: in this consensus, the most frequent conformational state is reported for each residue. Finally, a downloadable file in Stockholm format lets you import ENDscript results into Jalview Desktop (for more information, please refer to the Jalview website).
The "Sequences output order" option lets the multiple sequence alignment program present the sequences in the order in which they were aligned from the guide tree (choose "aligned"). They can also be displayed in the order in which they were identified by the BLAST search, from the lowest to the highest E-value (choose "input", default).
If the option "Display all known structures" is activated via the interface (default), an automatic search is performed to check whether a sequence name can be related to a known 3D structure. This option has no effect in phase 1 and is functional when a BLAST search is performed against any database except TREMBL. Known secondary structure elements of each matching sequence are then displayed in the ESPript figure.
• ESPript
Main role: generates a second flat figure with a multiple sequence alignment adorned with the secondary structure elements of each homolo
18. molecular 3D visualization program PyMOL installed on the user's computer.
The first PyMOL representation is named "Cartoon":
o This is a ribbon depiction of the PDB query, colored as a function of the similarity scores calculated from the previous multiple sequence alignment.
o This color ramping from white (low score) to red (identity) makes it easy to locate regions of weak and strong sequence conservation on the structure of the query.
The second PyMOL representation is named "Sausage":
o It shows a variable tube representation of the Cα trace of the PDB query.
o To this end, all homologous protein structures are superposed onto the PDB query with ProFit, and the size of the tube is proportional to the r.m.s. deviation per residue between Cα pairs.
o The same white-to-red color ramping is used to visualize sequence conservation.
o By combining these two pieces of information, the user can identify areas of weak and strong structural conservation and correlate them with sequence conservation.
If applicable, the two PyMOL representations can also display, via the PyMOL control panel, an assortment of supplementary data:
o Biological assembly.
o Multiple NMR models.
o Disulfide bridges.
o Nucleic acids, ligands, monatomic elements and their contacting residues.
o Strictly conserved residues.
o PDB SITES markers.
o Solvent-accessible surface, mapped with the sequence conservation coloring code.
All these features are fully use
19. n, using a pairwise Needleman & Wunsch sequence alignment as a guide. If disabled, the global sequence alignment of the PDB query with each homologous protein is used instead. Enabling this option is recommended because it improves the structural alignment and the calculation of the r.m.s. deviation per residue. Disabling it is only recommended for highly similar sequence hits and/or for multiple sequence alignments with few gaps.
For both methods, each mobile structure is fitted onto the reference structure (the PDB query) by using Cα pairs. Fitted structures are written in turn to a zip archive downloadable from the RESULTS pop-up window. Finally, a mean r.m.s. deviation per residue is calculated using all fitted Cα pairs. It is used afterwards in the PyMOL "Sausage" representation.
• PyMOL ScriptMaker
Main role: generates the 3D interactive "Cartoon" and "Sausage" representations.
The program PyMOL ScriptMaker gathers all previously calculated information and prepares two PyMOL session files.
The first PyMOL representation is named "Cartoon":
o This is a ribbon depiction of the PDB query, colored as a function of the similarity scores calculated from the previous multiple sequence alignment.
o This color ramping from white (low score, i.e. the equivalent limit, 0.7 by default) to red (identity) makes it easy to locate areas of weak and strong sequence conservation on the structure of the query.
o A solvent accessi
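The per-residue deviation that drives the "Sausage" tube width can be illustrated with a short Python sketch. The guide does not give the exact formula, so this assumes a root-mean-square of the Cα-Cα distances between the query residue and its equivalent residue in every fitted homologue that has one. The function name and the dictionary-based data layout are hypothetical.

```python
import math


def per_residue_rmsd(query_ca, fitted_ca_list):
    """Assumed definition: for each query residue, the root-mean-square of the
    distances between its C-alpha and the equivalent C-alpha of every fitted
    structure that has one.

    query_ca       -- {residue_id: (x, y, z)} for the PDB query
    fitted_ca_list -- one {query_residue_id: (x, y, z)} dict per fitted
                      homologue, keyed by the equivalent query residue
    """
    rmsd = {}
    for res_id, ref in query_ca.items():
        squared = [
            sum((a - b) ** 2 for a, b in zip(ref, mobile[res_id]))
            for mobile in fitted_ca_list
            if res_id in mobile  # skip homologues with a gap at this residue
        ]
        if squared:
            rmsd[res_id] = math.sqrt(sum(squared) / len(squared))
    return rmsd


# Two residues in the query, two fitted homologues (the first one has a gap
# at residue 2).
query = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0)}
fitted = [
    {1: (3.0, 4.0, 0.0)},
    {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 2.0)},
]
deviation = per_residue_rmsd(query, fitted)
```

Residues with no equivalent in any homologue are left out of the result, which is one possible way to handle gaps; ENDscript may treat them differently.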
20. ontact with a ligand, i.e. an automatically kept or a chosen hetero-compound (see above paragraph).
o Further information is given with colors:
  - A yellow background identifies a non-crystallographic contact.
  - An orange background identifies an amino acid involved in both a crystallographic and a non-crystallographic contact.
  - A blue frame identifies an amino acid involved in both a protein-protein and a protein-ligand contact.
  - A red letter identifies a contact < 3.2 Å.
  - A black letter identifies a contact between 3.2 Å and 5.0 Å.
o Similarities between the PDB query sequence of the chosen chainID (chain A by default, redefinable in the "Chain ID" option) and the aligned homologous sequences are rendered by colored boxes. A score is calculated for each column of residues according to a matrix based on physicochemical properties.
o By default, residue names are written in black if the score is below 0.7 (low similarity); they are in red and framed in blue if the score is in the range 0.7-1 (high similarity); they are in white on a red background in case of strict identity.
o You can switch to other scoring matrices once a first run of ENDscript has been done. These settings are available in the "Sequence similarities depiction parameters" box of the ENDscript form.
o A percentage of equivalent residues ("Equivalent" option, default) can be calculated considering either physicochemical properties (HKR are polar positive, DE are polar negative, STNQ are
21. polar neutral, AVLIM are non-polar aliphatic, FYW are non-polar aromatic, PG, C) or the similarities used in MultAlin (IV, LM, FY, NDQEBZ).
o Risler, PAM250, BLOSUM62 and Identity are other possible scoring matrices (see Appendix). The Risler matrix usually gives an excellent rendering.
o Sequences can be removed, or their order changed, by using the "Defining group" box and the following syntax:
  - 1-3,6-10 removes sequences 4 and 5 from a 10-sequence alignment.
  - 1,3,2,4,5 swaps the order of sequences 2 and 3 in a 5-sequence alignment.
  - 2,all displays sequence 2 first, then all the others.
  - Warning: the query sequence (sequence 1) must be kept, otherwise ENDscript produces an error.
With the "ESPRIPT" button, you can export your ENDscript results to the ESPript server. There, you will have finer control over the layout, and you will be able to edit and enhance your sequence illustrations and save your session on your own computer.

5 Phase 3 in details
Result: two interactive 3D PyMOL representations of the PDB query are produced.
• ProFit
Main role: superposes all identified homologous structures onto the PDB query.
To superpose each known structure onto the PDB query, the zones of equivalent residues must be known. This can be achieved by two distinct methods, controlled by the "Pairwise 3D structures superposition" option. If enabled (default), ProFit performs a 3D superposition of the PDB query with each homologous protei
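The "Defining group" syntax described in this part of the guide can be illustrated with a small parser. This is a sketch, not the server's implementation: it assumes that items are separated by commas, that ranges are written with a hyphen, and that the keyword "all" appends every sequence not yet listed. The function name `expand_selection` is hypothetical.

```python
def expand_selection(spec, n_sequences):
    """Expand a selection such as '1-3,6-10' or '2,all' into the ordered list
    of 1-based sequence numbers to display (assumed syntax, see lead-in)."""
    order = []
    for token in spec.replace(" ", "").split(","):
        if not token:
            continue
        if token.lower() == "all":
            wanted = range(1, n_sequences + 1)
        elif "-" in token:
            start, end = (int(part) for part in token.split("-"))
            wanted = range(start, end + 1)
        else:
            wanted = [int(token)]
        for number in wanted:
            if not 1 <= number <= n_sequences:
                raise ValueError(f"sequence {number} is not in the alignment")
            if number not in order:  # keep the first position given
                order.append(number)
    if 1 not in order:
        # Mirrors the guide's warning: the query sequence must be kept.
        raise ValueError("the query sequence (sequence 1) must be kept")
    return order


removed = expand_selection("1-3,6-10", 10)   # drops sequences 4 and 5
swapped = expand_selection("1,3,2,4,5", 5)   # swaps sequences 2 and 3
reordered = expand_selection("2,all", 5)     # sequence 2 first, then the rest
```

Raising an error when sequence 1 is missing reflects the warning in the text; how the server itself reports that error is not specified here.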
22. r editable thanks to the PyMOL control panel, and publication-quality pictures can rapidly be ray-traced (please refer to the PyMOL documentation or the PyMOLWiki).
All the resulting files from phases 1 to 3 can be visualized with a mouse click or saved to your computer with the right-button "Save as" option of your browser.

3 Phase 1 in details
Result: a first ENDscript flat figure is produced with information on each monomeric sequence contained in the PDB query:
o Secondary structure elements and residues in alternate conformation are shown above the sequence of the PDB query.
o Accessibility and hydropathy scales, intermolecular contacts and possible disulfide bridges are shown below.
• SPDB
Main role: checks and cleans the PDB query before it enters the ENDscript automated pipeline.
SPDB, and by extension ENDscript, supports structure files from the Protein Data Bank or produced directly by any program conforming to the PDB format.
o If necessary, SPDB re-assigns chainIDs from A to Z, 0 to 9 and a to z.
o The first model is kept for multiple NMR models.
o The first conformers are kept for alternate residues.
o The second oxygen atom of the C-terminus main chain (atom OXT) is removed.
In the case of a PDB query with multiple chains, users can select the one they want to process with ENDscript ("ChainID" option). Warning: this option is case-sensitive.
ENDscript has the ability to determine and depict contacts between protein residues an
23. renders a comprehensive analysis of primary to quaternary protein structure information in an automated way. ENDscript is a tool of choice for biologists and structural biologists: with a few mouse clicks, it generates a set of detailed, high-quality figures and 3D interactive representations of their proteins of interest.
The ENDscript Web server is fast and convenient:
o No particular knowledge of bioinformatics is needed to obtain comprehensive and relevant illustrations.
o The user is guided through the process by tooltips and detailed help topics (the present documentation), accessible at any time.
o Thanks to its automated pipeline and parallel programming, ENDscript can deliver results in one click and within one minute. Demanding or expert users can modify settings to fine-tune ENDscript to their needs.
o ENDscript produces publication-quality illustrations in the most common file formats (PostScript, PDF, PNG and TIFF) and sizes (US letter, A4, A3, A0 and the gigantic Tapestry format).
o ENDscript is accessible with any modern Web browser equipped with a PDF reader. To take advantage of the 3D interactive representations, the PyMOL software (free open-source or commercial version) is required.

2 Overview of the ENDscript automated pipeline
The ENDscript automated pipeline involves numerous sequence and structure analysis programs:
o SPDB, a homemade program, to check residue numbering and chainIDs from the query PDB file.
o
24. rotein chains at 90% sequence identity
PDBAA95: PDBAA with clustering of protein chains at 95% sequence identity
PIG: Complete proteome from Sus scrofa
RAT: Complete proteome from Rattus norvegicus
SWISSPROT: SwissProt database from UniProt Knowledgebase
TREMBL: TrEMBL database from UniProt Knowledgebase
YEAST: Complete proteome from Saccharomyces cerevisiae
The user can change the threshold for retaining sequence matches identified by the BLAST search ("E-value" option, default: 1e-6). The E-value indicates the statistical significance of a given pairwise alignment: the lower the E-value (the closer it is to zero), the more significant the match.
The "Discard identical seq." option, if enabled (default), lets ENDscript keep only a single representative sequence when several identical sequence hits are found by the BLAST search. This option is useful to discard sequences of proteins with multiple identical chains, or when the BLAST search is performed against a redundant database (notably PDBAA or TrEMBL).
• Multiple sequence alignment
Main role: aligns all the sequence hits identified by the BLAST search with that of the PDB query.
This multiple sequence alignment can be performed by Clustal Omega (default), MAFFT, MSAProbs or MultAlin ("Multiple seq. alignment program" option). If Clustal Omega is chosen, a dendrogram is calculated. It will be used in the RESULTS pop-up window to build and view a phylogenetic tree w
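The effect of the "E-value" threshold and of the "Discard identical seq." option on a list of BLAST hits can be sketched as follows. This is an illustration only: the tuple layout and the function name `filter_hits` are hypothetical, and whether the server treats the threshold as inclusive is an assumption.

```python
def filter_hits(hits, max_evalue=1e-6, discard_identical=True):
    """Keep hits whose E-value does not exceed the threshold and, optionally,
    a single representative for each distinct sequence.

    hits -- iterable of (name, sequence, evalue) tuples (assumed layout)
    """
    kept = []
    seen_sequences = set()
    # Most significant hits first, i.e. from the lowest to the highest E-value.
    for name, sequence, evalue in sorted(hits, key=lambda hit: hit[2]):
        if evalue > max_evalue:
            continue  # not significant enough
        if discard_identical and sequence in seen_sequences:
            continue  # an identical sequence is already represented
        seen_sequences.add(sequence)
        kept.append((name, sequence, evalue))
    return kept


example_hits = [
    ("hit_a", "MKVLA", 1e-30),
    ("hit_b", "MKVLA", 1e-28),  # identical to hit_a
    ("hit_c", "MKLLA", 1e-3),   # above the default threshold
    ("hit_d", "MRVLA", 1e-10),
]
retained = [name for name, _, _ in filter_hits(example_hits)]
```

Sorting before filtering means the representative kept for a set of identical sequences is the one with the lowest E-value.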
25. s an amino acid involved in both a protein-protein and a protein-ligand contact.
  - A red letter identifies a contact < 3.2 Å.
  - A black letter identifies a contact between 3.2 Å and 5.0 Å.

4 Phase 2 in details
Result: a second ENDscript flat figure is produced. It displays:
o A multiple sequence alignment of homologous proteins, colored according to residue conservation.
o The secondary structure elements of each homologous sequence of known structure.
To generate this second flat figure, the following program pipeline is called by ENDscript:
• BLAST search
Main role: finds sequences homologous to that of the PDB query.
If the option "Enable the BLAST search" is activated (default), a BLAST search is performed against a chosen sequence database, defined by the "Choose a database" option:
ARATH: Complete proteome from Arabidopsis thaliana
BOVIN: Complete proteome from Bos taurus
CAEEL: Complete proteome from Caenorhabditis elegans
CANFA: Complete proteome from Canis familiaris
CHICK: Complete proteome from Gallus gallus
DANRE: Complete proteome from Danio rerio
DROME: Complete proteome from Drosophila melanogaster
HUMAN: Complete proteome from Homo sapiens
MOUSE: Complete proteome from Mus musculus
PDBAA: Sequences derived from PDB protein structures (default)
PDBAA50: PDBAA with clustering of protein chains at 50% sequence identity
PDBAA70: PDBAA with clustering of protein chains at 70% sequence identity
PDBAA90: PDBAA with clustering of p
26. tively.
o Residues in an alternate conformation are highlighted by a grey star above the sequences.
o Relative accessibility, calculated by DSSP in the previous step, is shown by a blue-colored bar below the sequence. White is buried (A < 0.1), cyan is intermediate (0.1 < A < 0.4), blue is accessible (0.4 < A < 1) and blue with red borders is highly exposed (A > 1). A red box means that relative accessibility is not calculated for the residue because it is truncated.
Remark: only molecules located in the crystallographic asymmetric unit are taken into account by DSSP in its calculation of accessibility. Thus, you can find highly accessible residues that are involved in contacts with crystallographic neighbors according to the ESPript figure. These residues are in fact buried in the crystal lattice.
o Hydropathy is calculated from the sequence according to the algorithm of Kyte & Doolittle [16] with a window of 3. It is shown by a second bar below accessibility: pink is hydrophobic (H > 1.5), grey is intermediate (-1.5 < H < 1.5) and cyan is hydrophilic (H < -1.5).
o Disulphide bridges identified by DSSP in the previous step are shown by green pairs of digits (1 1, 2 2, ...) below the bar of hydropathy.
o Intermolecular contacts calculated by CNS in the previous step are displayed along with disulphide bridges below the bar of hydropathy. The shortest intermolecular distance is taken for each residue. Corresponding contact symbols (see a
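The accessibility and hydropathy color classes used in the flat figures can be expressed as a short Python sketch. It uses the published Kyte & Doolittle hydropathy values and the thresholds quoted in this guide. The handling of sequence ends (a shortened window), the treatment of values falling exactly on a threshold, and all function names are assumptions rather than ESPript's documented behavior.

```python
# Kyte & Doolittle (1982) hydropathy index, one value per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=3):
    """Average hydropathy in a sliding window centred on each residue.

    Assumption: the window is simply truncated at both sequence ends.
    """
    half = window // 2
    profile = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half): i + half + 1]
        profile.append(sum(KYTE_DOOLITTLE[r] for r in segment) / len(segment))
    return profile


def hydropathy_color(h):
    """Pink: hydrophobic (H > 1.5); cyan: hydrophilic (H < -1.5); else grey."""
    if h > 1.5:
        return "pink"
    if h < -1.5:
        return "cyan"
    return "grey"


def accessibility_color(a):
    """Color class for a relative accessibility A (boundary handling assumed)."""
    if a < 0.1:
        return "white"                  # buried
    if a < 0.4:
        return "cyan"                   # intermediate
    if a <= 1.0:
        return "blue"                   # accessible
    return "blue with red borders"      # highly exposed


colors = [hydropathy_color(h) for h in hydropathy_profile("AIVKDEG")]
```

Each entry of `colors` corresponds to one residue of the toy sequence, in order, which is how the second bar below the sequence would be filled under these assumptions.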
