Facultad de Informática - RUC
Contents
1. Technique       antiox  non-antiox  global  Precision antiox  Global precision  ROC
Naive Bayes     96.0    57.0        63.3    30.1              87.5              0.807
MLP             16.4    98.2        84.9    63.9              82.3              0.867
K star          84.3    93.9        92.3    72.8              93.0              0.967
JRip            65.7    97.0        91.9    81.0              91.6              0.843
Random tree     82.4    94.9        92.8    75.6              93.1              0.886
Random Forest   81.8    96.5        94.1    81.8              94.1              0.947

Naive Bayes     80.6    55.5        59.5    25.9              82.7              0.783
MLP             38.9    95.2        86.1    61.2              84.5              0.877
K star          78.4    94.4        91.8    73.2              92.1              0.957
JRip            65.1    96.8        91.7    79.9              91.3              0.836
Random tree     81.8    95.3        93.1    77.3              93.3              0.886
Random Forest   81.2    95.7        93.3    78.5              93.4              0.952

Naive Bayes     78.4    54.2        58.1    24.9              81.8              0.792
MLP             0       100         83.8    0                 83.8              0.644
K star          86.4    93.7        92.5    72.5              93.3              0.97
JRip            68.2    96.5        91.9    78.9              91.6              0.846
Random tree     81.8    94.7        92.6    74.9              92.6              0.882
Random Forest   83.6    96.9        94.7    83.9              94.7              0.951

Table 5. Scores obtained by the Random Forest method for each input dataset tested

Subset               antiox  non-antiox  global  Precision antiox  Global precision  ROC    Number of attributes
Sh                   79.3    94.4        91.9    73.2              92.3              0.913  12
Sh embedded          79.0    94.1        91.6    72.4              92                0.897  6
Sh non-embedded      75.0    94.6        91.4    73.0              91.5              0.906  6
Tr                   79.9    96.1        93.5    79.9              93.5              0.95   8
Tr embedded          81.8    96.4        94.0    81.3              94.0              0.954  5
Tr non-embedded      79.9    94.0        91.7    72.1              92.2              0.903  3
X                    82.1    96.1        93.8    80.4              93.9              0.948  12
X embedded           82.4    95.7        92.5    78.8              93.7              0.938  6
X non-embedded       79.9    95.2        92.7    76.2              92.9              0.926  6
Sh and Tr            81.8    96.5        94.1    81.8              94.1              0.947  20
Sh and Tr embedded   81.2    96.0        93
2. Figure 5. TOMOCOMD interface.
1.1.4 MARCH-INSIDE
MARCH-INSIDE is a simple but effective calculation method for QSAR studies in medicinal chemistry, developed by González-Díaz et al. (Figure 6). It uses Markov chain theory to generate parameters that numerically describe the chemical structure of drugs and of their molecular targets. Recent review papers give examples of the use of this program in the prediction of antimicrobial and antiparasitic agents, as well as of their molecular targets [10, 35, 66].
Figure 6. Graphical interface of the MARCH-INSIDE application.
1.1.5 E-Calc
E-Calc v 1.1 (1999)
5. The new method uses graphical information processing theory, which has never previously been used in this kind of problem. The results can be qualified as notable compared with the state of the art.
ARTICLE INFO
Article history: Received 9 July 2012. Received in revised form 17 September 2012. Accepted 2 October 2012. Available online 29 October 2012.
Keywords: Multi-target QSAR; Star Graph; Topological indices; Antioxidant protein
ABSTRACT
Aging and life quality is an important research topic nowadays in areas such as life sciences, chemistry and pharmacology. People live longer, and thus they want to spend that extra time with a better quality of life. In this regard, there exists a tiny subset of molecules in nature, named antioxidant proteins, that may influence the aging process. However, testing every single protein in order to identify its properties is quite expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented using complex network graphs, which can be used to reduce the number of proteins to be tested for antioxidant biological activity. The graph obtained as a representation helps to describe the complex system by means of topological indices. More specifically, in this work Randić's Star Networks have been used, together with the associated indices calculated with the S2SNet tool. In order to simulate the existing proportion of antioxidant p
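The star-graph representation mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the usual description of a Randić-style star network: a dummy central node, one branch per amino-acid letter, and residues appended to their branch in order of appearance. The function name `star_graph_edges` and the `embedded` flag (which here simply adds the sequence connectivity between consecutive residues) are hypothetical names chosen for this example; they are not taken from S2SNet.

```python
def star_graph_edges(sequence, embedded=False):
    """Build the edge list of a star graph for a protein sequence.

    Node 0 is the dummy center; node k is the k-th residue (1-based).
    Assumption: each amino-acid letter owns one branch, and residues are
    chained along their branch in order of appearance in the sequence.
    """
    edges = set()
    last_on_branch = {}  # letter -> last node already placed on that branch
    for position, letter in enumerate(sequence, start=1):
        parent = last_on_branch.get(letter, 0)  # 0 = center for a new branch
        edges.add((parent, position))
        last_on_branch[letter] = position
    if embedded:
        # Assumed meaning of "embedded": also keep the original
        # sequence connectivity between consecutive residues.
        for position in range(1, len(sequence)):
            edges.add((position, position + 1))
    return sorted(edges)


if __name__ == "__main__":
    print(star_graph_edges("ACAD"))
    print(star_graph_edges("ACAD", embedded=True))
```

Topological indices would then be computed from the connectivity matrix derived from such an edge list; that step is not shown here.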
6. … where q_i and q_j are the electronic charges of the i-th and the j-th amino acids (aa), and the neighborhood-relationship truncation function α_ij = 1 is turned on if these amino acids participate in a peptidic hydrogen bond or d_ij < d_cut-off = 5 Å [67]. In this regard, truncation of the molecular field is usually applied to simplify the calculations in large biological systems. The distance d_ij is the Euclidean distance between the Cα atoms of the two amino acids, and d_0j is the distance between the amino acid and the center of charge of the protein. Both kinds of distances were derived from the x, y and z coordinates of the amino acids collected from the protein PDB files. All calculations were carried out with our in-house software MARCH-INSIDE. All water molecules and metal ions were removed for the calculation [67].
2.2 LDA models
LDA is frequently used for classification/prediction problems in physical anthropology, but it is unusual to find examples where researchers consider the statistical limitations and assumptions required for this technique. In this work, all LDA models have been trained with the software STATISTICA 6.0, for which our laboratory holds rights of use [76]. In LDA, we use several variable selection techniques to seek the model: (i) All Effects (include all parameters), (ii) Forward stepwise, (iii) Forward entry, (iv) Backward stepwise, (v) Backward removal, and
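The truncation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates are taken to be Cα positions in ångströms, the cut-off is exposed as a parameter (the default of 5.0 is an assumed reading of the text), hydrogen-bonded pairs are supplied by the caller, and excluding the diagonal (i = j) is a choice made for this example. The function name `truncation_matrix` is hypothetical and is not part of MARCH-INSIDE.

```python
import math


def truncation_matrix(ca_coords, cutoff=5.0, hbond_pairs=()):
    """Return alpha[i][j] = 1 when residues i and j count as neighbors.

    ca_coords:   list of (x, y, z) tuples, one per residue (assumed C-alpha).
    cutoff:      distance threshold, assumed to be in the same unit as coords.
    hbond_pairs: iterable of (i, j) index pairs joined by a hydrogen bond.
    """
    n = len(ca_coords)
    hbonded = {frozenset(pair) for pair in hbond_pairs}
    alpha = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: a residue is not its own neighbor
            d_ij = math.dist(ca_coords[i], ca_coords[j])  # Euclidean distance
            if d_ij < cutoff or frozenset((i, j)) in hbonded:
                alpha[i][j] = 1
    return alpha


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    for row in truncation_matrix(coords, hbond_pairs=[(0, 2)]):
        print(row)
```

Keeping the cut-off as an argument makes it easy to check how sensitive the resulting neighbor matrix is to that choice.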
7. In 1963 the mathematician Stanisław M. Ulam discovered certain interesting aspects of the arrangement adopted by the prime numbers when the natural numbers are laid out in a spiral. This arrangement later became very popular in the generation and visualization of images. To build the spiral, the numbers are placed on a square grid, starting with 1 at the center and continuing with the remaining numbers in a square spiral, as shown in Figure 23. In mathematics, this representation is a simple method of plotting numbers that reveals hidden and very interesting aspects of numerical series and sequences. In the study of molecules, this spiral representation has been used in many works aimed at representing DNA nucleotide sequences divided into four classes: A, T, G and C [52].
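The construction of the square spiral described above can be sketched in Python as follows. The orientation (first step to the right, then turning counter-clockwise) is an assumption, since Figure 23 is not available here; the function names `ulam_spiral_coords` and `sequence_on_spiral` are hypothetical and used only for illustration.

```python
def ulam_spiral_coords(count):
    """Map 1..count to (x, y) grid positions on a square spiral.

    1 sits at the center (0, 0). Assumed orientation: the first step goes
    right, and the path turns counter-clockwise (y grows upward).
    """
    coords = {1: (0, 0)}
    x = y = 0
    dx, dy = 1, 0          # current direction of travel
    run_length = 1         # cells to walk before the next turn
    n = 1
    while n < count:
        for _ in range(2):                 # each run length is used twice
            for _ in range(run_length):
                if n >= count:
                    return coords
                x, y = x + dx, y + dy
                n += 1
                coords[n] = (x, y)
            dx, dy = -dy, dx               # 90-degree counter-clockwise turn
        run_length += 1
    return coords


def sequence_on_spiral(sequence):
    """Place the k-th symbol of a sequence (e.g. A, T, G, C) on cell k."""
    coords = ulam_spiral_coords(len(sequence))
    return [(symbol, coords[k]) for k, symbol in enumerate(sequence, start=1)]


if __name__ == "__main__":
    print(ulam_spiral_coords(10))
    print(sequence_on_spiral("ATGC"))
```

Laying a nucleotide sequence on the same grid, as in `sequence_on_spiral`, is one simple way to obtain the kind of spiral representation the paragraph refers to.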
8. Figure 3. ROC curve analysis of the DNA cleavage-mediated anticancer activity model.
… protein orbits, even if these motifs were found at the end of α-helices, at the beginning of β-strands and in the turn/random-coil regions, but not in the middle of helices or strands. We based this study on the idea that the presence of the ATCUN motif is necessary but not sufficient for the anticancer activity. Thus, the entire 3D structure of the protein participates in the protein activity, because it can influence the accessibility of the ATCUN-like motif, the supramolecular recognition of the DNA, the subcellular location of the proteins, the active-site hydrophobicity, or other factors. This may also explain the additional positive but lower contribution of 0.05 for the unitary increment in the inner spectral moments π0(i). The desirability profiles present the levels of the predictor variables π2(t) and π0(i) that produce the most desirable predicted DNA cleavage-mediated anticancer responses (see Figure 2). We can observe that all proteins with −5 < π2(t) < 5 (standardized values) are expected to present higher DNA cleavage-mediated anticancer activity by accommodating ATCUN motifs in their backbone; for lower values of π0(i), the region encircled by a white dashed line is wider on the left than on the right.
Figure 4. Model robu
9. Figure 4. Histogram of the number of PPI and non-PPI cases studied by organism (y axis: No. of cases; the order of the organisms on the x axis follows their first appearance in the list of the Supporting Information).
Results and Discussion
Several researchers have demonstrated the high performance of different types of computational classifiers in structure-function relationship studies, ranging from low-weight molecules to proteins or protein-protein complexes, and based on different algorithms; see for instance the works of Ivanciuc about Machine Learning [7] or the works of Cai and Chou et al. 4 1979 8 with different classifiers. In particular, the Linear Neural Network (LNN) algorithm, the simplest type of ANN, was used here to train different linear models based on different combinations of parameters. Table 1 depicts the results for the best models found. The profile of the ANN model was specified with a simple notation as follows: ANN type Ni Nis Nui Nu NosNo. The ANN types presented, in addition to LNN, are multi-layer perceptron (MLP), probabilistic neural network (PNN), and radial basis
Trypano-PPI, technical notes. 1186 Journal of Proteome Research, Vol. 9, No. 2, 2010
Table 1. Sum
12. … [36-39], which predict the function of proteins from structural parameters or explore protein structures, are good examples in this regard. In any case, to the best of our knowledge, there is no web server available in the literature, or at least a theoretical method, to predict unique pPPC in Plasmodium (not present in humans or other parasites or hosts) based on the 3D structure of the two proteins involved in pPPIs or non-PPIs. Besides, González-Díaz et al. introduced the method called MARkovian CHemicals IN SIlico DEsign (MARCH-INSIDE 1.0) for the computational design of small-sized drugs. In successive studies, we have extended this method to perform fast calculation of 2D and 3D alignment-free numeric parameters to describe RNA secondary structures based on molecular vibration information [40] and the 3D structure of proteins based on van der Waals [41] or electrostatic interactions [42]. Recently, the method has been renamed MARkov CHains Invariants for Networks SImulation & DEsign (MARCH-INSIDE 2.0). The approach uses a Markov Chain model (MCM) to calculate parameters of small-sized and also complex chemical structures [43-45]. To this end, MARCH-INSIDE describes the system as a stochastic matrix of interactions and/or transitions between the parts of the system, and at the same time associates this matrix with a graph or complex-network representation of the system. This describes more adequately the broad uses of the met
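As a rough illustration of the stochastic-matrix idea described above, the following Python sketch row-normalizes a non-negative matrix of pairwise interaction weights into a Markov (row-stochastic) matrix and computes its spectral moments as traces of matrix powers, which is the standard definition of the k-th spectral moment of a matrix. The weights used in the example are invented, and the function names are hypothetical; the actual weighting scheme of MARCH-INSIDE is not reproduced here.

```python
def stochastic_matrix(weights):
    """Row-normalize non-negative interaction weights so each row sums to 1."""
    matrix = []
    for row in weights:
        total = sum(row)
        if total <= 0:
            raise ValueError("every part of the system needs some interaction")
        matrix.append([w / total for w in row])
    return matrix


def mat_mul(a, b):
    """Plain-Python product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moment(p, k):
    """k-th spectral moment: trace of the k-th power of the matrix p."""
    n = len(p)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        power = mat_mul(power, p)
    return sum(power[i][i] for i in range(n))


if __name__ == "__main__":
    # Hypothetical interaction weights between two parts of a system.
    p = stochastic_matrix([[1.0, 1.0], [1.0, 3.0]])
    print(p)
    print([spectral_moment(p, k) for k in range(4)])
```

Because the same matrix can be read as a weighted graph, the moments obtained this way double as numerical descriptors of the associated network.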
14. This journal is © The Royal Society of Chemistry 2012. Downloaded by Universidad de Vigo on 18 October 2012. Published on 10 January 2012 on http://pubs.rsc.org | doi:10.1039/C2MB05432A
16. Finally, the model has predicted, for the first time, the DNA cleavage function of proteins from pathogenic parasites. We have predicted possible proteins with ATCUN activity, with a probability higher than 99%, in nine families of parasites such as Trypanosoma, Plasmodium, Leishmania or Toxoplasma. The distribution of the biological functions of the predicted ATCUN proteins was as follows: oxidoreductases 70.5%, signaling proteins 62.5%, lyases 58.2%, membrane proteins 45.5%, ligases 44.4%, hydrolases 41.3%, transferases 39.2%, cell adhesion proteins 34.5%, metal binders 33.5%, translation proteins 25.0%, molecular transport proteins 16.7%, structural proteins 9.1% and isomerases 8.2%. The model is implemented at http://bio-aims.udc.es/ATCUNPred.php (Figure 33). An example of the result for the proteins 1AZP, 114M and 1B0U is:
ATCUNpred @ Bio-AIMS. ATCUN DNA-cleavage protein activity prediction using MARCH-INSIDE and LDA based on electrostatic spectral moments (accuracy of 91.32%)
Results: http bio aims udc es Results 24293517444a6099830 ATCUNpred calc2 txt (2013-04-21 21:57:26)
PDB    ATCUN Prediction
1AZP   0.28
114M   66.01
1B0U   80.97
2.2.4 LIBPpred: prediction of proteins that interact with lipids
LIBP-Pred: Web Server for Lipid Binding Proteins using Structural Network Parameters; PDB Mining of Human Cancer Biomarkers
19. Fig. 5. ROC curve for the pPPC predictor.
In particular, the model LNN 4:4-1:1 is the simplest model found with the highest levels of Sensitivity (92.6%), Specificity (92.2%) and Accuracy (92.3%) in the training set. These values are excellent considering that this predictor uses only two molecular descriptors of the PPI pair, which is a very complex structure in chemical terms, to fit a large data set of 581 pPPIs and 3395 npPPIs. The profile 4:4-1:1 indicates that this model assigns the values of four input variables to four input neurons, which perform a weighted sum and pass the result to one output neuron; this neuron gives the final classification of the case according to a threshold value that has been optimized. In addition, the model LNN 4:4-1:1 also presented high levels of Sensitivity (90.2%), Specificity (90.4%) and Accuracy (90.4%) in the external validation (test) set; see Table 1. In Fig. 4 we illustrate the topology of this LNN network compared with a non-linear ANN. Interestingly, four variables, out of more than 30 parameters calculated, appear in many models. This fact indicates that the difference between the electrostatic entropies is very important, not only for PPIs in general but also to discriminate a unique complex present in Plasmodium (pPPIs). On the other hand, the product and average invariant types do not appear to be relevant
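The behavior of a 4:4-1:1 linear profile, together with the three reported statistics, can be sketched in Python as follows. The weights, bias and threshold in the example are invented for illustration and are not the fitted values of the published model; the function names `lnn_predict` and `classification_rates` are likewise hypothetical.

```python
def lnn_predict(inputs, weights, bias=0.0, threshold=0.0):
    """Linear neuron: weighted sum of the inputs compared with a threshold.

    Returns 1 (positive class) when the score exceeds the threshold,
    otherwise 0.
    """
    score = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if score > threshold else 0


def classification_rates(actual, predicted):
    """Return (sensitivity, specificity, accuracy) as percentages."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    sensitivity = 100.0 * tp / (tp + fn)   # share of positives recovered
    specificity = 100.0 * tn / (tn + fp)   # share of negatives recovered
    accuracy = 100.0 * (tp + tn) / len(actual)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    weights = [0.5, -0.25, 1.0, 0.0]       # hypothetical, four inputs
    cases = [[1.0, 2.0, 0.5, 3.0], [0.0, 2.0, 0.0, 0.0]]
    labels = [1, 0]
    predictions = [lnn_predict(c, weights, bias=-0.2) for c in cases]
    print(predictions, classification_rates(labels, predictions))
```

Sensitivity and specificity are computed per class, so they stay informative even when one class (here the non-pPPIs) is much larger than the other.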
20. The protein structures were downloaded from the PDB using the following schemes for the PDB database search: (i) introducing as input parameter the name of the parasite species (Trypanosome) in the search item called source organism, for positive cases; or (ii) introducing the PDB IDs for all the proteins contained in the list reported in the article of Dobson and Doig. The positive cases (TPPI) are those protein-protein pairs that form stable complexes that have been structurally characterized (3D structure) in Trypanosome species. The list of negative cases (non-TPPI, search scheme ii) contains enzymes and other protein complexes present in humans and many other organisms, including other parasites (see Figure 4), that are not present in Trypanosome species. The data set was composed of 7866 pairs of proteins (1023 TPPIs and 6823 non-TPPIs) from more than 20 organisms, including parasites and human or cattle hosts. Detailed information about the PDB ID, the values of the electrostatic potential indices, the corresponding observed classification and the predicted classification for each TPPI or non-TPPI pair is given in the Supporting Information.
Rodriguez-Soca et al.
Figure 3. Illustrative examples of the topology used for some of the ANN models trained (panels: Linear 90:90-1:1; RBF 17:17-1-1:1; PNN 90:90-11512-2-2:1; MLP 21:21-30-1:1).
21. We also validated the linear model by means of a ROC curve analysis (see Fig. 5) to demonstrate that there is a linear, and not an indirect non-linear, relationship between our indices and the classification of pPPCs [106]. The values of the area under the ROC curve for this model are 0.95 and 0.96, very close to 1 (the highest possible value) and notably different from 0.5 (the typical value of a random classifier). This kind of analysis is an accepted tool in Bioinformatics to demonstrate which classification methods outperform the others, e.g. the study carried out by Xu and Du related to PPIs [107] or the work of Mahdavi and Lin [108]. This first search points to a linear instead of a non-linear relationship between pPPI prediction and AR values, giving additional proof of the validity of our methodology. For instance, in Table 1 we can see that more complicated models with non-linear profiles do not improve on the linear model and sometimes give even worse results.
3.3 Classification Tree (CT) models
Last, considering that the non-linear ANNs did not notably improve on LDA, we used the variables pre-selected by LDA as inputs for a Classification Tree (CT) analysis. With complete data sets, LDA may be a simpler and sometimes better choice. However, the testing of data prior to analysis is necessary, and CTs are recommended either as a replacement for LDA or as a supplement whenever data do not meet relevant assumptions [109]. Table 1 also depic
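The meaning of the area under the ROC curve quoted above (1 for a perfect ranking, 0.5 for a random one) can be made concrete with a short Python sketch. It uses the rank interpretation of the AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The scores in the example are invented, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive case is ranked higher; a tie contributes 0.5.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical classifier outputs for three positive and three negative cases.
    print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
    # A classifier that gives every case the same score cannot rank at all.
    print(roc_auc([0.5, 0.5], [0.5, 0.5]))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch; sorting-based implementations are preferable for data sets of the size discussed in the text.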
23. … assays in order to confirm LIBP function. Each time we use the PDB mining key, the server updates the prediction for all new PDB files present in the last version of the PDB synchronized with LIBPpred. We have predicted S(LIBP) values for a total of 2693 proteins, selected to have unknown function, or only a hypothetical predicted function, and low sequence homology in the current PDB release. A total of 552 out of the 2693 proteins studied (20.5%) were predicted as possible LIBPs with S(LIBP) > 50. However, if we restrict the criterion to S(LIBP) > 55 in order to discard unclassified outputs, the results shrink to 271 possible LIBPs (10.1%). These are, in any case, somewhat weak criteria; if we use a more restrictive criterion for this LDA classifier, with a cut-off of 75, LIBP-Pred finds only 27 possible LIBPs (1%). Another important result is the demonstration that LIBP-Pred predictions are not biased by molecular weight (see Fig. 4). This scatter plot shows that there is no apparent linear relationship between S(LIBP) and Mw, with a correlation coefficient of only R = 0.079 between the two properties. Consequently, we can conclude that LIBP Pre
Fig. 4. Scatter plot of S(LIBP) vs. molecular weight (Mw) of the protein complex.
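The percentages quoted above, and the kind of correlation check used to rule out a molecular-weight bias, can be reproduced with a small Python sketch. The counts (552, 271 and 27 out of 2693) come from the text; the example score and weight lists are invented, and the function names `share_above` and `pearson_r` are hypothetical.

```python
import math


def share_above(scores, cutoff):
    """Return (count, percentage) of scores strictly above the cut-off."""
    hits = sum(1 for s in scores if s > cutoff)
    return hits, 100.0 * hits / len(scores)


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    total = 2693  # proteins screened, as stated in the text
    for hits in (552, 271, 27):
        print(hits, round(100.0 * hits / total, 1))

    # Hypothetical scores and molecular weights for four proteins.
    scores = [80.0, 40.0, 60.0, 20.0]
    weights = [10000.0, 20000.0, 30000.0, 40000.0]
    print(share_above(scores, 50))
    print(pearson_r(scores, weights))
```

An |R| close to zero, as reported in the text, indicates that a straight line explains almost none of the variation of the score with molecular weight.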
24. … as well as in their molecular targets. Molecular descriptors play a fundamental role in QSPR/QSAR studies. In this section we present some programs that are used to calculate molecular descriptors, both TIs and others [56]: DRAGON, MoDesLab, TOMOCOMD, MARCH-INSIDE, E-Calc and CODESSA PRO.
Figure 2. Graphical interface of the Dragon 6 application.
E-Dragon 1.0 (Virtual Computational Chemistry Laboratory, VCCLAB), list of molecular descriptors: constitutional descriptors (48), topological descriptors (119), walk and path counts (47), connectivity indices (33), information indices (47), 2D autocorrelations (96), edge adjacency indices (107), Burden eigenvalues (64), topological charge indices (21), eigenvalue-based indices (44), Randic molecular profiles (41), geometrical descriptors (74), RDF descriptors (150), 3D-MoRSE descriptors (160), WHIM descriptors (99), GETAWAY descriptors (197), functional group counts (154), atom-centred fragments (120), charge descriptors (14), molecular properties (31).
25. sitos antiparasitic drugs for prediction. The data were processed by LDA, and the model correctly classified 93.62% (1160 of 1239) of the training cases. The model was validated using the external prediction series, in which 573 of the 607 cases (94.4%) were correctly classified. The model equation is the following: Actv 3 86 m s Ca 3 7 1 4 8 Cops 53 55 70 s X 50 92 zr s X 2 62 zz s H Het 3 12 7 s H Het 2 37; Rc = 0.73, λ = 0.46, p < 0.001 (6), where Rc is the canonical correlation coefficient, λ is Wilk's statistic and p is the error level. In this equation, the calculated absolute probabilities π refer to: (s, Csp & sp2), all unsaturated carbon atoms (sp and sp2 atoms) and all atoms placed at a distance d 5 from them; (s, Csat), all saturated carbon atoms; (s, X), all halogen atoms; (s, H-Het), all hydrogen atoms bonded to a heteroatom (N, O or S). Prado-Prado et al. [78] used Markov chain theory to calculate new multi-target spectral moments in order to fit an mt-QSAR model for 500 drugs tested in the literature against 16 parasite species and another 207 drugs not tested in the literature. The data were processed by LDA, classifying the drugs as active or inactiv
26. 1980, 187, 829–835. 22 K. C. Chou and W. M. Liu, J. Theor. Biol., 1981, 91, 637–654. 23 P. Kuzmic, K. Y. Ng and T. D. Heath, Anal. Biochem., 1992, 200, 68–73. 24 I. W. Althaus, J. J. Chou, A. J. Gonzales, M. R. Diebel, K. C. Chou, F. J. Kezdy, D. L. Romero, P. A. Aristoff, W. G. Tarpley and F. Reusser, Biochemistry, 1993, 32, 6548–6554. 25 I. W. Althaus, J. J. Chou, A. J. Gonzales, M. R. Diebel, K. C. Chou, F. J. Kezdy, D. L. Romero, P. A. Aristoff, W. G. Tarpley and F. Reusser, J. Biol. Chem., 1993, 268, 6119–6124. 26 I. W. Althaus, J. J. Chou, A. J. Gonzales, R. J. LeMay, M. R. Deibel, K. C. Chou, F. J. Kezdy, D. L. Romero, R. C. Thomas, P. A. Aristoff, et al., Experientia, 1994, 50, 23–28. 27 I. W. Althaus, K. C. Chou, R. J. Lemay, K. M. Franks, M. R. Deibel, F. J. Kezdy, L. Resnick, M. E. Busso, A. G. So, K. M. Downey, D. L. Romero, R. C. Thomas, P. A. Aristoff, W. G. Tarpley and F. Reusser, Biochem. Pharmacol., 1996, 51, 743–750. 28 K. C. Chou, F. J. Kezdy and F. Reusser, Anal. Biochem., 1994, 221, 217–230. 29 X. Q. Qi, J. Wen and Z. H. Qi, J. Theor. Biol., 2007, 249, 681–690. 30 K. C. Chou, C. T. Zhang and D. W. Elrod, J. Protein Chem., 1996, 15, 59–61. 31 K. C. Chou and C. T. Zhang, AIDS Res. Hum. Retroviruses, 1992, 8, 1967–1976. 32 C. T. Zhang and K. C. Chou, J. Mol. Biol., 1994, 238, 1–8. 33 Y. Rodriguez-Soca, C. R. Munteanu, J. Dorado, J. Rabuñal, A. Pazos and H. Gonzalez-Diaz, Polymer, 2010, 51, 264
27. 273. 34 H. Gonzalez-Diaz, L. Muino, A. M. Anadon, F. Romaris, J. Prado-Prado, C. R. Munteanu, J. Dorado, A. P. Sierra, M. Mezo, M. Gonzalez-Warleta, T. Garate and F. M. Ubeira, Mol. BioSyst., 2011, 7, 1938–1955. 35 H. Gonzalez-Diaz, F. Prado-Prado, X. Garcia-Mera, N. Alonso, P. Abeijon, O. Caamano, M. Yanez, C. R. Munteanu, A. Pazos, M. A. Dea-Ayuela, M. T. Gomez-Munoz, M. M. Garijo, J. Sansano and F. M. Ubeira, J. Proteome Res., 2011, 10, 1698–1718. 36 H. Gonzalez-Diaz, F. Prado-Prado, E. Sobarzo-Sanchez, M. Haddad, S. Maurel-Chevalley, A. Valentin, J. Quetin-Leclercq, M. A. Dea-Ayuela, M. Teresa Gomez-Munos, C. R. Munteanu, J. Jose Torres-Labandeira, X. Garcia-Mera, R. A. Tapia and F. M. Ubeira, J. Theor. Biol., 2011, 276, 229–249. 37 P. Riera-Fernandez, C. R. Munteanu, N. Pedreira-Souto, R. Martín-Romalde, A. Duardo-Sanchez and H. González-Díaz, Curr. Bioinf., 2011, 6, 94–121. 38 Z. C. Wu, X. Xiao and K. C. Chou, J. Theor. Biol., 2010, 267, 29–34. 39 K. C. Chou, Curr. Drug Metab., 2010, 11, 369–378. 40 K. C. Chou, W. Z. Lin and X. Xiao, Nat. Sci., 2011, 3, 862–865 (openly accessible at http://www.scirp.org/journal/NS/). 41 G. P. Zhou, J. Theor. Biol., 2011, 284, 142–148. 42 G. P. Zhou, Protein Pept. Lett., 2011, 18, 966–978. 43 H. González-Díaz, G. Ferino, G. Podda and E. Uriarte, Electron. Conf. Synth. Org. Chem., 2007, 11, G1, 1–10. 44 S. Vilar, H. Gonzalez-Diaz, L. Santana and E. Uriarte, J. Comput. Chem., 2008, 29, 261
28. 36 72 falciparum 2FUOA 34 59 falciparum 2KDNA 33 88 Other parasites Toxoplasma gondii 2FAZB 40 76 Trypanosoma brucei 2Q0XA 55 26 Trypanosoma brucei 2AMHA 52 87 Trypanosoma brucei 2K9XA 40 33 Trypanosoma cruzi IYZVA 44 88 L donovani 858 Mol BioSyst 2012 8 851 862 My PDB ID S My Leishmania major 211122 31 1X9GA 51 23 22701 7 73780 59 3HA4A 51 12 141446 42 124991 81 ITC5A 50 84 87 100 86192 3M3IA 49 68 202852 38675 12 1YIXA 48 91 43851 29 93 492 3HA4B 48 2 141446 42 36346 29 3S40A 47 6 37372 82 18979 9 1YF9A 46 45 58695 01 86659 06 3KSVA 44 96 16307 28 71228 68 2ARIA 44 59 20443 59 42903 78 1Y63A 43 68 22570 21 44007 6 3LJNA 40 59 41122 7 61467 58 1YQFA 40 5 138256 19 90765 37 IR75A 35 79 16322 39 Homo sapiens 40489 07 2WM3A 62 66 34893 97 82299 41 2GTRA 61 04 87455 7 25964 6 2EC4A 55 37 20134 9 76870 25 2HV6B 53 56 73663 31 76870 25 2HV6A 53 39 73663 31 137232 12 2FBMA 53 3 96749 27 59869 23 2Q4KA 53 13 82801 8 19876 3 216TA 52 59 66672 09 61130 2095A 51 35 44000 2 22520 9 2P2LA 50 99 64220 55 46076 4 2DB9A 50 83 16665 1 18591 19 INZNA 50 75 14955 85 14333 7 1X53A 50 31 16177 2 51476 48 2L20A 50 31 10125 7 20439 7 2095B 50 11 44000 2 62364 8 2P5XA 49 85 51829 34 77780 69 2K07A 48 97 20557 7 153103 19 2DLXA 48 48 17337 5 18188 8 1V9VA 48 01 12426 2 12328 2 IWRYA 47 96 13124 5 Caenorhabditis elegans 43101 09 1XKQA 74 13 122794 91 74903 39 1PULA 46 1 13747 8 22994 83 1T9FA 38 98 20692 85 12005 6 ITOVA 38 61 1
29. [Figure 33: ATCUNPred online tool (MARCH-INSIDE, Python version; data: RCSB PDB). The interface reports the DNA-cleavage ATCUN activity score computed from the electrostatic spectral moments of the amino acid orbits O, with the following note: for the sake of simplicity, in order to avoid the calculation of the Mahalanobis distance, an approximate classification percentage is calculated as $sc_1^2 / (sc_1^2 + sc_2^2)$, where $sc_1$ and $sc_2$ are the ATCUN and non-ATCUN scores.] The development of methods that can predict metal-mediated biological activity based only on the 3D structure of proteins not bound to the metal has become a goal of great importance. This work is dedicated to the amino-terminal Cu(II)- and Ni(II)-binding (ATCUN) motifs, which participate in DNA cleavage and have antitumor activity [64]. We have calculated here, for the first time, the electrostatic spectral moments for the 3D protein information of 415 different proteins, including 133 potential antitumor ATCUN proteins. Using these parameters as input to linear discriminant analysis, we found a model that discriminates between ATCUN DNA-cleavage proteins and non-active proteins with an accuracy of 91.32% (379 of 415 proteins, including both the training and the external validation series)
30. Table 5, continued (values per subset: antiox %, non-antiox %, global %, precision antiox %, global precision %, ROC, number of attributes): 6, 79.7, 93.6, 0.946, 11; Sh and Tr non-embedded: 79.6, 95.5, 92.9, 77.5, 93.0, 0.927, 9; Sh and X: 81.2, 95.7, 93.3, 78.5, 93.4, 0.952, 24; Sh and X embedded: 80.2, 95.1, 92.7, 76.0, 92.9, 0.947, 12; Sh and X non-embedded: 79.6, 95.5, 92.9, 77.5, 93.0, 0.927, 12; Tr and X: 83.6, 96.9, 94.7, 83.9, 94.7, 0.951, 20; Tr and X embedded: 83.6, 96.8, 94.6, 83.6, 94.7, 0.958, 11; Tr and X non-embedded: 80.2, 95.5, 93.0, 77.4, 93.1, 0.935, 9; All: 84, 96.7, 94.6, 82.9, 94.6, 0.954, 42; All embedded: 82.1, 96.8, 94.4, 83.1, 94.4, 0.954, 22; All non-embedded: 81.2, 95.6, 93.2, 78.0, 93.4, 0.934, 20. E. Fernández-Blanco et al. / Journal of Theoretical Biology 317 (2013) 331–337. [Fig. 3: ROC curve plot for the best classification method and the dataset containing the smallest number of attributes; area under ROC = 0.9543.] Again, the results show that Random Forest is able to achieve better classification scores and similar precision values while considering fewer attributes as input, in this case only those included in the Tr subset that contains only the values of the embedded graph. Adding the embedded attributes of the X subset gives slightly better results. However, this implies doubling the number of attributes used as input to the model. Thus, these results confirm that the rest of the attributes seem to add very little information, or may even introduce noise, inducing worse classification scores. If the ROC value is checked, it can be observed that the same ROC values are obtained when using
31. Application of Bioinformatics for the search of novel anti viral therapies Rational design of anti herpes agents Current Bioinformatics 2011 6 1 81 93 Riera Fern ndez P Munteanu CR Pedreira Souto N Mart n Romalde R Duardo Sanchez A Gonz lez D az H Definition of Markov Harary Invariants and Review of Classic Topological Indices and Databases in Biology Parasitology Technology and Social Legal Networks Current Bioinformatics 2011 6 1 94 121 Dave K Banerjee A Bioinformatics analysis of functional relations between CNPs regions Current Bioinformatics 2011 6 1 122 8 Breiger R The Analysis of Social Networks In Handbook of Data Analysis Hardy M Bryman A eds Sage Publications London 2004 505 26 Abercrombie N Hill S Turner BS Social structure In The Penguin Dictionary of Sociology 4th ed Penguin London 2000 Craig C Social Structure In Dictionary of the Social Sciences Oxford University Press Oxford 2002 White H Scott Boorman and Ronald Breiger Social Structure from Multiple Networks I Blockmodels of Roles and Positions American Journal of Sociology 1976 81 730 80 Wellman B Berkowitz SD Social Structures A Network Approach Cambridge University Press Cambridge 1988 Newman MEJ The structure and function of complex networks SIAM Review 2003 45 167 256 Bornholdt S Schuster HG Handbook of Graphs and Complex Networks From the Genome to the Internet WILEY VCH GmbH amp CO K
32. Armas H Gonz lez D az R Molina M Perez Gonzalez and E Uriarte Bioorg Med Chem 2004 12 4815 4822 R Ramos de Armas H Gonz lez D az R Molina and E Uriarte Biopolymers 2005 77 247 256 A Speck Planche M T Scotti and V de Paulo Emerenciano Curr Pharm Des 2010 16 2656 2665 K C Chou and C T Zhang Crit Rev Biochem Mol Biol 1995 30 275 349 K C Chou and H B Shen Nat Sci 2010 2 1090 1103 openly accessible at http www scirp org journal NS K C Chou Z C Wu and X Xiao Mol BioSyst 2012 DOI 10 1039 C1MB05420A M Esmaeili H Mohabatkar and S Mohsenzadeh J Theor Biol 2010 263 203 209 D N Georgiou T E Karakasidis J J Nieto and A Torres J Theor Biol 2009 257 17 26 Q Gu Y S Ding and T L Zhang Protein Pept Lett 2010 17 559 567 H Mohabatkar Protein Pept Lett 2010 17 1207 1214 H Mohabatkar M Mohammad Beigi and A Esmaeili J Theor Biol 2011 281 18 23 L Yu Y Guo Y Li G Li M Li J Luo W Xiong and W Qin J Theor Biol 2010 267 1 6 J D Qiu J H Huang S P Shi and R P Liang Protein Pept Lett 2010 17 715 722 K C Chou Z C Wu and X Xiao PLoS One 2011 6 el8258 X Xiao P Wang and K C Chou Mol Diversity 2011 15 149 155 V A Ivanisenko S S Pintus D A Grigorovich and N A Kolchanov Nucleic Acids Res 2005 33 D183 D187 P D Dobson and A J Doig J
33. Combes RD Gonzalez MP Cordeiro MN Applications of 2D descriptors in drug design a DRAGON tale Curr Top Med Chem 2008 8 18 1628 55 Wang JF Wei DQ Chou KC Drug candidates from traditional chinese medicines Curr Top Med Chem 2008 8 18 1656 65 Duardo Sanchez A Patlewicz G Lopez Diaz A Current topics on software use in medicinal chemistry intellectual property taxes and regulatory issues Curr Top Med Chem 2008 8 18 1666 75 Gonzalez Diaz H Prado Prado F Ubeira FM Predicting antimicrobial drugs and targets with the MARCH INSIDE approach Curr Top Med Chem 2008 8 18 1676 90 Ivanciuc O Weka machine learning for predicting the phospholipidosis inducing potential Curr Top Med Chem 2008 8 18 1691 709 Chen J Shen B Computational Analysis of Amino Acid Mutation a Proteome Wide Perspective Curr Proteomics 2009 6 4 228 34 Chou KC Pseudo amino acid composition and its applications in bioinformatics proteomics and system biology Curr Proteomics 2009 6 4 262 74 Giuliani A Di Paola L Setola R Proteins as Networks A Mesoscopic Approach Using Haemoglobin Molecule as Case Study Curr Proteomics 2009 6 4 235 45 Gonz lez D az H Prado Prado F P rez Montoto LG Duardo S nchez A L pez D az A QSAR Models for Proteins of Parasitic Organisms Plants and Human Guests Theory Applications Legal Protection Taxes and Regulatory Issues Curr Proteomics 2009 6 4 214 27 Ivanciuc O Machine learning Quant
34. Entropies (28). Protein-drug pairs (PROTEIN-DRUG PAIRS): these use an input file with the property of each protein-drug pair, in the form PDBChain [tab] DrugName [tab] Activity. If only one type of activity exists (positive cases), protein-drug pairs can be generated at random, up to X times the number of positive cases. This type of pair can be calculated only if both types of calculation, for proteins and for drugs, are enabled. Notes: the input and output files can be created or modified directly in the interface using the native Windows Notepad. The Markov indices of MInD-Prot: before the indices are calculated, the connectivity matrix of the molecular graph is Markov-normalized (the elements of the matrix are divided by the maximum value of their row), giving a matrix P of node probabilities. In a second step, P is raised to the power k (5 times), giving k matrices P^k, which are the input for calculating the indices: Spectral Moments (PI), Shannon Entropy (Sh) and Mean Properties (MP). MInD-Prot calculates the indices in a way similar to MARCH-INSIDE, but without capturing the effect of the molecular environment. The advantages of MInD-Prot are the following. For proteins: calculation of average indices for each type of protein class in the input, or using the fields of the heading of the PDB files; or extract all the information
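The two-step procedure in excerpt 34 (normalize the connectivity matrix, then take its successive powers) can be illustrated with a small pure-Python sketch. It is not the MInD-Prot code. Two points are assumptions made here for illustration: the normalization uses the usual row-stochastic form, in which each row is divided by its sum so that it adds up to 1 (the excerpt describes this step only briefly), and the spectral moment of order k is taken as the trace of P^k. All function names are hypothetical.

```python
def markov_normalize(adjacency):
    """Row-normalize a connectivity matrix (assumption: divide by the row sum)."""
    normalized = []
    for row in adjacency:
        total = sum(row)
        # An isolated node (all-zero row) is left as a row of zeros.
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(a_row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
        for a_row in a
    ]


def matrix_powers(p, max_power=5):
    """Return the list [P^1, P^2, ..., P^max_power]."""
    powers = [p]
    for _ in range(max_power - 1):
        powers.append(matmul(powers[-1], p))
    return powers


def spectral_moments(p, max_power=5):
    """Trace of each power of P (assumed definition of the spectral moments)."""
    return [
        sum(power[i][i] for i in range(len(power)))
        for power in matrix_powers(p, max_power)
    ]


if __name__ == "__main__":
    # Toy 3-node path graph with self-loops; not taken from the source.
    adjacency = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
    p = markov_normalize(adjacency)
    print(spectral_moments(p, max_power=5))
```

The default `max_power=5` mirrors the limit mentioned in the excerpt. The Shannon entropy and mean property indices would be computed from the same list of powers, but their exact definitions are not given in the excerpt, so they are left out of the sketch.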
35. Spain, and the Department of Microbiology and Parasitology of the Faculty of Pharmacy, University of Santiago de Compostela (USC), Spain. Bio-AIMS is divided into two types of tools: 1. TargetPred (Target Prediction; Figure 27): web applications for predicting the function of diverse targets, such as proteins in human diseases or molecular processes, using information from protein sequences or the 3D structure of proteins, and from the chemical structure of drugs (SMILES). 2. DiseasePred (Disease Prediction): biomedical applications that help predict human diseases using biological data such as Single Nucleotide Polymorphism (SNP) genetic mutations, EEG recordings or mass spectra of the blood proteome. The servers presented in this thesis belong to the TargetPred section: Trypano-PPI, Plasmod-PPI, ATCUNpred and LectinPred. From 11 February 2010 to 20 April 2013, the server received more than 5000 unique visits from 101 countries (Figure 28). [Figure 28: More than 5000 visits to the Bio-AIMS online tools from 11 February 2010 to 20 April 2013.] 2.2.1 Trypano-PPI: protein-protein interactions in Trypanosoma. Trypano-PPI: A Web Server for Prediction of Unique Targets in Trypanosome Proteome
36. J Prado Prado C R Munteanu and H Gonz lez Diaz CULSPIN Compute ULam SPiral INdices Santiago de Compostela 2009 M Hall E Frank G Holmes B Pfahringer P Reutemann and I A Witten SIGKDD Explor 2009 11 10 18 T Sjoblom S Jones L D Wood D W Parsons J Lin T D Barber D Mandelker R J Leary J Ptak N Silliman S Szabo P Buckhaults C Farrell P Meeh S D Markowitz J Willis D Dawson J K Willson A F Gazdar J Hartigan L Wu C Liu G Parmigiani B H Park K E Bachman N Papadopoulos B Vogelstein K W Kinzler and V E Velculescu Science 2006 314 268 274 P D Dobson Y D Cai B J Stapley and A J Doig Curr Med Chem 2004 11 2135 2142 P D Dobson and A J Doig J Mol Biol 2005 345 187 199 K C Chou and H B Shen Anal Biochem 2007 370 1 16 K C Chou and H B Shen PLoS One 2010 5 e9931 N Rappin and R Dunn wxPython in Action Manning Publica tions Co Greenwich CT 2006 P Langley W Iba and K Thompson An analysis of Bayesian classifiers San Jose CA 1992 T Hastie R Tibshirani and J Friedman The Elements of Statistical Learning Data Mining Inference and Prediction Springer 2001 J Moody and C J Darken Neural Comput 1989 1 281 294 M Hall and E Frank presented in part at the In Proceedings of 21st Florida Artificial Intelligence Research Society Conference Miami Florida 2008 V Vapnik Stati
37. L Gonz lez D az H Uriarte E Proteins Markovian 3D QSAR with spherically truncated average electrostatic poten tials Bioorg Med Chem 2005 13 11 3641 7 Gonz lez D az H Molina R R Uriarte E Stochastic molecular descriptors for polymers 1 Modelling the properties of icosahedral viruses with 3D Markovian negentropies Polymer 2003 45 3845 53 Gonz lez D az H Molina R Uriarte E Markov entropy back bone electrostatic descriptors for predicting proteins biological activity Bioorg Med Chem Lett 2004 14 18 4691 5 Gonz lez D az H Sa z Urra L Molina R Uriarte E Stochastic molecular descriptors for polymers 2 Spherical truncation of Journal of Proteome Research e Vol 8 No 11 2009 5227 research articles 73 74 45 76 7T 78 79 80 81 82 83 84 85 electrostatic interactions on entropy based polymers 3D QSAR Polymer 2005 46 2791 8 Berman H M Westbrook J Feng Z Gilliland G Bhat T N Weissig H Shindyalov I N Bourne P E The Protein Data Bank Nucleic Acids Res 2000 28 235 42 Gonz lez D az H Molina R BIOMARKS version 1 0 contact information gonzalezdiazh yahoo es or qohumbe usc es2005 Kundu S Gupta Bhaya P How a repulsive charge distribution becomes attractive and stabilized by a polarizable protein dielec tric J Mol Struct Theochem 2004 668 Burykin A Warshel A On the origi
38. PPC biopolymer structure. Therefore, we introduce herein new Markov chain numerical descriptors of protein-protein interactions (PPIs), based on electrostatic entropy measures, and calculate these parameters for 5257 pairs of proteins (774 pPPCs and 4483 non-pPPCs) from more than 20 organisms, including parasites and human hosts. We found a simple Classification Tree with high Accuracy, Sensitivity and Specificity (90.2–98.5%) in both the training and the independent test sub-sets, and implemented this predictor in the user-friendly web server PlasmodPPI, freely available at http://miaja.tic.udc.es/Bio-AIMS/PlasmodPPI.php. © 2009 Elsevier Ltd. All rights reserved. disease and death. Despite this large burden of disease, P. vivax is overlooked and left in the shadow of the enormous problem caused by Plasmodium falciparum. P. falciparum represents one of the strongest selective forces on the human genome. This stable and perennial pressure has contributed to the progressive accumulation, in the exposed populations, of genetic adaptations to malaria. Descriptive genetic epidemiology provides the initial step of a logical procedure of consequential phases, spanning from the identification of genes involved in resistance or susceptibility to diseases, to the determination of the underlying mechanisms and, finally, to the possible translation of the acquired knowledge into new control tools [1]. In addition, Plasmodium vivax (P. vivax) is geographically the most wide
39. R Varona Santos J Uriarte E Gonzalez Diaz Y 2006 Novel 2D maps and coupling numbers for protein sequences The first QSAR study of polygalacturonases isolation and predic tion of a novel sequence from Psidium guajava L FEBS Lett 580 723 730 Aguiar Pulido V Munteanu C R Seoane J A Fernandez Blanco E P rez Montoto L G Gonzalez Diaz H Dorado J 2012 Naive Bayes QSDR classifi cation based on spiral graph Shannon entropies for protein biomarkers in human colon cancer Mol Biosyst 8 1716 1722 Aledo J C Li Y de Magalh es J P Ruiz Camacho M Perez Claros J A 2011 Mitochondrially encoded methionine is inversely related to longevity in mammals Aging Cell 10 198 207 Aledo J C Valverde H de Magalh es J P 2012 Mutational bias plays an important role in shaping longevity related amino acid content in Mammalian mtDNA encoded proteins J Mol Evol 74 332 341 Althaus I W Chou J J Gonzales A J Diebel M R Chou K C Kezdy F J Romero D L Aristoff P A Tarpley W G Reusser F 1993a Kinetic studies with the nonnucleoside HIV 1 reverse transcriptase inhibitor U 88204E Biochemistry 32 6548 6554 Althaus I W Chou J J Gonzales A J Diebel M R Chou K C Kezdy F J Romero D L Aristoff P A Tarpley W G Reusser F 1993b Steady state kinetic studies with the non nucleoside HIV 1 reverse transcriptase inhibitor U 87201E J Biol Chem 268 6119 6124
40. We found a total of 168 proteins of the human proteome with unknown function and low sequence homology. After mining this dataset with LIBP-Pred, we predicted 15 of these 168 proteins as LIBPs with S(LIBP) > 50. However, only two proteins have S(LIBP) > 60, and we have not found any protein with a higher value. The highest S(LIBP) value predicted among all the human proteins of unknown function studied corresponds to 2WM3, with S(LIBP) = 62.66. This is a statistically significant, but not very high, value of S(LIBP). An important clue that may support the prediction of 2WM3 as a LIBP by LIBP-Pred is the binding of this protein to both phosphate and glycerol, separately, which are well-known components of phospholipids. [Mol. BioSyst., 2012, 8, 851–862; published on 10 January 2012 on http://pubs.rsc.org, doi:10.1039/C2MB05432A.] In any case, the protein header indicates an unknown function, but the protein is also bound to NADPH and is considered an NmrA-like family domain-containing protein 1 in a public release to the PDB. This theoretical result points to 2WM3 as a potential candidate for future experiments in the search for cancer biomarkers. For instance, human HSCARG has been annotated as a possible cancer-related protein and also contains an NmrA-like domain. Conclusions: The discovery of new LIBPs is a goal of great importance, and several authors have presented interesting
41. [Figure 14: S2SNet interface ("Transform your sequences in Star Network indices").] In addition, S2SNet has a help option (Help), details about the program and its authors (About), the possibility of creating a new text file (New) and the option to quit the application (Quit). The buttons duplicate the menu options. The DOS console always shows the status of the calculations and any errors. How is S2SNet used? In the main window you can choose the parameters for calculating the specific topological indices, the input and output files, and the type of graph visualization. Parameters: embedded is used to create embedded networks (graphs); weight is used to put weight values on the nodes of the graphs; Markov normalization can be applied to the connectivity matrices; details is selected to see the details of the calculation; power sets the power of the connectivity matrices (max. 5); Network plots enables support for creating and displaying the graphs. Input files: sequences, groups and weights. Output files: results and details. Display mode: the way the network is displayed; sequence: the name of the sequence and the type of executable
42. assay using (i) a panel of N-terminal regions of TbPEX14 protein variants and (ii) a series of different peptides derived from TbPEX5, each containing one of the three WXXXF/Y motifs present in this receptor protein. They concluded that the low sequence identities of PEX14 and PEX5 between the parasite and its human host, and the vital importance of proper glycosome biogenesis to the parasite, render these peroxins highly promising drug targets. Results of this type indicate that PPIs unique to Trypanosoma parasites (TPPIs), and not present in humans, may be promising targets for the development of safe drugs with low toxicity. In addition, the high number of possible PPIs in parasites and human hosts makes exhaustive experimental investigation difficult in terms of time and resources. Consequently, not only in parasites but in all organisms in general, the development of predictive models for PPIs becomes a very useful tool to guide the discovery of new drug targets. In general, there are many structural parameters and theoretical methods that are useful in proteome research for protein structure-function relationship studies. In principle, the same type of methods may be used for the prediction of PPIs in humans and other organisms. Many of them use sequence alignment techniques, phylogenetic techniques or alignment-free parameters to construct and/or analyze proteins or PPIs in terms of protein network representations as inpu
43. been known from the literature and has been the result of experiments. The best QSAR classification model that links the protein structural properties coded in spectral moments with the ATCUN activity is described by the following formula: $\text{DNA-cleavage} = c_0 + \sum_{k} c_k(O)\,\pi_k(O)$, where DNA-cleavage is the continuous score value for the ATCUN/non-ATCUN classification; $\pi_k(O)$ are the 3D spectral moments, with k from 1 to n (the initial unperturbed spectral moments for k = 0, the short-range spectral moment for k = 1, the middle-range spectral moments for k = 2 and the long-range spectral moments for k = 3), for the amino acid orbits O (c = core, i = inner, m = middle and o = outer); $c_k(O)$ are the spectral moment coefficients; n is the number of indices; and $c_0$ is the independent term. The quality of the GDA models was determined by examining Wilk's statistics, the leverage threshold used to define the model domain (h), the model significance level (p-level) and the canonical regression coefficient (Rc). We also inspected the percentage of good classification, the cases/variables ratios and the number of variables to be explored, in order to avoid overfitting or chance correlation. The LDA forward stepwise method was used to find the best model. Thus, the training set of proteins was used to create the model, and the validation set to verify whether the model can accurately predict the ATCUN activity of new proteins (Figure 1). Other metho
44. by using Electrostatic Parameters of Protein-Protein Interactions. Journal of Proteome Research 9(2), 1182–1190 (2010). Yamilet Rodriguez-Soca, Cristian R. Munteanu, Julián Dorado, Alejandro Pazos, Francisco J. Prado-Prado and Humberto González-Díaz. Link: http://goo.gl/nCgR9. Tool: http://bio-aims.udc.es/TrypanoPPI.php. [Figure 29: TrypanoPPI online tool (Trypanosome Protein-Protein Interactions, TPPI; tool: MARCH-INSIDE, Python version; data: RCSB PDB; training accuracy 89.5%, test accuracy 90.9%). Instructions shown in the interface: "PDB chain lists: please paste the names of the PDB chains as two lists (maximum 50). Notes: there is no space between the PDB name and the chain label; no empty new line; the results will print the combination between the chains from the first list and the chains from the second one."] Trypanosoma brucei causes African trypanosomiasis in humans (HAT, or African sleeping sickness) and Nagana in cattle. The disease threatens more than 60 million people and countless cattle in 36 countries of sub-Saharan Africa, with a devastating impact on human health and the economy. On the other hand, Trypanosoma cruzi is the responsabl
45. database, uploads the PDB files with the 3D structure of the protein, constructs the Markov matrix of electrostatic interactions, and calculates the total and region (R) average electrostatic potential values T R for each query protein. LIBP-Pred mode 2. In mode 1, LIBP-Pred may be used to select potential LIBPs among proteins with known 3D structures that have been released in the PDB but have unknown function. However, there are other potential uses of this server. How should one predict S(LIBP) values for proteins with known sequence, but unknown 3D structure and function, that have not been released to the PDB? Mode 2 is essentially the same as mode 1, but the server prompts users to upload .ent and .pdb files with 3D structures of proteins generated using the LOMETS web server, developed by Prof. Zhang et al. at Michigan University. Fig. 3 depicts the user interface for LIBP-Pred mode 2 (bottom of the web page). LOMETS is a local threading meta-server for quick and automated predictions of protein tertiary structures and spatial constraints. Nine state-of-the-art threading programs are installed and run in a local computer cluster, which ensures the quick generation of initial threading alignments compared to traditional remote-server-based meta-servers. Consensus models are generated from the top predictions of the component threading servers, and are at least 7% more accurate than the best individual servers based on a TM-score at a
46. of Graphviz (dot, circo, twopi, neato and fdp). The maximum and average graphs of all the analyzed sequences are calculated automatically. Example calculation with S2SNet: one example is to use a protein sequence, 7ODCA, from the Protein Data Bank (http://www.rcsb.org). The inputs with the amino acid sequence and the groups are presented in Figure 15. [Figure 15: Example of S2SNet input: sequence (seqs.txt) and amino acid groups (groups.txt), both shown in Notepad.] S2SNet transforms the sequence into a list of specific topological indices for the star graph, and it can also generate images of the graphs with the help of Graphviz. Figure 16 presents the results for the calculations of non-embedded graphs (graph on the left, drawn with neato; graph on the
47. details behind this kind of model, including the vast literature published by Chou et al. on the development of models with pseudo amino acid composition parameters, or the use of machine-learning classification techniques and other algorithms. In any case, to the best of our knowledge, there is no other theoretical method in the literature to predict LIBPs in parasites, cancer tissue or other disease-specific proteomes, that are not present in humans or other organisms, based on the 3D structure of proteins. According to a recent comprehensive review, to establish a really useful statistical predictor for a protein system we need to consider the following procedures: (i) construct or select a valid benchmark dataset to train and test the predictor; (ii) formulate the protein samples with an effective mathematical expression that can truly reflect their intrinsic correlation with the attribute to be predicted; (iii) introduce or develop a powerful algorithm (or engine) to operate the prediction; (iv) properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the predictor; (v) establish a user-friendly web server for the predictor that is accessible to the public. Let us describe below how to deal with these steps one by one. González-Díaz et al. introduced the method called MARkovian CHemicals IN SIlico DEsign (MARCH-INSIDE 1.0) for the computational design of small-sized drugs. The approach uses a Marko
48. in that gnomon. This option can be very useful when working with sequences of moderate size and a large number of classes. Indices page: this page is added to the notebook and shown to the user immediately after the TIs have been calculated for the selected sequences (Figure 22). The page is formatted as a spreadsheet-like table in which the column headers show the names of the indices and the row headers show the names of the sequences or cases. In this table you can select a cell, a range, a column, a row or all the cells, and copy the selection to the clipboard with Ctrl+C in order to paste it wherever you wish. This is very useful for exporting the calculated TI values quickly and easily to external applications such as Excel. [Figure 22: Result with the TIs calculated in CULSPIN (Ulam spiral), shown as a table in the Indices page.]
49. is a utility that calculates Electrotopological State (E-State) indices of molecules, including the electrotopological state (E-State) and the hydrogen E-State (HE-State), the values for the individual atoms, and the atom indices (Figure 7). These calculations help to understand the development, use and interpretation of E-State values as a representation of molecular structure. The computational parts of this program were taken from Molconn-Z and SciQSAR 2D [67]. Figure 7. E-Calc interface. 1.1.6 CODESSA PRO. CODESSA PRO (Comprehensive Descriptors for Structural and Statistical Analysis, http://www.codessa-pro.com) is a program designed by Alan R. Katritzky, Mati Karelson and Ruslan Petrukhin and developed from 2001 to 2005 (Figure 8). The user manual (http://www.codessa-pro.com/manuals/manual.htm) states that it is designed for the development of quantitative QSAR/QSPR relationships by integrating all the mathematical and computational tools needed to (i) calculate a large variety of molecular descriptors using the 3D geometrical structure and/or the quantum-mechanical wave function of the chemical compounds; (ii) develop various linear and non-linear QSPR models for chemical and physical properties or for
50. estimation of pKa by least squares nonlinear regression analysis of multiwavelength spectrophotometric pH titration data. Anal. Bioanal. Chem. 2007, 387 (3), 941-55. Melino S., Garlando L., Patamia M., Paci M., Petruzzelli R. A metal-binding site is present in the amino terminal region of the bioactive iron regulator hepcidin-25. J. Pept. Res. 2005, 66 (s1), 65-71. Macrae I. J., Zhou K., Li F., Repic A., Brooks A. N., Cande W. Z., Adams P. D., Doudna J. A. Structural basis for double-stranded RNA processing by Dicer. Science 2006, 311 (5758), 195-8. Pathuri P., Nguyen E. T., Svard S. G., Luecke H. Apo and calcium-bound crystal structures of Alpha-11 giardin, an unusual annexin from Giardia lamblia. J. Mol. Biol. 2007, 368 (2), 493-508. Pathuri P., Nguyen E. T., Ozorowski G., Svard S. G., Luecke H. Apo and calcium-bound crystal structures of cytoskeletal protein alpha-14 giardin (annexin E1) from the intestinal protozoan parasite Giardia lamblia. J. Mol. Biol. 2009, 385 (4), 1098-112. Wingard J. N., Ladner J., Vanarotti M., Fisher A. J., Robinson H., Buchanan K. T., Engman D. M., Ames J. B. Structural insights into membrane targeting by the flagellar calcium-binding protein (FCaBP), a myristoylated and palmitoylated calcium sensor in Trypanosoma cruzi. J. Biol. Chem. 2008, 283 (34), 23388-96. Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215 (3)
51. et al., 2000), the Antioxidant activity list obtained with the Molecular Function Browser in the Advanced Search Interface. The negative group was constructed using the PISCES CulledPDB (Wang and Dunbrack, 2003) list of proteins with identity less than 20%, resolution of 1.6 Å and R-factor 0.25 (non-antioxidant proteins, but with any other possible biological function). Identity is the degree of correspondence between two sequences, and a value of 25% or higher implies similarity of function. The sequence identities for PDB sequences have been determined using Combinatorial Extension (CE) structural alignment (Shindyalov and Bourne, 1998). The PISCES server (http://dunbrack.fccc.edu/PISCES.php) used a Z-score of 3.5 as the threshold to accept possible evolutionary relationships. PISCES alignments are local, so that two proteins that share a common domain with sequence identity above the threshold are not both included in the output lists. Neither list has been post-filtered for any source organism. 2.2. Star Graph topological indices. Each protein was transformed into a Star Graph, where the amino acids are the vertices (nodes), connected in a specific sequence by the peptide bonds. The Star Graph is a special type of tree with N vertices where one has N - 1 degrees of freedom and the remaining N - 1 vertices have a single degree of freedom (Harary, 1969). Each of the 20 possible branches (rays) E. Fernández-Blanco e
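The star-graph definition above can be checked with a short sketch. It covers only the classic star graph described in the text (one centre joined to every other vertex), not the S2SNet embedded or non-embedded protein graphs; the function name `star_graph_degrees` is hypothetical and chosen for illustration.

```python
def star_graph_degrees(n_vertices):
    """Return the degree of every vertex of a classic star graph.

    Vertex 0 is the centre; vertices 1..n_vertices-1 are leaves that are
    joined only to the centre. (Illustrative helper, not part of S2SNet.)
    """
    if n_vertices < 2:
        raise ValueError("a star graph needs at least two vertices")
    # One edge from the centre to each leaf: N - 1 edges in total.
    edges = [(0, leaf) for leaf in range(1, n_vertices)]
    degrees = [0] * n_vertices
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


if __name__ == "__main__":
    # Centre has degree N - 1, every leaf has degree 1.
    print(star_graph_degrees(5))
```

For N = 5 the centre has degree 4 and each of the four leaves has degree 1, so the degrees sum to 2(N - 1), twice the number of edges of a tree.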
52. extract more general conclusions from this study, the authors have tested the different classification techniques using 10-fold cross-validation (McLachlan et al., 2004).
Table 1. Performance of the classification methods considering all the attributes.
Technique       Antiox (%)   Non-antiox (%)   Global (%)   Precision antiox (%)   Global precision (%)   ROC
Naive Bayes     97.5         49.1             57.0         27.4                   87.4                   0.78
MLP             22.8         97.5             85.4         63.8                   83.0                   0.874
K star          86.7         94.3             93.1         74.7                   93.7                   0.971
JRip            64.8         96.1             91.0         76.4                   90.6                   0.814
Random tree     81.8         95.0             92.8         75.9                   93.1                   0.884
Random Forest   84           96.7             94.6         82.9                   94.6                   0.954
10-fold cross-validation is the most common member of the k-fold cross-validation family, and its objective is to minimize the influence of randomness in creating the training and test sets for a specific classification technique. The objective of this work is to select the technique with the highest classification score that also has a good precision value, due to the nature of the problem. The first approach considered was linear regression, but the results showed that it was impossible to achieve good classification scores with this technique. Table 1 shows the results of the different classification models tested (those that obtained the best scores) considering all the attributes extracted from the Star Graph, that is, 42 attributes. The algorithms used in the tests are those implemented in the Weka Machine Learning framework. This table shows for eac
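The splitting step behind 10-fold cross-validation can be sketched as follows. This is a plain, unstratified illustration in standard-library Python, not the procedure Weka runs internally; the function name `k_fold_indices` and the `seed` parameter are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for a plain k-fold split.

    Illustrative sketch: samples are shuffled once and dealt into k folds;
    each fold is used as the test set exactly once. No stratification by
    class is attempted here.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 324 antioxidant + 1675 non-antioxidant proteins = 1999 samples.
    for train, test in k_fold_indices(1999, k=10):
        print(len(train), len(test))
```

Every protein ends up in exactly one test fold, so the scores averaged over the ten folds use each sample once for testing and nine times for training.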
53. fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane and (8) periplasm. It can also be used when a query protein may exist in more than one location at the same time. Compared with the original predictor, Gneg-PLoc, the new model is much more powerful and flexible. For a benchmark dataset in which no protein has more than 25% sequence identity with any other protein from the same location, the success rate of Gneg-mPLoc was 85.5%, more than 14% higher than the corresponding rate for Gneg-PLoc. Gneg-mPLoc is available as a free server at http://www.csbio.sjtu.edu.cn/bioinf/Gneg-multi/. [Screenshot of the Gneg-mPLoc server: Predicting subcellular localization of Gram-negative bacterial proteins; input the Gram-negative protein sequence in FASTA format.] Reference: Hong-Bin Shen and Kuo-Chen Chou, Gneg-mPLoc: A top-down strategy to enhance the quality of predicting subcellular localization of Gram-negative bacterial proteins, Journal of Theoretical Biology, 2010, 264: 326-333. Kuo-Chen Chou and Hong-Bin Shen, Cell-PLoc: A package of web servers for predicting subcellular localization of proteins in various organisms, Nature Protocols, 2008, 3: 153-162. Kuo-Chen Chou and Hong-B
54. in its web (http://www.cancer.gov/colorectalcancerrisk) a colorectal cancer risk assessment tool, an interactive tool to help estimate a person's risk of developing colorectal cancer. [Affiliations: Department of Information and Communications Technologies, University of A Coruña, Campus Elviña, 15071 A Coruña, Spain. E-mail: jseoane@udc.es; Fax: +34 981167160; Tel: +34 981167000 ext. 1302. Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain. Fax: +34 981594912; Tel: +34 981563100. Mol. BioSyst., 2012, 8, 1716-1722.] The tool is based on the work published in Journal of Clinical Oncology, and it can estimate the risk for men and women who are between the ages of 50 and 85 and are African American, Asian American/Pacific Islander, Hispanic/Latino or White, but it cannot accurately estimate the risk for people who have conditions such as ulcerative colitis, Crohn's disease, familial adenomatous polyposis (FAP), hereditary nonpolyposis colorectal cancer (HNPCC) or a personal history of colorectal cancer. Therefore, the development of simple and fast theoretical methods for searching for HCC biomarkers before the adenoma or in the initial stages of the disease becomes very important. In this paper, the Quantitative Structure-Disease Relationship (QSDR) will be used, which is similar to the Quantitative Structure-Activity Relationship (QSAR). QSDR is one of the widely u
55. interaction is banished (α_ij = 0). The relationship α_ij may be displayed as a protein structure complex network. In this network the nodes are the Cα atoms of the amino acids and the edges connect pairs of amino acids with α_ij = 1. [Y. Rodriguez-Soca et al. / Polymer 51 (2010) 264-273.] [Flowchart: Plasmodium protein 1 and protein 2 (PDB); MARCH-INSIDE 3D electrostatic entropy θ_k(R) for the orbits core (c), inner (i), middle (m) or surface (s); protein-protein interaction electrostatic absolute difference invariants; Classification Trees; Protein-Protein Interactions in Plasmodium (PPPI) classification model; PlasmodPPPI web server; validation set of Plasmodium proteins; prediction of protein-protein interactions for new proteins in Plasmodium.] Fig. 2. Example of spatial distribution of core, inner, middle and surface amino acids. Euclidean 3D space r3 = (x, y, z) coordinates of the Cα atoms of amino acids listed on protein PDB files. For the calculation, all water molecules and metal ions were removed [67]. All calculations were carried out with our in-house software MARCH-INSIDE 2.0 [71]. For the calculation, the MARCH-INSIDE software always uses the full matrix, never a sub-matrix, but may run the last summation term either for all amino acids or only for some specific groups
56. into public servers, preferably of free access, available online to the scientific community. The server packages developed by Chou and Shen that predict the function of proteins from structural parameters or explore protein structures are good examples in this sense. These may be used by proteome research scientists through user-friendly interfaces. This means that the user does not need to be an expert on the theoretical details behind this kind of models, including the vast literature published by Chou et al. on the development of models with pseudo amino acid composition parameters, or the use of ML classification techniques and other algorithms. However, to the best of our knowledge, there is no QSAR-based server for the prediction of LIBPs. In this sense, we have implemented the best LDA model found here at the web portal Bio-AIMS as an online server called LIBP-Pred. The acronym LIBP-Pred comes from LIpid-Binding Proteins Predictor. LIBP-Pred is located at http://miaja.tic.udc.es/Bio-AIMS/LIBP-Pred.php. This online tool is based on PHP/HTML and Python routines coupled to the nested MARCH-INSIDE classic algorithm to calculate input molecular structure parameters. LIBP-Pred mode 1: in Fig. 3 we depict the user interface for LIBP-Pred, including mode 1 (top of the web page). The user only has to paste the PDB IDs of the query proteins with unknown functions. With these PDB ID codes, LIBP-Pred automatically connects to the PDB
57. journal is The Royal Society of Chemistry 2012. Downloaded by Universidad de Vigo on 18 October 2012. Published on 10 January 2012 on http://pubs.rsc.org, doi:10.1039/C2MB05432A. S(LIBP) is the above-mentioned output of the model. It is a real-valued variable that scores the propensity of a protein to act as a LIBP. The χ² and p-level values were examined in order to test the statistical significance of the model. The Accuracy, Specificity and Sensitivity were used to quantify the goodness-of-fit and the discriminatory power of the model. Different authors have applied this type of LDA model, using different classes of input variables, to construct QSAR models for drugs [56], proteins or nucleic acids [8, 52, 57-59]. In statistical prediction, the following three cross-validation methods are often used to examine a predictor for its effectiveness in practical application: independent dataset test, subsampling test and jackknife test. However, of the three test methods, the jackknife test is deemed the most objective. The reasons are as follows: (i) for the independent dataset test, although all the proteins used to test the predictor are outside the training dataset used to train it, so as to exclude the memory effect or bias, the way the independent proteins are selected to test the predictor could be quite arbitrary unless the number of independent proteins is sufficiently large. This kind of arbitrariness might result in compl
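The three scores named above and the jackknife (leave-one-out) scheme can be illustrated with a minimal sketch. It uses the standard definitions of accuracy, sensitivity and specificity from a confusion matrix; the function names and the label convention (1 for the positive class) are assumptions for the example and are not taken from the original software.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of true positives that are recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of true negatives that are recovered.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def jackknife_splits(n_samples):
    """Yield (train_indices, test_index): each sample is left out once."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], i


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 1, 0]
    print(classification_scores(truth, predicted))
```

In a jackknife test the model is refitted once per protein on the remaining samples, so no arbitrary choice of test proteins is involved; the scoring function is then applied to the collected left-out predictions.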
58. the sequence, whose letter represents the class to which that element belongs, and each gnomon will contain one or more different classes (Figure 25). Figure 25. Representation of a letter sequence in gnomons. Indices: definition and calculation. As noted from the outset, in the U-Graphs built with CULSPIN each node belongs to a given class, and the nodes are not only connected following the letter sequence: nodes that belong to the same class (that have the same letter) are also connected to each other. Thus, in our U-Graphs each node is connected to one or more nodes. By definition, the degree of a node is the number of nodes to which it is connected, and the total degree of a graph is the sum of the degrees of all its nodes; we can therefore define the degree of a gnomon as the sum of the degrees of the nodes that belong to that gnomon. Taking all of the above into account, the indices calculated by CULSPIN are defined and calculated with the formulas in Figure 26. Figure 26. Formulas for the gnomon calculations. GENERAL REGISTRY OF INTELLECTUAL PROPERTY. In accordance with the Intellectual Property Law (Royal Legislative Decree 1/1996, of 12 April), the intellectual property rights are entered in this Registry in the form set out below. NÚMER
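The degree definitions above can be sketched in a few lines. This is an illustration only, not the CULSPIN implementation: it assumes that every pair of nodes sharing a letter is joined by one edge (the text does not say whether all such pairs or only some are linked), and it takes the gnomon of each node as a given input list, `gnomon_of`, which is a hypothetical parameter.

```python
from itertools import combinations


def u_graph_degrees(sequence, gnomon_of):
    """Return (node degrees, total graph degree, degree per gnomon).

    Assumptions of this sketch: consecutive nodes are connected along the
    sequence, and every pair of nodes with the same letter is connected;
    gnomon_of[i] is the gnomon label of node i.
    """
    n = len(sequence)
    if len(gnomon_of) != n:
        raise ValueError("one gnomon label is needed per node")
    # Sequence edges; a set avoids counting an edge twice when two
    # consecutive nodes also share the same class.
    edges = {(i, i + 1) for i in range(n - 1)}
    for i, j in combinations(range(n), 2):
        if sequence[i] == sequence[j]:
            edges.add((i, j))
    node_degree = [0] * n
    for i, j in edges:
        node_degree[i] += 1
        node_degree[j] += 1
    gnomon_degree = {}
    for node, gnomon in enumerate(gnomon_of):
        gnomon_degree[gnomon] = gnomon_degree.get(gnomon, 0) + node_degree[node]
    return node_degree, sum(node_degree), gnomon_degree


if __name__ == "__main__":
    print(u_graph_degrees("ABAB", [0, 0, 1, 1]))
```

The gnomon degrees always add up to the total degree of the graph, which in turn is twice the number of edges.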
59. the results obtained in the form of review articles and book chapters, manuals for the software tools developed, computer programs, and research articles already published by the author. The three computer programs developed and/or registered were MInD-Prot, S2SNet and CULSPIN. A total of 6 scientific publications (journal articles with a JCR impact factor) are presented, grouped according to the specific objective they fulfil. The four web servers (online tools) developed were Trypano-PPI, Plasmod-PPI, ATCUNpred and LIBPpred. The final servers were used to support experimental data for more than nine types of parasites, such as Trypanosoma, Plasmodium, Leishmania, Toxoplasma, Shigella and Cryptosporidium. Each web server has a corresponding published article describing the development, validation and application of the tool. Other articles describe the methodologies and/or algorithms that had to be developed beforehand in order to create the servers presented. For each article, a brief explanatory section on its importance and the results achieved is given. The corresponding publications are attached in section 5 (PUBLICATIONS, ANNEXES) of this Thesis in the language in which they were published. 2.1 New computer programs for the parame
60. most methods that predict protein functions rely on identifying a similar protein and transferring its annotations to the query protein. An example is the BLAST method, which fails when a similar protein cannot be identified or when any similar proteins identified also lack reliable annotations. At the moment there is no template of an ATCUN protein in the BLAST server and, therefore, the BLAST method fails to predict the ATCUN DNA cleavage activity of proteins. As an advantage, the current method can predict the ATCUN function of a protein even if it has another known activity. Conclusions. The study of metal-protein functions and interactions is a topic of great importance, and several authors have presented interesting results. The present work proposes a new QSAR model based on the electrostatic spectral moment indices and evaluates the presence of potential ATCUN-like antitumor activity of the proteins. All of the calculations have been made using the 3D structure information contained in PDB files for metal-unbound (free) proteins, and the resulting model is simpler than a similar model based on the electrostatic potentials. Thus, the present QSAR approach is very useful in bioinorganic chemistry for the prediction of the biological activity of potential metal-protein complexes whose free protein structure has been characterized but whose metal interactions remain unexplored. The desirability analysis of the model
61. nas. 2.1.2 S2SNet: topological indices of the star graph. 2.1.3 CULSPIN: topological indices of the spiral graph. 2.2 New Bio-AIMS online servers based on computer engineering and artificial intelligence techniques. 2.2.1 Trypano-PPI: protein-protein interactions in Trypanosoma. 2.2.2 Plasmod-PPI: protein-protein interactions in Plasmodium. 2.2.3 ATCUNpred: prediction of protein targets with ATCUN activity in parasites. 2.2.4 LIBPpred: prediction of lipid-binding proteins. 3 CONCLUSIONS. 4 REFERENCES. 5 PUBLICATIONS, ANNEXES. Publications with S2SNet: Enrique Fernandez-Blanco, Vanessa Aguiar-Pulido, Cristian R. Munteanu, Julian Dorado, Random Forest Classification based on Star Graph Topological Indices for Antioxidant Proteins, Journal of Theoretical Biology 317: 331-337, 2013, http://goo.gl/R5vV8. Publications with spiral graphs: Vanessa Aguiar-Pulido, Cristian Robert Munteanu, José A. Seoane, Enrique Fernández-Blanco, Lázaro G. Pérez-Montoto, Humberto González-Díaz, Julian Dorado, Naive Bayes QSDR classification based on spiral graph Shannon entropies for protein biomarkers in human colon cancer, Molecular BioSystems 8: 1716-1722, 2012, http://goo.gl/JQQIE. Publications for
62. non-antioxidant proteins in FASTA format. By using the S2SNet tool (Munteanu et al., 2009), the sequences of amino acids are transformed into Star Graphs and the corresponding topological indices are calculated. The resulting numbers that characterise each graph, that is, a graphical representation of a protein, are then used in Weka (Hall et al., 2009a) to find the best QSAR classification model. The final model is used to predict antioxidant activity for new amino acid sequences. Fig. 1. Flowchart of building QSAR classification models for protein antioxidant activity prediction: antioxidant / non-antioxidant proteins (database of FASTA primary structures) -> S2SNet (Sequence to Star Network: calculation of embedded and non-embedded star graph topological indices) -> Weka (Machine Learning software: searching for the best QSAR classification model between the protein structure and the biological activity) -> predict antioxidant activity for new amino acid sequences. 2.1. Protein set. This work is based on datasets extracted from several protein databases. The sets of protein primary sequences are represented by 324 proteins with antioxidant activity and 1675 proteins without. The antioxidant protein FASTA sequences (positive group) have been downloaded from the Protein Databank (Berman
63. [Eq. (8): the matrix M with elements p_ij.] In order to carry out the calculations referred to in eqs 1 for any kind of potential, and detailed in the previous equations for the electrostatic potential, the elements p_ij of M and the absolute initial probabilities p_0(j) were calculated as follows: [Eqs. (9) and (10): definitions of p_ij and p_0(j).] Here q_i and q_j are the AMBER electronic charge parameters for the i-th and j-th amino acids (aa), and the neighborhood relationship (truncation function α_ij = 1) was turned on if these amino acids participate in a peptidic hydrogen bond or d_ij < d_cutoff, which is the semisum of the van der Waals radii of both aa. In this regard, truncation of the molecular field is usually applied to simplify all the calculations in large biological systems. The distance d_ij is the Euclidean distance between the Cα atoms of the two amino acids, and d_0j is the distance between the amino acid and the center of charge of the protein. Both kinds of distances were derived from the x, y and z coordinates of the amino acids collected from the protein PDB files. All calculations were carried out with our in-house software MARCH-INSIDE. For the calculation, all water molecules and metal ions were removed. Trypano-PPI, Figure 1. Example of spatial distribution of core, inner, middle and surface amino acids.
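The distance-based part of the truncation rule can be sketched as below. This is only an illustration: it uses a single global cutoff passed as the hypothetical parameter `cutoff`, whereas the text describes a pair-specific cutoff derived from van der Waals radii, and it ignores the hydrogen-bond criterion. The function name `contact_pairs` is likewise an assumption; `math.dist` requires Python 3.8 or later.

```python
import math


def contact_pairs(ca_coords, cutoff):
    """List (i, j, d_ij) for pairs of residues whose C-alpha atoms are
    closer than `cutoff`, i.e. pairs for which this sketch sets alpha_ij = 1.

    ca_coords is a list of (x, y, z) tuples, one per amino acid, in the
    same units as `cutoff`. Only pairs with i < j are reported.
    """
    pairs = []
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            # Euclidean distance between the two C-alpha positions.
            d_ij = math.dist(ca_coords[i], ca_coords[j])
            if d_ij < cutoff:
                pairs.append((i, j, d_ij))
    return pairs


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
    print(contact_pairs(coords, cutoff=6.0))
```

The returned pairs are the edges of the protein structure network; all other pairs are truncated out of the subsequent sums.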
64. of the LIBPs. In this sense, structural parameters that numerically describe both the global and local 3D structure of proteins may be useful for the study of LIBPs. Previous work has reported the applicability of LDA in QSAR studies. The best QSAR-LDA model in this study is described by eqn (5) and was obtained with the Forward stepwise method from STATISTICA: [Eqn (5): S(LIBP) as a linear combination of spectral moments π_k(R) of different protein regions R.] N = 1351, Rc = 0.78, χ² = 1259.574, p < 0.001. Interestingly, only the spectral moments of the electrostatic field are linearly correlated to LIBP/nLIBP discrimination. As mentioned in the Materials and methods section, we have explored three types of input variables to seek this equation: ξ_k(R), π_k(R) and θ_k(R) values (π indicates spectral moments of the electrostatic field, ξ average electrostatic potentials and θ entropy values of the electrostatic field). This indicates that self-return propagation of electrostatic interactions within a protein 3D backbone controls LIBP action, instead of the magnitude of the electrostatic potential per se (ξ potential control) or the total information about electrostatic interactions (θ entropy control). (Mol. BioSyst., 2012, 8, 851-862.) On the ot
65. placing the natural numbers following the shape of a spiral. This arrangement was then widely popularized as a visual picture in an issue of Scientific American in 1964. To construct the spiral, one must write down a regular grid of numbers, starting with one at the centre and spiralling out the rest of the integers, as shown in Fig. 2A. In mathematics, this is a simple method of graphing numbers that reveals hidden patterns in numeric series and sequences. In molecular sciences, this spiral representation was associated with a graph in order to represent DNA nucleotide sequences as a letter sequence of four classes (A, T, G and C). Fig. 2. Spiral of a regular grid of numbers (A), the number gnomons division (B) and the letter gnomons division (C). The Ulam spiral can be divided in
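The construction described above (one at the centre, the remaining integers spiralling outwards) can be sketched as follows. The orientation is an assumption of this example: 2 is placed to the right of 1 and the spiral turns counter-clockwise, with y growing upwards. The function name `ulam_spiral_coords` is hypothetical and the sketch is not taken from CULSPIN.

```python
def ulam_spiral_coords(n):
    """Map the integers 1..n to (x, y) grid positions on a square spiral.

    1 sits at the origin, 2 one step to the right, and the path then turns
    counter-clockwise. Run lengths follow the pattern 1, 1, 2, 2, 3, 3, ...
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    coords = {1: (0, 0)}
    x = y = 0
    dx, dy = 1, 0          # first move: to the right
    run_length = 1
    k = 1
    while k < n:
        for _ in range(2):  # each run length is used for two sides
            for _ in range(run_length):
                if k >= n:
                    return coords
                x, y = x + dx, y + dy
                k += 1
                coords[k] = (x, y)
            dx, dy = -dy, dx  # rotate the direction by 90 degrees
        run_length += 1
    return coords


if __name__ == "__main__":
    for number, position in ulam_spiral_coords(12).items():
        print(number, position)
```

With this orientation the row through the centre reads ..., 19, 6, 1, 2, 11, 28, ... from left to right. A letter sequence can be laid on the same path by assigning its i-th letter to the position of the integer i.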
66. predictive ability (90.92%) for new proteins linked with this type of cancer. The statistical analysis confirms that this model allows diagnosing the absence of human colon cancer, obtaining an area under the receiver operating characteristic curve of 0.91. The methodology presented can be used for any type of sequential information, such as any protein and nucleic acid sequence. Introduction. Cancer is one of the leading causes of death worldwide, and human colon cancer (HCC) has an important social impact. HCC represents the uncontrolled growth of abnormal cells in the colon (part of the intestine) due to DNA transformation (mutation). These cells invade and destroy the normal tissues around them, or even distant organs, by spreading through the blood and lymphatic system. The initial stage of this disease is represented by adenomatous polyps in the colon that may develop into cancer over time. The most frequent diagnosis method is colonoscopy, and the therapy consists of surgery followed by chemotherapy. If the cancer is detected early, it can frequently be cured. Even though the rate of mortality caused by this type of cancer has decreased in the last few years, due to better personalized treatments and new detection methods, HCC is still very common in men and women all over the world. This disease has complex causes that include age, diet, smoking, genetic background, DNA mutations and external factors. The National Cancer Institute (NCI) in the U.S. implemented
67. predicts the values for the spectral moments in one single region for the ATCUN-like proteins. The evaluation of the DNA cleavage activity for the parasite protein chains by using the present web-implemented model was presented and became a starting point for future experimental and theoretical studies of parasite pathologies. Acknowledgment. C.R.M. and H.G.D., from the Faculty of Computer Science, University of A Coruña, and the Faculty of Pharmacy, University of Santiago de Compostela (Spain), respectively, acknowledge financial support granted by the Isidro Parga Pondal program of Xunta de Galicia. We also thank the General Directorate of Scientific and Technologic Promotion of the Galician University System of the Xunta de Galicia for the grants 2007/127 and 2007/144. (Munteanu et al., Journal of Proteome Research, Vol. 8, No. 11, 2009, p. 5226.) References. Laussac J. P., Sarkar B. Characterization of the copper(II) and nickel(II) transport site of human serum albumin. Studies of copper(II) and nickel(II) binding to peptide 1-24 of human serum albumin by 13C and 1H NMR spectroscopy. Biochemistry 1984, 23 (12), 2832-8. Kimoto E., Tanaka H., Gyotoku J., Morishige F., Pauling L. Enhancement of antitumor activity of ascorbate against Ehrlich ascites tumor cells by the copper glycylglycylhistidine co
68. promote rapid vaccine development and other tools for the control of endemic diseases. The LIBP 14 kDa antigen of S. mansoni (Sm14) stands out both due to its steady progress towards field trials and because it represents the sole vaccine candidate to emerge from an endemic country. Studies have now progressed to the scale-up level, and an industrial production process has successfully been put in place. It has been demonstrated that it is effective not only against S. mansoni in humans but also against F. hepatica, a parasite that causes disease in cattle and sheep, leading to annual losses of over US$3 billion to the food industry worldwide. The Sm14 patents have been granted to the Oswaldo Cruz Foundation (FIOCRUZ), a Brazilian scientific institution directly linked to the Brazilian Ministry of Health. In fact, free-living nematodes such as Caenorhabditis elegans also secrete a structurally novel class of proteins (FARs), which present both FA-binding and retinol-binding activity, into the surrounding tissues of the host. One important class of FARs is the nematode polyprotein allergens/antigens (NPAs); these proteins are of interest because they may play an important role in scavenging fatty acids and retinoids from the host that are essential for the survival of the parasite, and also because the localised depletion of such lipids may have immunomodulatory effects that compromise the host immune respons
69. regarding the discovery of new Cancer Biomarkers in humans or drug targets in parasites have been discussed here in this sense. Introduction. Fatty Acid Binding Proteins (FABPs) or, generally speaking, Lipid Binding Proteins (LIBPs) play important roles in many diseases. The mammalian FABPs bind long-chain FA with high affinity. The recent discussion carried out by Storch and McDermott highlights that the large number of FABP types is suggestive of distinct functions in specific tissues. Thus, the LIBPs modulate intracellular lipid homeostasis by regulating FA transport in the nuclear and extra-nuclear compartments of the cell; in doing so, they also impact systemic energy homeostasis. [Affiliations: Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos s/n, Campus Sur, 15782 Santiago de Compostela, Spain. E-mail: gonzalezdiazh@yahoo.es. Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, 15071 A Coruña, Spain. S.C. POLIPHARMA INDUSTRIES S.R.L., 550052 Sibiu, Romania. Department of Organic Chemistry, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain. Electronic supplementary information (ESI) available. See DOI: 10.1039/c2mb05432a.] In this sense, the characterization of LIBPs has become important for vaccine de
70. results. The present work has demonstrated that there is a strong linear relationship between electrostatic spectral moments calculated with a MARCH-INSIDE approach and the action of LIBPs. Consequently, using these parameters we can seek a linear QSAR useful to predict LIBPs. The online implementation of this model in the web server LIBP-Pred allows researchers around the world to predict new LIBPs online, free of cost. LIBP-Pred may be used to mine the PDB or to upload and predict custom 3D models of proteins with unknown structure generated with well-known servers, as in the case of LOMETS. We have demonstrated the PDB mining option by performing a predictive study of 2000 proteins with unknown function, looking for new Cancer Biomarkers in humans or drug targets in parasites. Since user-friendly and publicly accessible web servers represent the future direction of developing practically more useful predictors, we have provided herein a web server for the method presented in this paper at http://miaja.tic.udc.es/Bio-AIMS/LIBPpred.php. Acknowledgements. Munteanu C.R. and González-Díaz H. acknowledge the research programme Isidro Parga Pondal, funded by Xunta de Galicia and the European Social Funds (ESF), for partial financial support. F. Prado-Prado acknowledges the research programme Angeles Alvariño, funded by the same institutions, for partial financial support. References. 1 J. Storch and L. McDermott, J. Lipid Res., 2009, 50, S
71. state kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-87201E. J. Biol. Chem. 1993, 268, 6119-6124. Althaus I. W., Chou J. J., Gonzales A. J., Deibel M. R., Chou K. C., Kezdy F. J., Romero D. L., Aristoff P. A., Tarpley W. G., Reusser F. Kinetic studies with the nonnucleoside HIV-1 reverse transcriptase inhibitor U-88204E. Biochemistry 1993, 32, 6548-6554. Althaus I. W., Chou J. J., Gonzales A. J., LeMay R. J., Deibel M. R., Chou K. C., Kezdy F. J., Romero D. L., Thomas R. C., Aristoff P. A. Steady-state kinetic studies with the polysulfonate U-9843, an HIV reverse transcriptase inhibitor. Experientia 1994, 50 (1), 23-8. Althaus I. W., Chou K. C., LeMay R. J., Franks K. M., Deibel M. R., Kezdy F. J., Resnick L., Busso M. E., So A. G., Downey K. M., Romero D. L., Thomas R. C., Aristoff P. A., Tarpley W. G., Reusser F. The benzylthio-pyrimidine U-31,355, a potent inhibitor of HIV-1 reverse transcriptase. Biochem. Pharmacol. 1996, 51 (6), 743-50. Chou K. C., Kezdy F. J., Reusser F. Review: Steady-state inhibition kinetics of processive nucleic acid polymerases and nucleases. Anal. Biochem. 1994, 221, 217-230. Qi X. Q., Wen J., Qi
72. such as different types of cancer, kidney injury, atherosclerosis, diabetes, intestinal ischemia and parasitic infections. Therefore, computational methods that can predict LIBPs from 3D structure parameters have become a goal of great importance for the discovery of drugs and their molecular targets, for vaccine design and for biomarker selection. The Protein Data Bank (PDB) contains 3000 3D structures of proteins with unknown function. This list, together with the latest experimental results in proteomics research, is a very interesting source for discovering relevant proteins, including LIBPs. However, there are no general models to predict new LIBPs based on 3D structures. New quantitative structure-activity relationship (QSAR) models were developed on the basis of 3D electrostatic parameters, using 1801 different proteins, including 801 LIBPs. The electrostatic parameters were calculated with the MARCH-INSIDE tool and correspond to the whole protein or to specific protein regions (core, inner, middle and surface). These parameters are used as inputs to a Linear Discriminant Analysis (LDA) classifier that discriminates the 3D structures of LIBPs in new proteins. This pre
73. (vi) best subsets. Unless we specify a different value, we always set a prior probability of p(pPPI) = p(npPPI) = 0.5. The LDA discriminant equation was obtained using as input the three types of PPI invariants θ(R). The general form of the equation obtained by LDA is:

$S(\mathrm{pPPC}) = \sum_{k} a_k\,\theta_k(R) + a_0$ (7)

S(pPPC), the output of this model, is a real-valued variable that scores the propensity of a protein pair to undergo a pPPI interaction (and not npPPIs), forming physically stable PPCs only in Plasmodium sp. The χ² and p-level values were examined in order to test the statistical significance of the model. Accuracy, Specificity and Sensitivity were used to quantify the goodness of fit and the discriminatory power of the model. Different authors have applied this type of LDA model, using different classes of input variables, to construct QSAR models for proteins or nucleic acids [77-80].

2.3 CT models
CTs have been used to test non-linear models that are not based on assumptions about the parametric distribution of the data [81]. We used as Ordered Predictors the variables obtained in the Forward stepwise of the LDA. From this point on, several split methods were carried out: (i) CT Discriminant-based Linear Combinations (CT-LC), (ii) Discriminant-based univariate splits (CT-US) and (iii) CRT-style exhaustive search from univariate splits (CRT). In CRT we used three different measures of Goodness of fit: Gin
74. and effects. 25/08/2008, time 11:30. Santiago de Compostela, 17 September 2008.

2.1.3 CULSPIN: topological indices of the spiral graph
2.1.3.1 Published articles using spiral graphs
2.1.3.1.1 Qualitative classification relating protein structure to colorectal cancer using the Shannon entropies of the spiral graph and Naive Bayes methods

"Naive Bayes QSDR classification based on spiral graph Shannon entropies for protein biomarkers in human colon cancer", Molecular BioSystems 8, 1716-1722, 2012. Vanessa Aguiar-Pulido, Cristian Robert Munteanu, José A. Seoane, Enrique Fernández-Blanco, Lázaro G. Pérez-Montoto, Humberto González-Díaz, Julian Dorado. Link: http://goo.gl/JOOIE

Rapid cancer diagnosis is a real need in applied medicine because of the importance of this disease. Theoretical models can help as prediction tools. A graph-theory representation is one option, since it allows us to describe numerically any real system, such as protein macromolecules, by transforming real properties into topological indices of molecular graphs. This study proposes a new classification model for proteins related to human colon cancer, using the topological indices of the spiral graph computed over the amino acid sequences of prot
75. [Fig. 4: AUROC of Naive Bayes for HCC; x-axis: false positive rate (1 - specificity).]

TP: true positive cases (correct diagnosis); FP: false positive cases (over-diagnosis); TN: true negative cases (correct diagnosis); FN: false negative cases (missed cases); Se: sensitivity; Sp: specificity; PPV: positive predictive value; NPV: negative predictive value; LR: likelihood ratio; DOR: diagnostic odds ratio. Values as percentage and 95% confidence interval (95% CI). Values as ratio value.

Finally, there is a great difference in terms of DOR. Therefore, it is better to consider a cut-off of 0.5. This model obtains a great diagnostic capacity for both cut-offs. In this sense, LR+ is > 6 for both cut-offs; however, LR− is < 1. These results confirm that the model developed here allows diagnosing the absence of HCC.

Conclusion
This study proposes a new classification model for HCC using the spiral graph TIs of the protein amino acid sequences. The best model, based on only 11 Shannon entropy TIs and obtained with the Naive Bayes method, shows excellent predictive ability (90.92%) for new proteins linked with HCC. Previous works have proposed different models for HCC based on topological indices of star and lattice graphs for the same dataset. The star graph-based study proposed an input-coded multi-target classification model for two types of cancer: human breast cancer (HBC) a
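As a rough illustration of how the statistics abbreviated in this slide relate to one another, the following Python sketch derives Se, Sp, PPV, NPV, the positive and negative likelihood ratios and the DOR from the four cells of a 2x2 confusion matrix. The function name `diagnostic_metrics` and the counts are hypothetical, chosen only for illustration; they are not values from the paper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Diagnostic statistics from the four cells of a 2x2 confusion matrix.

    Assumes every denominator is non-zero: no empty row or column,
    Sp different from 0 and 1, and Se different from 1.
    """
    se = tp / (tp + fn)        # sensitivity: share of true cases detected
    sp = tn / (tn + fp)        # specificity: share of non-cases rejected
    ppv = tp / (tp + fp)       # positive predictive value
    npv = tn / (tn + fn)       # negative predictive value
    lr_pos = se / (1 - sp)     # positive likelihood ratio
    lr_neg = (1 - se) / sp     # negative likelihood ratio
    dor = lr_pos / lr_neg      # diagnostic odds ratio
    return {"Se": se, "Sp": sp, "PPV": ppv, "NPV": npv,
            "LR+": lr_pos, "LR-": lr_neg, "DOR": dor}


# Hypothetical counts, for illustration only.
example = diagnostic_metrics(tp=90, fp=10, tn=90, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the positive likelihood ratio is about 9 and the negative one about 0.11, that is, a positive ratio well above 1 and a negative ratio below 1.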
76. PDB chain, class, Prob. (left column | right column)
20UI A  Oxidoreductase  99.9969 | 3EOE B  Transferase  100.0000
1Y9A C  Oxidoreductase  99.9659 | 3EOE A  Transferase  100.0000
1Y9A A  Oxidoreductase  99.9631 | 3EOE C  Transferase  100.0000
10F9 A  Toxin  99.6656 | 3GG8 B  Transferase  100.0000
1M6J B  Isomerase  94.7758 | 3GG8 D  Transferase  100.0000
1M6J A  Isomerase  94.2867 | 2JH1 A  Cell adhesion  99.9999
3EMU A  Hydrolase  88.6560 | 2AA0 A  Transferase  99.9976

Giardia intestinalis | Trypanosoma brucei
2FFL B  Hydrolase  100.0000 | 1H6Z A  Transferase  100.0000
2QVW A  Hydrolase  100.0000 | 3F5M C  Transferase  100.0000
2QVW B  Hydrolase  100.0000 | 3F5M A  Transferase  100.0000
2FFL A  Hydrolase  100.0000 | 1PGJ B  Oxidoreductase  100.0000
2FFL G  Hydrolase  100.0000 | 1PGJ A  Oxidoreductase  100.0000
2QVW C  Hydrolase  100.0000 | 3F5M B  Transferase  100.0000
2FFL D  Hydrolase  100.0000 | 3F5M D  Transferase  100.0000
2QVW D  Hydrolase  100.0000 | 2HIG A  Transferase  99.9993
2112 A  Metal binding  99.9239 | 2HIG B  Transferase  99.9989
3GAY A  Lyase  99.5204 | 1YAR U  Hydrolase  99.9985

Leishmania major | Fasciola hepatica
20EF A  Transferase  100.0000 | 206X A  Hydrolase  99.9111
2VOB A  Ligase  100.0000 | 2VIM A  Oxidoreductase  85.5820
2VOB B  Ligase  100.0000 | 2FHE B  Transferase  72.2195
20EG A  Transferase  100.0000 | 2FHE A  Transferase  71.8592
3G1U B  Hydrolase  100.0000 | 1FHE A  Transferase  70.4894
3G1U D  Hydrolase  99.9999 | 2FHE H  Transferase  0.25620
3HJC A  Chaperone  99.9999 | 2FHE G  Transferase  0.25620
3G1U A  Hydrolase  99.9999 |
2VPM A  Ligase  99.9997 |
10KG A  Transferase  99.9996 |

Note: Prob = probabil
77. [Figure 5: LDA model domain analysis; leverages plotted against residuals and LOO residuals.]
[Figure 6: Predicted ATCUN protein chains by parasite family: Entamoeba 11, Fasciola 1, Ascaris 5, Giardia 21, Trypanosoma brucei 345, Leishmania 152, Orectolobus 4, Plasmodium 173.]
(Journal of Proteome Research, Vol. 8, No. 11, 2009, p. 5224)

ROC curve analysis tested whether the model behaves as a random classifier or not. A random classifier plots as a straight-line ROC curve with a 45° slope and an area under the curve equal to 0.5. Conversely, nonrandom classifiers are statistically significant models with an area under the curve above 0.5 and at most 1. As can be noted in Figure 3, our model clearly behaves as a nonrandom, statistically significant classifier, with an area under the curve of 0.92 [8]. Due to the robustness of LDA multivariate statistical techniques, the predictive ability and inference reached by using the final model should not be affected (see Figure 4). The linear relationship between the leave-one-out (LOO) residuals and the standardized raw residuals illustrates the high stability of the model to data variation. Finally, we have studied the Domain of Applicability (DA) of the model, due to the natural limitations inherent to QSAR models caused by data conformation. DA may b
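To make the ROC comparison concrete: the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, so a classifier that scores at random obtains about 0.5 and a perfect one obtains 1. The Python sketch below computes the AUC in that pairwise way. The function name and the scores are illustrative assumptions, not outputs of the model discussed in the slide.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up classifier scores for three positive and three negative cases.
positives = [0.9, 0.7, 0.4]
negatives = [0.6, 0.3, 0.2]
print(auc(positives, negatives))
```

In this toy case eight of the nine positive/negative pairs are ranked correctly, so the value lies between the random level of 0.5 and the maximum of 1.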
78. 08 254 2 476 82 Munteanu C R Gonzalez Diaz H Borges F de Magalhaes A L Natural random protein classification models based on star network topological indices J Theor Biol 2008 254 4 775 83 Xiao X Chou K C Digital coding of amino acids based on hydrophobic index Protein Pept Lett 2007 14 9 871 5 Xiao X Shao S Ding Y Huang Z Chou K C Using cellular automata images and pseudo amino acid composition to predict protein subcellular location Amino Acids 2006 30 1 49 54 Nair R Rost B LOC3D annotate sub cellular localization for protein structures Nucleic Acids Res 2003 31 13 3337 40 Chou K C Review Applications of graph theory to enzyme kinetics and protein folding kinetics Steady and non steady state systems Biophys Chem 1990 35 1 24 Chou K C Graphical rules in steady and non steady enzyme kinetics J Biol Chem 1989 264 12074 12079 Chou K C Forsen S Graphical rules for enzyme catalyzed rate laws Biochem J 1980 187 829 835 Chou K C Liu W M Graphical rules for non steady state enzyme kinetics J Theor Biol 1981 91 4 637 54 Kuzmic P Ng K Y Heath T D Mixtures of tight binding enzyme inhibitors Kinetic analysis by a recursive rate equation Anal Biochem 1992 200 68 73 Althaus I W Chou J J Gonzales A J Diebel M R Chou K C Kezdy F J Romero D L Aristoff P A Tarpley W G Reusser F Steady
79. This journal is © The Royal Society of Chemistry 2012. Downloaded by Universidad de Vigo on 18 October 2012. Published on 10 January 2012 on http://pubs.rsc.org | doi:10.1039/C2MB05432A

[Fig. 5: BLAST analysis. Screenshot showing the putative conserved domains detected and the distribution of 100 BLAST hits on the query sequence, with the color key for alignment scores.]

a homodimer to be exact, with a total molecular weight of M = 86659.06. The importance of studying proteins in this parasite stems from the fact that cryptosporidiosis is a neglected disease without a wholly effective drug. That is why Artz et al. [2] presented a study involving this protein, in which they demonstrated that nitrogen-containing bisphosphonates (N-BPs) are capable of inhibiting C. parvum at low micromolar concentrations in infected MDCK cells. Predictably, the mechanism of action is based on the inhibition of the biosynthesis of isoprenoids, but this target protein is unexpectedly a distinctive C. parvum enzyme, dubbed nonspecific polyprenyl pyrophosphate synthase (CpNPPPS). It is part of an isoprenoid pathway in Cryptosporidium distinctly different from other orga
80. (E. Fernández-Blanco et al., Journal of Theoretical Biology 317 (2013) 331-337, p. 334)

1 Randić connectivity index:

$X(^{1}X) = \sum_{ij} m_{ij} / \sqrt{\deg_i \cdot \deg_j}$ (12)

These TIs and other derived ones will be used in the next step to construct an antioxidant/non-antioxidant classification model using machine learning methods.

2.3 Random Forest
Random Forest was first proposed by Breiman (2001). This technique combines many decision trees to make a prediction, giving as output the class that is the mode of the classes output by the individual trees. Thus, it can be considered an ensemble learning technique, since it uses multiple models to obtain better predictive performance. These decision trees are constructed by means of bagging classification trees (Breiman, 1996), where each tree is constructed independently from a random sample and a majority vote of the trees is taken as the prediction. Random Forest adds an extra random layer to bagging: decision trees are built from a random sample, and each node is split by the best among a subset of predictors randomly chosen at that node. The main advantage of Random Forest over other techniques, such as Artificial Neural Networks, Support Vector Machines, Linear Discriminant Analysis, etc., is its robustness against overfitting, tending always to converge when the number of trees is large. The typical Random Forest algorithm is composed of three step
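As an illustration of the Randić connectivity index given at the start of this slide, the Python sketch below computes it for two small graphs. It uses the standard graph-theoretical form, adding 1/sqrt(deg_i · deg_j) once per edge; whether the source formula counts each edge once or twice cannot be recovered from the extracted text, so that choice is an assumption. The edge lists are toy graphs, not sequence graphs from the study.

```python
from collections import Counter
from math import sqrt


def randic_index(edges):
    """Randic connectivity index of a simple undirected graph.

    `edges` is a list of (u, v) pairs with no repeated edges.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # One term per edge: 1 / sqrt(deg(u) * deg(v)).
    return sum(1.0 / sqrt(degree[u] * degree[v]) for u, v in edges)


path4 = [(0, 1), (1, 2), (2, 3)]   # path on four nodes
star4 = [(0, 1), (0, 2), (0, 3)]   # star: one centre, three leaves
print(randic_index(path4))
print(randic_index(star4))
```

The path scores higher than the star with the same number of edges, which reflects the usual reading of this index as a measure of branching: more branched graphs give lower values.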
81. 115 116 273 Ramos de Armas R Gonz lez D az H Molina R Perez Gonzalez M Uriarte E Bioorg Med Chem 2004 12 18 4815 22 Ramos de Armas R Gonz lez D az H Molina R Uriarte E Biopolymers 2005 77 5 247 56 Hill T Lewicki P Statistics methods and applications A comprehensive reference for science industry and data mining Tulsa StatSoft 2006 Ivanisenko VA Pintus SS Grigorovich DA Kolchanov NA Nucleic Acids Res 2005 33 Database issue D183 7 Dobson PD Doig AJ J Mol Biol 2003 330 4 771 83 Chou KC J Proteome Res 2005 4 4 1413 8 Chou KC Elrod DW J Proteome Res 2003 2 2 183 90 Chou KC Shen HB J Proteome Res 2006 5 1888 97 Chou KC Shen HB J Proteome Res 2006 5 3420 8 Chou KC Shen HB J Proteome Res 2007 6 1728 34 Chou KC Cai YD J Proteome Res 2006 5 2 316 22 Chou KC Elrod DW J Proteome Res 2002 1 5 429 33 Fern ndez M Caballero F Fern ndez L Abreu Jl Acosta G Proteins 2008 70 1 167 75 Caballero J Fernandez M Curr Top Med Chem 2008 8 18 1580 605 Fern ndez L Caballero J Abreu JI Fern ndez M Proteins 2007 67 834 52 Guha R Jurs PC J Chem Inf Comput Sci 2004 44 6 2179 89 Van Waterbeemd H Discriminant analysis for activity prediction In Van Waterbeemd H editor Chemometric methods in molecular design vol 2 New York NY Wiley VCH 1995 p 265 82 Garcia Garcia A Galvez J de Julian Ortiz JV Garcia Domenech R Munoz C Guna R et al J Biomol Screen 2005 10 3
82. 12 PlasmodPPI calc txt. Calculated at 2009-11-04 10:25:19
Chain1  Chain2  Complex
3C5IA  3C5IE  YES
2F6IE  2GHUA  NO
1SYRC  1SYRF  YES
Fig. 7. Example of use of the PlasmodPPI web tool: (A) Input and (B) Output pages.
(Y. Rodriguez-Soca et al., Polymer 51 (2010) 264-273, p. 272)

predicts how unique a protein-protein complex is in the Plasmodium proteome with respect to other parasites and hosts, breaking new ground for anti-plasmodium drug target discovery. In order to demonstrate the practical utility of this Web server, three examples of protein chain pairs have been used to evaluate the possibility of forming unique complexes in Plasmodium, a human pathogen parasite: 3C5IA-3C5IE, 2F6IE-2GHUA and 1SYRC-1SYRF. Fig. 7 presents the input (A) and output (B) web pages of the PlasmodPPI tool. The first pair contains the first chain (A) of the Plasmodium knowlesi choline kinase, a transferase (3C5I), and the cleaved fragment of the N-terminal expression tag (chain E), all expressed in Escherichia coli. Choline kinase is the first enzyme in the Kennedy pathway (CDP-choline pathway) for the biosynthesis of the most essential phospholipid, phosphatidylcholine, in Plasmodium. In addition, choline kinase also plays a pivotal role in trapping the essential polar head group choline inside the malaria parasite. The inhibition of choline kinase will lead to a decrease in phosphocholine, which in turn causes a decrease in phosphatidylcholine biosynthesis, resulting in death of th
83. 206 14 Garcia Garcia A Galvez J de Julian Ortiz JV Garcia Domenech R Munoz C Guna R et al J Antimicrob Chemother 2004 53 1 65 73 Gozalbes R Brun Pascaud M Garcia Domenech R Galvez J Pierre Marie G Jean Pierre D et al Antimicrobial Agents Chemother 2000 44 10 2771 6 Gozalbes R Galvez J Garcia Domenech R Derouin F SAR QSAR Environ Res 1999 10 1 47 60 Marrero Ponce Y Meneses Marcel A Rivera Borroto OM Garcia Domenech R De Julian Ortiz JV Montero A et al J Comput Aided Mol Des 2008 22 8 523 40 Marrero Ponce Y Ortega Broche SE Diaz YE Alvarado YJ Cubillan N Cardoso GC et al J Theor Biol 2009 259 2 229 41 Marrero Ponce Y Castillo Garit JA Nodarse D Bioorg Med Chem 2005 13 10 3397 404 Marrero Ponce Y J Chem Inf Comput Sci 2004 44 6 2010 26 Fernandez M Caballero J Tundidor Camba A Bioorg Med Chem 2006 14 12 4137 50 Rabow AA Scheraga HA J Mol Biol 1993 232 4 1157 68 Hill T Lewicki P Statistics methods and applications Tulsa StatSoft 2006 Xu T Du L Zhou Y BMC Bioinformatics 2008 9 472 Mahdavi MA Lin YH Genomics Proteomics Bioinformatics 2007 5 3 4 177 86 Feldesman MR Am J Phys Anthropol 2002 119 3 257 75 Schlessinger A Yachdav G Rost B Bioinformatics 2006 22 7 891 3 Mewes HW Frishman D Mayer KF Munsterkotter M Noubibou O Pagel P et al Nucleic Acids Res 2006 34 Database issue D169 172 Xie D Li A Wang M Fan Z Feng H Nucleic Acids Res 2005 33 Web Serve
84. Technique | antiox | non antiox | global | Precision antiox | Global precision | ROC
21 | 94.0 | 92.0 | 72.5 | 92.6 | 0.961
JRip | 63.9 | 97.0 | 91.6 | 80.2 | 91.2 | 0.815
Random tree | 79.0 | 94.3 | 91.8 | 72.9 | 92.2 | 0.867
Random Forest | 79.9 | 96.1 | 93.5 | 79.9 | 93.5 | 0.95

Naive Bayes | 77.5 | 55.8 | 59.3 | 25.3 | 81.8 | 0.772
MLP | 0 | 100 | 83.8 | 0 | 83.8 | 0.644
K star | 7712 | 94.2 | 90.6 | 70.7 | 90.7 | 0.946
JRip | 67.0 | 96.7 | 91.9 | 79.8 | 91.5 | 0.840
Random tree | 82.1 | 94.9 | 92.8 | 75.6 | 93.1 | 0.885
Random Forest | 82.1 | 96.1 | 93.8 | 80.4 | 93.9 | 0.948

In order to reduce the noise and to improve the classification scores, the data used as input were divided into three subsets depending on the nature of the attributes:
• A subset named Sh, which includes the attributes related to the entropy of the embedded and non-embedded graph.
• A subset named Tr, which includes the attributes related to the traces of the embedded and non-embedded graph.
• A subset named X, which includes the attributes related to the polygon indexes used to represent the subspaces in the graph.
Table 2 shows the result of this division. It should be highlighted that not all of the original attributes have been included in one of these three subsets; more specifically, some attributes regarding the general shape of the graphs were not included in any of them. The different methods were then tested using each of these subsets, as well as their combinations, in order to find the best possible one. Results of these tests are shown in
85. 3, 2622.
45 C. R. Munteanu, A. L. Magalhaes, E. Uriarte and H. Gonzalez-Diaz, J. Theor. Biol., 2009, 257, 303-311.
46 M. Randic, N. Lers, D. Plavsic, S. Basak and A. T. Balaban, Chem. Phys. Lett., 2005, 407, 205-208.
47 A. Y. Ng and M. I. Jordan, Adv. Neural Inf. Process. Syst., 2002, 2, 841-848.
48 M. Cruz-Monteagudo, C. R. Munteanu, F. Borges, M. N. Cordeiro, E. Uriarte and H. Gonzalez-Diaz, Bioorg. Med. Chem., 2008, 16, 9684-9693.
49 M. Cruz-Monteagudo, C. R. Munteanu, F. Borges, M. N. Cordeiro, E. Uriarte, K. C. Chou and H. González-Diaz, Polymer, 2008, 49, 5575-5587.
50 M. Cruz-Monteagudo, H. Gonzalez-Diaz, F. Borges, E. R. Dominguez and M. N. Cordeiro, Chem. Res. Toxicol., 2008, 21, 619-632.
51 P. Mitra and D. Pal, Structure, 2011, 19, 304-312.
52 C. Jackson, E. Glory-Afshar, R. F. Murphy and J. Kovacevic, Bioinformatics (Oxford, England), 2011, 27, 1854-1859.
53 A. A. Freitas, O. Vasieva and J. P. de Magalhaes, BMC Genomics, 2011, 12, 27.
54 C. Xing and D. B. Dunson, PLoS Comput. Biol., 2011, 7, e1002110.
55 Y. Xu, W. Hu, Z. Chang, H. Duanmu, S. Zhang, Z. Li, Z. Li, L. Yu and X. Li, J. R. Soc. Interface, 2011, 8, 555-567.
56 W. Wei, S. Visweswaran and G. F. Cooper, J. Am. Med. Inf. Assoc., 2011, 18, 370-375.
57 A. Bender, Methods Mol. Biol. (Totowa, N.J.), 2011, 672, 175-196.
58 L. G. Pérez-Montoto, F
86. 4 379 406 Khan MT Predictions of the ADMET properties of candidate drug molecules utilizing different QSAR QSPR modelling approaches Curr Drug Metab 2010 11 4 285 95 Martinez Romero M Vazquez Naya JM Rabunal JR Pita Fernandez S Macenlle R Castro Alvarino J et al Artificial intelligence techniques for colorectal cancer drug metabolism ontology and complex network Curr Drug Metab 2010 11 4 347 68 Mrabet Y Semmar N Mathematical methods to analysis of topology functional variability and evolution of metabolic systems based on different decomposition concepts Curr Drug Metab 2010 11 4 315 41 Wang JF Chou KC Molecular modeling of cytochrome P450 and drug metabolism Curr Drug Metab 2010 11 4 342 6 Zhong WZ Zhan J Kang P Yamazaki S Gender specific drug metabolism of PF 02341066 in rats role of sulfoconjugation Curr Drug Metab 2010 11 4 296 306 Gonzalez Diaz H QSAR and Complex Networks in Pharmaceutical Design Microbiology Parasitology Toxicology Cancer and Neurosciences Curr Pharm Des 2010 16 24 2598 600 Speck Planche A Scotti MT de Paulo Emerenciano V Current pharmaceutical design of antituberculosis drugs future perspectives Curr Pharm Des 2010 16 24 2656 65 Garcia I Fall Y Gomez G QSAR Docking and CoMFA Studies of GSK3 Inhibitors Curr Pharm Des 2010 16 24 2666 75 Estrada E Molina E Nodarse D Uriarte E Structural Contributions of Substrates to their Binding to P Glycoprotein
87. 403-10.
Dobson P. D., Doig A. J. Predicting enzyme class from protein structure without alignments. J. Mol. Biol. 2005, 345 (1), 187-99.
Di Cera E. Thrombin: a paradigm for enzymes allosterically activated by monovalent cations. C. R. Biol. 2004, 327 (12), 1065-76.
Nayal M., Di Cera E. Valence screening of water in protein crystals reveals potential Na+ binding sites. J. Mol. Biol. 1996, 256 (2), 228-34.
PR900556G

Molecular BioSystems. Cite this: Mol. BioSyst., 2012, 8, 851-862. www.rsc.org/molecularbiosystems. PAPER

LIBP-Pred: web server for lipid binding proteins using structural network parameters; PDB mining of human cancer biomarkers and drug targets in parasites and bacteria
Humberto González-Díaz, Cristian R. Munteanu, Lucian Postelnicu, Francisco Prado-Prado, Marcos Gestal and Alejandro Pazos
Received 19th October 2011, Accepted 1st December 2011. DOI: 10.1039/c2mb05432a

Lipid-Binding Proteins (LIBPs) or Fatty Acid-Binding Proteins (FABPs) play an important role in many diseases, such as different types of cancer, kidney injury, atherosclerosis, diabetes, intestinal ischemia and parasitic infections. Thus, the computational methods that can predict LIBPs based on 3D struct
88. 6 100 0 0 64 1000 0 280 total 92 0 89 2 91 3 Electrostatic potential n 265 ATCUN 90 0 90 10 969 32 1 91 7 122 11 Nonactive C 92 9 7 92 87 8 4 29 91 6 11 121 total 91 5 92 4 91 7

is 84. The misclassified proteins can be explained by the fact that the biological activity of proteins is determined by several forces, such as the hydrophobic ones. These proteins are not a representative percentage, only 8.6% of the entire database (36 out of 415). The coefficients of our best model (eq 5) are standardized and permit comparison and interpretation of the participation of each protein region in the biological activity. Thus, our best model allocates positive contributions of 0.36 to the ATCUN-mediated DNA cleavage activity for a unitary increment in the total amount of electrostatic spectral moments. The catalytic nature of the metal Cu(II)/Ni(II) cluster is explained by the contribution of all the

Parasite Protein ATCUN DNA Cleavage Model (research articles)
[Figure 2: Activity desirability analysis for the classification model variables (orbits or regions).]
[ROC curve plot: area under ROC = 0.904.]
89. 75 to 100% for region s. Additionally, we consider the total region (t), which contains all the amino acids in the protein (region diameter 0 to 100% of rmax). Consequently, we can calculate different ξ and θ values for the amino acids contained in a region (c, i, m, s or t) and placed at a topological distance k within this region (k is the order) [5]. In this work, we calculated a total of 90 indices (3 types of indices × 5 types of regions × 6 orders considered) for each protein.

LDA model
Linear Discriminant Analysis (LDA) is frequently used for classification/prediction problems in physical anthropology, but it is unusual to find examples in which researchers consider the statistical limitations and assumptions required for this technique. In this work, all LDA models have been trained with the STATISTICA 6.0 software, for which our laboratory holds rights of use. In LDA, we use several variable selection techniques to seek the model: (i) all effects (include all parameters), (ii) forward stepwise, (iii) forward entry, (iv) backward stepwise, (v) backward removal and (vi) best subsets. Unless we specify a different value, we always set a prior probability of p(LIBP) = p(nLIBP) = 0.5. The LDA discriminant equation was obtained using as input the three types of Markov chain invariants θ(R). The general form of the equation obtained by LDA (eq 4) is a linear combination of the three types of invariants over the regions R, with coefficients a, b and c and an independent term d, whose output is the score S(LIBP). This
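The total of 90 indices per protein follows directly from the product 3 × 5 × 6. A minimal Python sketch of that enumeration is given below. The type labels (`type1` to `type3`), the label `total` for the whole-protein region and the order range 0 to 5 are placeholders assumed for illustration; they are not names taken from the MARCH-INSIDE software.

```python
from itertools import product

# Placeholder labels for the three families of invariants (assumed names).
index_types = ["type1", "type2", "type3"]
# Four concentric regions plus the whole protein.
regions = ["c", "i", "m", "s", "total"]
# Six orders; the range 0..5 is an assumption.
orders = range(6)

descriptors = [
    f"{kind}_k{k}({region})"
    for kind, region, k in product(index_types, regions, orders)
]

print(len(descriptors))   # 90
print(descriptors[:3])
```

Each protein is therefore described by one fixed-length vector, which is what lets a variable-selection step such as forward stepwise pick a small subset of these descriptors for the final discriminant equation.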
90. [Figure 23: Square spiral with the data.]

What is a gnomon?
The Ulam spiral can be divided into different regions or intervals called gnomons, or angular arrangements, as can be seen in Figure 24. To define a gnomon, it is necessary to recall the oblong numbers, which are those that can be written as the product n(n+1) with n a natural number, that is, 2, 6, 12, 20, 30, 42, 56, 72, 90, ... These numbers divide the natural numbers into intervals of increasing length 2n. It is easy to see that a pair of consecutive oblong numbers defines a gnomon, and that these angular arrangements fit into one another, giving rise to rectangles of increasing size. Moreover, each element of the spiral belongs to exactly one gnomon, which is why the U coordinate of an element in the Ulam spiral can be defined as the number of the gnomon to which it belongs.

[Figure 24: Representation of numbers by gnomons of an Ulam spiral graph.]

When a sequence of letters is represented in its U graph, each node is an element of
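The gnomon (U) coordinate described in this slide can be sketched in a few lines of Python. The sketch assumes that each oblong number n(n+1) closes its own gnomon, so that gnomon n covers the numbers from (n-1)n + 1 up to n(n+1), which gives exactly 2n elements; this boundary convention and the function name `gnomon` are assumptions made for illustration.

```python
from math import isqrt


def gnomon(k):
    """U coordinate of the natural number k.

    Returns the smallest n >= 1 such that n * (n + 1) >= k, i.e. the index
    of the interval ((n - 1) * n, n * (n + 1)] that contains k.
    """
    if k < 1:
        raise ValueError("k must be a natural number (>= 1)")
    n = (isqrt(4 * k + 1) - 1) // 2      # largest n with n * (n + 1) <= k
    return n if n * (n + 1) == k else n + 1


print([gnomon(k) for k in range(1, 13)])
# [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3]
```

Under this convention the first gnomon holds 2 numbers, the second 4 and the third 6, matching the interval length 2n stated in the text.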
91. [Equations defining the Markov matrix elements and the electrostatic interaction terms; the symbols were lost in extraction.]

where q_i and q_j are the electronic charges for the i-th aa and the j-th aa, and the neighborhood relationship (truncation function, equal to 1) was turned on if these amino acids participate in a peptide hydrogen bond or d_ij < d_cutoff = 5 Å. The distance d_ij is the Euclidean distance between the Cα atoms of the two amino acids; d_0j is the distance between the amino acid and the charge center of the protein. All distances were obtained from the x, y and z coordinates of the amino acids in the PDB files. The MM was used to calculate average noninteracting, short-range, middle-range and long-range (k > 2) electrostatic interaction potentials for different protein regions, called orbits, in the 415 proteins. The 3D space of the protein was notionally divided into four regions or orbits: the core (c), the inner (i), the middle (m) and the outer (o). The core orbit is a sphere that contains all the amino acids having the orbit ratio r ≤ 25, where $r = 100 \cdot d_j / d_{j,\max}$; d_j is the distance from the Cα of amino acid j to the center of the protein and d_j,max represents the largest distance for a Cα in the protein. The inner orbit is described by 25 < r ≤ 50, the midd
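The orbit assignment can be sketched as follows in Python. The sketch takes the centre as the plain centroid of the Cα coordinates, whereas the text refers to the centre (of charge) of the protein, and it treats the boundaries as r ≤ 25, r ≤ 50 and r ≤ 75; both choices are simplifying assumptions. The function name and the coordinates are made up for illustration.

```python
from math import dist


def assign_orbits(ca_coords):
    """Label each C-alpha position with an orbit: 'c', 'i', 'm' or 'o'.

    The orbit ratio r is 100 * d / d_max, where d is the distance of the
    atom from the centroid and d_max the largest such distance.
    """
    n = len(ca_coords)
    centre = tuple(sum(p[axis] for p in ca_coords) / n for axis in range(3))
    distances = [dist(p, centre) for p in ca_coords]
    d_max = max(distances)
    labels = []
    for d in distances:
        r = 100.0 * d / d_max if d_max > 0 else 0.0
        if r <= 25:
            labels.append("c")      # core
        elif r <= 50:
            labels.append("i")      # inner
        elif r <= 75:
            labels.append("m")      # middle
        else:
            labels.append("o")      # outer
    return labels


# Toy coordinates placed symmetrically around the origin.
toy = [(10, 0, 0), (-10, 0, 0), (6, 0, 0), (-6, 0, 0),
       (4, 0, 0), (-4, 0, 0), (1, 0, 0), (-1, 0, 0)]
print(assign_orbits(toy))
# ['o', 'o', 'm', 'm', 'i', 'i', 'c', 'c']
```

Grouping residues this way is what allows a separate set of electrostatic descriptors to be averaged over each orbit rather than over the whole protein.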
92. For the calculation, the MARCH-INSIDE software divided the protein into four orbits (R), called c, i, m and s, which constitute specific groups or collections of amino acids placed at the protein core (c), inner (i), middle (m) or surface (s) region (see Figure 1). The diameters of the orbits, as a percentage of the longest distance with respect to the center of charge, are 0-25 for orbit c, 25.1-50 for orbit i, 50.1-75 for orbit m and 75.1-100 for orbit s. Figure 2 presents the flowchart of the present method.

Artificial Neural Network (ANN) Analysis
Artificial neural networks (ANNs) have been used to test a linear model not based on assumptions of parametric distribution of the data, and nonlinear models as well. The ANNs have been trained with the software STATISTICA 6.0, for which our laboratory holds rights of use. The classification problem was solved with the Intelligent Problem Solver analysis, using a selection of a subset of the independent variables. The retained networks were selected by balancing performance against diversity. Several types of ANNs have been tested: the linear neural network (LNN), probabilistic neural network (PNN), general regression neural network (GRNN), radial basis functions (RBF), and the three- and four-layer perceptron (Multi-Layer Perceptron, MLP). The number of tested hidden units had values of 1-1967 for RBF and 1-10 for layer 2 of the three-layer MLP and layers 2 and 3 of the fo
93. A TOPS MODE Approach Curr Pharm Des 2010 16 24 2676 709 Concu R Podda G Ubeira FM Gonzalez Diaz H Review of QSAR Models for Enzyme Classes of Drug Targets Theoretical Background and Applications in Parasites Hosts and other Organisms Curr Pharm Des 2010 16 24 2710 23 Vazquez Naya JM Martinez Romero M Porto Pazos AB Novoa F Valladares Ayerbes M Pereira J et al Ontologies of drug discovery and design for neurology cardiology and oncology Curr Pharm Des 2010 16 24 2724 36 Gonzalez Diaz H Romaris F Duardo Sanchez A Perez Montoto LG Prado Prado F Patlewicz G et al Predicting drugs and proteins in parasite infections with topological indices of complex networks theoretical backgrounds applications and legal issues Curr Pharm Des 2010 16 24 2737 64 Marrero Ponce Y Casanola Martin GM Khan MT Torrens F Rescigno A Abad C Ligand Based Computer Aided Discovery of Tyrosinase Inhibitors Applications of the TOMOCOMD CARDD Method to the Elucidation of New Compounds Curr Pharm Des 2010 16 24 2601 24 Roy K Ghosh G Exploring QSARs with Extended Topochemical Atom ETA Indices for Modeling Chemical and Drug Toxicity Curr Pharm Des 2010 16 24 2625 39 Munteanu CR Fernandez Blanco E Seoane JA Izquierdo Novo P Rodriguez Fernandez JA Prieto Gonzalez JM et al Drug discovery and design for complex diseases through QSAR computational methods Curr Pharm Des 2010 16 24 2640 55 71 39 40 41
94. CND SRA IVSTLAAIGTGFDCASKTEIQLVQGLGVPAERVIYANPCKQVSQIKYAASNGVQMMTFDSEIELMKVA RAHPKAKLVLRIATKFGATLKTSRLLLERAKELNIDVIGVSFHVGSGCTDPDTFVQAVSDARCVFDMA TEVGFSMHLLDIGGGFPGSEDTKLKFEEITSVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNI IAKKTVWEQTFMYYVNDGVYGSF NCILYDHAHVKALLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCN LPEMHVGDWMLFENMGAYTVAAASTFNGFQRPNIYYVMSRPMWQLMK
5 94277935927 6 20302610201 6 15115323348 6 20767598017 6 18925683378 6 21090446699 388 0 0 101 033333333 1 48125 46 8350462963 2 29915123457 2149 32953498 9735114 0 64058 5120342 75274220 0 77495217 4044 199 402675045 192 710325405 184 618394946 176 731552801 168 771638841 160 909639012

Figure 17: Example of embedded results with S2SNet: topological indices and drawings of the star graphs.

The processing of the sequences can be followed in a console window. If it is closed, all the application windows will close as well. The buttons can also be found in the menu, except for Display. In addition, from the menu you can open the Notepad text editor if you need to view or edit your input/output files or create new ones. In the graph drawings, each group has a different color. If different drawings are wanted, the DOT files for each sequence and the Graphviz executables (dot, circo, twopi, neato, fdp) can be found in the dot folder. The Calculations menu lets you transform your data into the S2
95. [Figure 27: The Bio-AIMS online portal, TargetPred section with the new software tools. Tools listed in the screenshot: ATCUNpred (ATCUN DNA-cleavage protein activity prediction), Trypanosome Protein-Protein Interactions, Plasmodium Protein-Protein Interactions, Enzyme Class Prediction, MIND-BEST (Linear MARCH-INSIDE Nested Drug-Bank Exploration & Screening Tool), NL MIND-BEST (Non-Linear MARCH-INSIDE Nested Drug-Bank Exploration & Screening Tool), MISSProt HP (MARCH-INSIDE Spectral moment prediction of Self Proteins in Human Parasites other than original source organism), LIBPpred (LIpid Binding Proteins Prediction) and LectinPred (Lectin Prediction).]

Bio-AIMS (http://bio-aims.udc.es) is a collection of online servers that offers theoretical models based on Artificial Intelligence, Computational Biology and Bioinformatics for studying complex systems in the omics sciences (genomics, transcriptomics, metabolomics, reactomics) that are relevant to parasitology, microbiology, cancer, neurosciences, cardiovascular diseases and other biomedical research in general. The models are based on the computer programs MARCH-INSIDE, MInD-Prot, S2SNet and MCeCoNet. It is the result of the collaboration of two groups of the Red Gallega de Bioinformática (RGB): the Departamento de Tecnologías de la Información y las Comunicaciones (TIC), Facultad de Informática, Universidad de A Coruña (UDC)
96. Chem Phys Lett 2007 443 408 13 Ag ero Chapin G Gonzalez Diaz H Molina R Varona Santos J Uriarte E Gonzalez Diaz Y Novel 2D maps and coupling numbers for protein sequences The first QSAR study of polyga lacturonases isolation and prediction of a novel sequence from Psidium guajava L FEBS Lett 2006 580 723 30 Krishnan A Giuliani A Zbilut J P Tomita M Network scaling invariants help to elucidate basic topological principles of proteins J Proteome Res 2007 6 10 3924 34 Gonz lez D az H Sanchez Gonzalez A Gonzalez Diaz Y 3D QSAR study for DNA cleavage proteins with a potential anti tumor ATCUN like motif J Inorg Biochem 2006 100 7 1290 7 Gonz lez D az H Bonet L Ter n C de Clercq E Bello R Garc a M Santana L Uriarte E ANN QSAR model for selection of anticancer leads from structurally heterogeneous series of compounds Eur J Med Chem 2007 42 580 5 Prado Prado F J Gonz lez D az H Martinez de la Vega O Ubeira F M Chou K C Unified QSAR approach to antimicrobi als Part 3 First multi tasking QSAR model for Input Coded prediction structural back projection and complex networks clustering of antiprotozoal compounds Bioorg Med Chem 2008 16 5871 80 Munteanu C R Gonzalez Diaz H Magalhaes A L Enzymes non enzymes classification model complexity based on composi tion sequence 3D and topological indices J Theor Biol 20
97. F Levin M J Protein protein interaction map of the Trypanosoma cruzi ribosomal P protein complex Gene 2005 357 2 129 36 Caro F Bercovich N Atorrasagasti C Levin M J Vazquez M P Protein interactions within the TcZFP zinc finger family members of Trypanosoma cruzi implications for their functions Biochem Biophys Res Commun 2005 333 3 1017 25 Choe J Moyersoen J Roach C Carter T L Fan E Michels P A Hol W G Analysis of the sequence motifs responsible for the interactions of peroxins 14 and 5 which are involved in glycosome biogenesis in Trypanosoma brucei Biochemistry 2003 42 37 10915 22 6 Chou K C Cai Y D Predicting protein protein interactions from sequences in a hybridization space J Proteome Res 2006 5 2 316 22 7 Gonz lez D az H Gonz lez D az Y Santana L Ubeira F M Uriarte E Proteomics networks and connectivity indices Pro teomics 2008 8 750 778 8 Wu J Mellor J C DeLisi C Deciphering protein network organization using phylogenetic profile groups Genome Inform Ser Workshop Genome Inform 2005 16 1 142 9 9 McDermott J Samudrala R Enhanced functional information from predicted protein networks Trends Biotechnol 2004 22 2 60 2 discussion 62 3 Huynen M A Snel B von Mering C Bork P Function prediction and protein networks Curr Opin Cell Biol 2003 15 2 191 8 Jeong H Mason S P Barab
98. Ga Wheinheim 2003 Todeschini R Consonni V Handbook of Molecular Descriptors Wiley VCH 2002 Mauri A Consonni V Pavan M Todeschini R DRAGON Software An Easy Approach to Molecular Descriptor Calculations MATCH communications in mathematical and in computer chemistry 2006 56 237 48 Tetko IV Gasteiger J Todeschini R Mauri A Livingstone D Ertl P et al Virtual computational chemistry laboratory design and description J Comput Aided Mol Des 2005 19 453 63 12 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 Ponce YM Total and local atom and atom type molecular quadratic indices significance interpretation comparison to other molecular descriptors and QSPR OSAR applications Bioorg Med Chem 2004 12 24 6351 69 Casanola Martin GM Marrero Ponce Y Khan MT Ather A Khan KM Torrens F et al Dragon method for finding novel tyrosinase inhibitors Biosilico identification and experimental in vitro assays Eur J Med Chem 2007 42 11 12 1370 81 Perez Garrido A Helguera AM Rodriguez FG Cordeiro MN QSAR models to predict mutagenicity of acrylates methacrylates and alpha beta unsaturated carbonyl compounds Dent Mater 26 5 397 415 Estrada E Quincoces JA Patlewicz G Creating molecular diversity from antioxidants in Brazilian propolis Combination of TOPS MODE QSAR and virtual structure generation Mol Divers 2004 8 1 21 33 Cabr
99. HW Wong L J Bioinform Comput Biol 2008 6 3 435 66 35 Smith GR Sternberg MJ Curr Opin Struct Biol 2002 12 1 28 35 36 Shen HB Chou KC Anal Biochem 2008 373 2 386 8 37 Shen HB Chou KC Protein Eng Des Sel 2007 20 11 561 7 38 Chou KC Shen HB Biochem Biophys Res Commun 2007 doi 10 1016 j bbrc 2007 1006 1027 39 Chou KC Shen HB Nat Protoc 2008 3 2 153 62 40 Gonz lez D az H de Armas RR Molina R Bioinformatics 2003 19 16 2079 87 Gonz lez D az H Saiz Urra L Molina R Santana L Uriarte E J Proteome Res 2007 6 2 904 8 42 Gonzalez Diaz H Molina R Uriarte E FEBS Lett 2005 579 20 4297 301 43 Concu R Podda G Uriarte E Gonzalez Diaz H J Comput Chem 2009 30 1510 20 Gonzalez Diaz H Saiz Urra L Molina R Gonzalez Diaz Y Sanchez Gonzalez A J Comput Chem 2007 28 6 1042 8 45 Gonz lez D az H P rez Castillo Y Podda G Uriarte E J Comput Chem 2007 28 1990 5 3 6 24 41 44 46 47 48 49 50 51 52 70 71 72 73 74 75 76 77 78 Y Rodriguez Soca et al Polymer 51 2010 264 273 Santana L Uriarte E Gonz lez D az H Zagotto G Soto Otero R Mendez Alvarez E J Med Chem 2006 49 3 1149 56 Aguero Chapin G Varona Santos J de la Riva GA Antunes A Gonzalez Villa T Uriarte E et al J Proteome Res 2009 8 4 2122 8 Concu R Dea Ayuela MA Perez Montoto LG Bolas Fernandez F Prado Prado FJ Podda G et al
100. ...I: A Web Server for Prediction of Unique Targets in Trypanosome Proteome by using Electrostatic Parameters of Protein-protein Interactions
Yamilet Rodriguez-Soca, Cristian R. Munteanu, Julián Dorado, Alejandro Pazos, Francisco J. Prado-Prado, and Humberto González-Díaz
Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071 A Coruña, Spain
Received September 15, 2009
Abstract: Trypanosoma brucei causes African trypanosomiasis in humans (HAT, or African sleeping sickness) and Nagana in cattle. The disease threatens over 60 million people and uncounted numbers of cattle in 36 countries of sub-Saharan Africa and has a devastating impact on human health and the economy. On the other hand, Trypanosoma cruzi is responsible in South America for Chagas disease, which can cause acute illness and death, especially in young children. In this context, the discovery of novel drug targets in the Trypanosome proteome is a major focus for the scientific community. Recently, many researchers have devoted considerable effort to the study of protein-protein interactions (PPIs) in pathogenic Trypanosome species, concluding that the low sequence identities between some parasite proteins and their human host render these PPIs as highly pr
101. J Proteome Res 2009 8 9 4372 82 Santana L Gonzalez Diaz H Quezada E Uriarte E Yanez M Vina D et al J Med Chem 2008 51 21 6740 51 Vina D Uriarte E Orallo F Gonzalez Diaz H Mol Pharmacol 2009 6 3 825 35 Bornholdt S Schuster HG Handbook of graphs and complex networks from the genome to the internet Wheinheim WILEY VCH GmbH amp CO KGa 2003 Mazurie A Bonchev D Schwikowski B Buck GA Bioinformatics 2008 24 22 2579 85 Managbanag JR Witten TM Bonchev D Fox LA Tsuchiya M Kennedy BK et al PLoS One 2008 3 11 e3802 Witten TM Bonchev D Chem Biodivers 2007 4 11 2639 55 Bonchev D Buck GA J Chem Inf Model 2007 47 3 909 17 Bonchev D SAR QSAR Environ Res 2003 14 3 199 214 Estrada E J Proteome Res 2006 5 9 2177 84 Estrada E Proteomics 2006 6 1 35 40 Gupta N Mangal N Biswas S Proteins 2005 59 2 196 204 Webber Jr CL Giuliani A Zbilut JP Colosimo A Proteins 2001 44 3 292 303 Gobel U Sander C Schneider R Valencia A Proteins 1994 18 4 309 17 Krishnan A Zbilut JP Tomita M Giuliani A Curr Protein Pept Sci 2008 9 1 28 38 Krishnan A Giuliani A Zbilut JP Tomita M PLoS One 2008 3 5 e2149 Palumbo MC Colosimo A Giuliani A Farina L FEBS Lett 2007 581 13 2485 9 Krishnan A Giuliani A Zbilut JP Tomita M J Proteome Res 2007 6 10 3924 34 Krishnan A Giuliani A Tomita M PLoS ONE 2007 2 6 e562 Gonz lez D az H Gonz lez D az Y Santana L Ubeira FM Uriarte E Pro
102. KKAAKKAAAAAK is transformed into the spiral graph presented in Fig. 3. Using this graph, CULSPIN calculates two families of Topological Indices (TIs): frequencies (Fr) and Shannon entropies (Sh). These indices can be calculated at several levels: for each class in each Ulam gnomon, for each class in the whole graph, and for each gnomon independently of the class type. In addition, the 2D graphs (U-graphs) generated by the application can be visualized and also exported for use in external programs that calculate other families of TIs. All the numeric indices can be saved and/or exported, either to subject them later to a wide variety of statistical analyses or to create QSAR (quantitative structure-activity relationship) models. Examples of sequences are the amino acid chains in proteins, nucleic acids and mass spectra of proteins. CULSPIN can be used to study different systems, from simple systems of atoms in anti-tumour small molecules to complex metabolic, social, computational or biological networks.
Fig. 3 The spiral graph for the amino acid sequence
[Mol. BioSyst., 2012, 8, 1716-1722]
The indices can be calculated at the following levels:
By classes in gnomons: if this option is selected, the two families of TIs are calculated for each one of the classes in each one of the gnomons. In case a class is not present in a certain gnomon, its frequency and its Shannon entropy in th
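The whole-graph level of the two index families can be sketched as follows. This is a minimal illustration and not CULSPIN's own implementation. It assumes that Fr is the relative frequency of a class and that Sh = -Fr * log10(Fr); the base-10 form is an assumption, chosen because it reproduces the HIST1H1B values tabulated elsewhere in this document (Fr = 0.26275 gives Sh of about 0.1525). The function names are hypothetical, and the gnomon-level variants are left out because they depend on the spiral layout.

```python
import math
from collections import Counter


def shannon_term(fr):
    """Entropy contribution of one class (base-10 logarithm is an assumption)."""
    return 0.0 if fr == 0 else -fr * math.log10(fr)


def class_indices(sequence):
    """Return {class: (Fr, Sh)} for every class present in the sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {c: (n / total, shannon_term(n / total)) for c, n in counts.items()}


if __name__ == "__main__":
    # Sequence fragment quoted in the text; used here only as sample input.
    for cls, (fr, sh) in sorted(class_indices("KKAAKKAAAAAK").items()):
        print(f"{cls}: Fr={fr:.5f} Sh={sh:.5f}")
```

A class that never occurs simply does not appear in the returned dictionary; `shannon_term(0)` returns 0 so that such a class can be reported with zero indices if a fixed class list is needed.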
103. Mol Biol 2003 330 771 783 Z C Wu X Xiao and K C Chou Mol BioSyst 2011 7 3287 3297 X Xiao Z C Wu and K C Chou J Theor Biol 2011 284 42 51 X Xiao Z C Wu and K C Chou PLoS One 2011 6 6 e20592 M Perez Gonzalez and A Morales Helguera J Comput Aided Mol Des 2003 17 665 672 Y Marrero Ponce R Medina Marrero F Torrens Y Martinez V Romero Zaldivar and E A Castro Bioorg Med Chem 2005 13 2881 2899 Y Marrero Ponce A Montero Torres C R Zaldivar M I Veitia M M Perez and R N Sanchez Bioorg Med Chem 2005 13 1293 1304 H Gonz lez D az A Sanchez Gonzalez and Y Gonzalez Diaz J Inorg Biochem 2006 100 1290 1297 StatSoft Inc 6 0 edn 2002 H Van Waterbeemd in Method and Principles in Medicinal Chemistry ed R Manhnhold P Krogsgaard Larsen H Timmerman and H Van Waterbeemd Wiley VCH New York 1995 vol 2 pp 283 293 S Wu and Y Zhang Nucleic Acids Res 2007 35 3375 3382 H Gonzalez Diaz F Prado Prado X Garcia Mera N Alonso P Abeijon O Caamano M Yanez C R Munteanu A Pazos Mol BioSyst 2012 8 851 862 861 Downloaded by Universidad de Vigo on 18 October 2012 Published on 10 January 2012 on http pubs rsc org doi 10 1039 C2M B05432A M A Dea Ayuela M T Gomez Munoz M M Garijo J Sansano and F M Ubeira J Proteome Res 2011 10 1698 1718 117 E Estrada E Uriarte A Montero M Teije
104. NA primary sequences and their numerical characterization J Chem Inf Comput Sci 2000 40 5 1235 44 Nandy A Basak S C Simple numerical descriptor for quantifying effect of toxic substances on DNA sequences J Chem Inf Comput Sci 2000 40 4 915 9 Nandy A Basak S C Gute B D Graphical representation and numerical characterization of H5N1 avian flu neuraminidase gene sequence J Chem Inf Model 2007 47 3 945 51 Liao B Wang T M Analysis of similarity dissimilarity of DNA sequences based on nonoverlapping triplets of nucleotide bases J Chem Inf Comput Sci 2004 44 5 1666 70 Liao B Ding K Graphical approach to analyzing DNA sequences J Comput Chem 2005 26 14 1519 23 Randic M Condensed representation of DNA primary sequences J Chem Inf Comput Sci 2000 40 1 50 6 Randi M Vra ko M Nandy A Basak S C On 3 D Graphical Representation of DNA Primary Sequences and Their Numerical Characterization J Chem Inf Comput Sci 2000 40 1235 44 Randic M Basak S C Characterization of DNA primary se quences based on the average distances between bases J Chem Inf Comput Sci 2001 41 3 561 8 Randic M Balaban A T On a four dimensional representation of DNA primary sequences J Chem Inf Comput Sci 2003 43 2 532 9 Bielinska Waz D Nowak W Waz P Nandy A Clark T Distribution Moments of 2D graphs as Descriptors of DNA Sequences
105. New York 2003 Haykin S Neural Networks A Comprehensive Foundation 2nd ed Prentice Hall New York 1998 Patterson D Artificial Neural Networks Prentice Hall Singapore 1996 Bryson A E Ho Y C Applied optimal control optimization estimation and control Blaisdell Publishing Company or Xerox College Publishing Waltham MA 1969 Haykin S Neural Networks A Comprehensive Foundation Mac millan Publishing New York 1994 Bishop C Neural Networks for Pattern Recognition University Press Oxford 1995 Vilar S Santana L Uriarte E Probabilistic neural network model for the in silico evaluation of anti HIV activity and mechanism of action J Med Chem 2006 49 3 1118 24 Ivanisenko V A Pintus S S Grigorovich D A Kolchanov N A PDBSite a database of the 3D structure of protein functional sites Nucleic Acids Res 2005 33 Database issue D183 7 Dobson P D Doig A J Distinguishing enzyme structures from non enzymes without alignments J Mol Biol 2003 330 4 771 83 1190 Journal of Proteome Research e Vol 9 No 2 2010 75 76 TI 78 19 80 81 82 83 84 85 86 87 88 Rodriguez Soca et al Ivanciuc O Weka machine learning for predicting the phospho lipidosis inducing potential Curr Top Med Chem 2008 8 18 1691 709 Ivanciuc O Drug Design with Machine Learning In En
106. ...O DE ASIENTO REGISTRAL (registry entry number) 03/2009/1199
Title: CULSPIN (Compute ULam SPiral INdices)
Subject of intellectual property: computer program
Class of work: computer program
FIRST REGISTRATION
Authors and original rights holders:
PÉREZ MONTOTO, Lázaro Guillermo (nationality: CUB)
PRADO PRADO, Francisco Javier (nationality: ESP)
GONZÁLEZ DÍAZ, Humberto (nationality: CUB)
MUNTEANU, Cristian Robert (nationality: ROM)
Application data: application no. SC-207-09; date of filing and effect: 24/06/2009; time: 11:40
Santiago de Compostela, 4 September 2009
2.2 New Bio-AIMS online servers based on computer engineering and artificial intelligence techniques
[Portal screenshot text: Ibero-NBIC Network; TargetPred; Bio-AIMS; Home; Links; About; "Modelling the reality". Target Prediction: applications for predicting the function of several targets, such as proteins in human diseases or molecular processes, by using data such as protein sequences or blood proteome mass spectra. Trypano-PPI, Plasmod-PPI, EnzClassPred, AT
107. UNIVERSIDADE DA CORUÑA
Faculty of Computer Science, Department of Information and Communication Technologies
Computer engineering and artificial intelligence techniques for classification: applications to the discovery of drugs and molecular targets
Doctoral Thesis
Supervisors: Alejandro Pazos Sierra, Humberto González Díaz
Doctoral candidate: Cristian Robert Munteanu
A Coruña, April 2013
Dr. Alejandro Pazos Sierra, Full Professor (Catedrático de Universidad) in the area of Computer Science and Artificial Intelligence, Department of Information and Communication Technologies, Faculty of Computer Science, Universidade da Coruña, and Dr. Humberto González Díaz, Ikerbasque Research Professor, Department of Organic Chemistry II, Faculty of Science and Technology, University of the Basque Country (UPV/EHU),
CERTIFY THAT
the dissertation "Computer engineering and artificial intelligence techniques for classification: applications to the discovery of drugs and molecular targets" was carried out by Mr. Cristian Robert Munteanu under our supervision in the Department of Information and Communication Technologies, and constitutes the thesis he presents to qualify for the degree of Doctor in Computer Science of the Universidade da Coruña.
A Coruña, 24 April 2013
Signed: Alejandro Pazos Sierra. Signed: Humberto González Díaz
108. PKKAKKAAGAKKAVKKTPKKAKKPAAAGVKKVAKS PKKAKAAAKPKKATKSPAKPKAVKPKAAKPKAAKPKAAKPKAAKAKKAAAKKK
[Fig. 1, flowchart text: new protein amino acid sequence; CULSPIN (transformation of sequences into Ulam spiral graphs and calculation of the graph topological indices, TIs: frequencies, Fr, and Shannon entropies, Sh); Weka methods (statistical and artificial intelligence analysis: NaiveBayes, Logistic, RBFNetwork, DTNB, SMO (SVM), MLP); QPDR classification model, HCC score = f(Fr, Sh); model evaluation (is the protein related with HCC?).]
Indices shown in the flowchart for the case HIST1H1B:
Class   Fr        Sh
A       0.26275   0.15251
D       0.00392   0.00944
E       0.02745   0.04286
F       0.00392   0.00944
G       0.07451   0.08403
I       0.00784   0.01651
K       0.31569   0.15808
L       0.03922   0.05516
M       0.00196   0.00531
N       0.01569   0.02831
P       0.08235   0.08930
Q       0.00392   0.00944
R       0.01176   0.02270
S       0.05098   0.06590
T       0.05490   0.06920
V       0.03922   0.05516
Y       0.00392   0.00944
Fig. 1 Flowchart of building the QSDR classification models for HCC/non-HCC related proteins.
because the currently available data do not allow us to do so. Otherwise, the numbers of proteins for some subsets would be too few to have statistical significance.
Ulam spiral graphs
In 1963, the mathematician Stanislaw M. Ulam discovered certain interesting aspects of the arrangement that the prime numbers adopt when
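To make the spiral arrangement concrete, the sketch below places the first n positions of a sequence on a square spiral. It is only an illustration under stated assumptions: the start at the origin, the first step to the right and the counter-clockwise turning direction are choices made here, and CULSPIN's actual layout may differ. The helper `gnomon_of`, which treats each square ring around the origin as one gnomon, is likewise a hypothetical definition.

```python
def ulam_spiral_coords(n):
    """Return (x, y) grid coordinates for positions 1..n of a square spiral."""
    coords = []
    x, y = 0, 0
    dx, dy = 1, 0          # assumed: the first step goes to the right
    leg = 1                # current side length; grows after every two turns
    while len(coords) < n:
        for _ in range(2):
            for _ in range(leg):
                if len(coords) == n:
                    return coords
                coords.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx   # assumed: turn 90 degrees counter-clockwise
        leg += 1
    return coords


def gnomon_of(x, y):
    """Hypothetical gnomon index: the square ring that contains (x, y)."""
    return max(abs(x), abs(y))


if __name__ == "__main__":
    for symbol, (x, y) in zip("KKAAKKAAAAAK", ulam_spiral_coords(12)):
        print(symbol, (x, y), "gnomon", gnomon_of(x, y))
```

With this layout, position 1 sits alone in ring 0 and positions 2 to 9 fill ring 1, so per-gnomon indices could be obtained by grouping the sequence symbols by `gnomon_of` before counting.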
109. Proteome Res 2010 9 1182 1190 C R M Yamilet Rodriguez Soca J Dorado J Rabu al A Pazos and H Gonzalez Diaz Polymer 2010 51 264 273 H Gonz lez D az A P rez Bello and E Uriarte Polymer 2005 46 6461 6473 L Saiz Urra H Gonz lez D az and E Uriarte Bioorg Med Chem 2005 13 3641 3647 H Gonz lez Diaz E Uriarte and R Ramos de Armas Bioorg Med Chem 2005 13 323 331 R Concu G Podda E Uriarte and H Gonzalez Diaz J Comput Chem 2009 30 1510 1520 H Gonzalez Diaz L Saiz Urra R Molina Y Gonzalez Diaz and A Sanchez Gonzalez J Comput Chem 2007 28 1042 1048 H Gonzalez Diaz R Molina and E Uriarte FEBS Lett 2005 579 4297 4301 R Concu G Podda E Uriarte and H Gonzalez Diaz J Comput Chem 2009 30 1510 1520 H Gonz lez D az Y P rez Castillo G Podda and E Uriarte J Comput Chem 2007 28 1990 1995 StatSoft Inc 6 0 edn 2002 A Speck Planche M T Scotti and V de Paulo Emerenciano Curr Pharm Des 2010 16 2656 2665 A Speck Planche and M N D S Cordeiro Curr Bioinf 2011 6 81 93 A Speck Planche M T Scotti V P Emerenciano A Garcia L pez E Molina P rez and E Uriarte J Comput Chem 2010 31 882 894 A Speck Planche M T Scotti A Garcia Lopez V P Emerenciano E Molina P rez and E Uriarte Mol Diversity 2009 13 445 458 A Speck Planche L Guilarte Montero R Yera Bueno J A
110. R. Das and M. Jett, J. Exp. Ther. Oncol., 2004, 4, 91-100.
R. Hammamieh, N. Chakraborty, R. Das and M. Jett, J. Exp. Ther. Oncol., 2004, 4, 195-202.
L. McDermott, M. W. Kennedy, D. P. McManus, J. E. Bradley, A. Cooper and J. Storch, Biochemistry, 2002, 41, 6706-6713.
G. Zhu, J. Eukaryotic Microbiol., 2004, 51, 381-388.
G. Greco, E. Novellino, I. Fiorini, V. Nacci, G. Campiani, S. M. Ciani, A. Garofalo, P. Bernasconi and T. Mennini, J. Med. Chem., 1994, 37, 4100-4108.
M. Tendler, C. A. Brito, M. M. Vilar, N. Serra Freire, C. M. Diogo, M. S. Almeida, A. C. Delbem, J. F. Da Silva, W. Savino, R. C. Garratt, N. Katz and A. S. Simpson, Proc. Natl. Acad. Sci. U. S. A., 1996, 93, 269-273.
F. Liu, S. J. Cui, W. Hu, Z. Feng, Z. Q. Wang and Z. G. Han, Mol. Cell. Proteomics, 2009, 8, 1236-1251.
L. McDermott, A. Cooper and M. W. Kennedy, Mol. Cell. Biochem., 1999, 192, 69-75.
L. Kuang, M. L. Colgrave, N. H. Bagnall, M. R. Knox, M. Qian and G. Wijffels, Mol. Biochem. Parasitol., 2009, 168, 84-94.
A. Hirasawa, T. Hara, S. Katsuma, T. Adachi and G. Tsujimoto, Biol. Pharm. Bull., 2008, 31, 1847-1851.
J. P. Zbilut, A. Giuliani, A. Colosimo, J. C. Mitchell, M. Colafranceschi, N. Marwan, C. L. Webber Jr. and V. N. Uversky, J. Proteome Res., 2004, 3, 1243-1253.
B. Shen, J. Bai and M. Vihinen, Protein Eng., Des. Sel., 2008, 21, 37-44.
J. Devillers and A. T. Balaban, Topological Indices and Related Descriptors in QSAR and QSPR, Gordon and Breach
111. RAQASTTTTHLKKVIAFY PTQIRN Y LNIDAIHPCEFIFPGFEPHFNVDELITN LSAKNNVRCLKTLYLHGFMNQQSQNFSEYGYQYFYKVIKTANSEAH
Note: for proteins, if the Protein option is selected, each amino acid present in the sequence is coded with a different letter or class. The class is assigned according to the group the amino acid belongs to, based on the polarity and acid-base properties of its side chain: non-polar and neutral; polar and neutral; acidic and polar; basic and polar.
d) Text or CSV files of MS data: with this option, each case is stored in a separate file. In each file the spectrum signal data are organised in two columns, mass/charge (m/z) and intensity, with or without a header. The files may be of type TXT or CSV.
TXT files: the columns are separated by tabs.
2.5660	0.6601
3.6601	8.9102
8.1024	42.0856
14.2856	22.2112
22.2112	3.8787
31.8787	4.3288
43.2881	56.4393
56.4393	71.3324
71.3324	87.9674
87.9674	90.0000
106.3443	12.1631
126.4631	8.3238
148.3238	100.9263
CSV files: the elements are separated by commas.
m/Z,Intensity
2.5660,0.6601
3.6601,8.9102
8.1024,42.0856
14.2856,22.2112
22.2112,3.8787
31.8787,4.3288
43.2881,56.4393
56.4393,71.3324
71.3324,87.9674
87.9674,90.0000
106.3443,12.1631
126.4631,8.3238
148.3238,100.9263
II. Classes for numerical sequences: this control box is only active if the selected input format e
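A reader who wants to prepare or check MS input files outside the application could parse both layouts along the following lines. This is a hedged sketch, not part of CULSPIN: the function name is hypothetical, a row whose first two fields are not numeric is assumed to be the optional header, and decimal points (not decimal commas) are assumed.

```python
def parse_ms_text(text):
    """Parse two-column (m/z, intensity) data, tab- or comma-separated."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # CSV rows contain commas; TXT rows are split on tabs/whitespace.
        fields = line.split(",") if "," in line else line.split()
        if len(fields) < 2:
            continue
        try:
            mz, intensity = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # assumed to be the optional header row
        peaks.append((mz, intensity))
    return peaks


if __name__ == "__main__":
    sample = "m/Z,Intensity\n2.5660,0.6601\n3.6601,8.9102\n"
    print(parse_ms_text(sample))
```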
112. Rojas Vargas A Garcia Lopez E Uriarte and E Molina Perez Pest Manage Sci 2011 67 438 445 This journal is The Royal Society of Chemistry 2012 81 82 83 84 85 86 87 89 90 9 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 View Online A Speck Planche V V Kleandrova and J A Rojas Vargas Mol Diversity 2011 15 901 909 A Speck Planche V V Kleandrova F Luan and M N Cordeiro Bioorg Med Chem 2011 19 6239 6244 G M Casanola Martin M T Khan Y Marrero Ponce A Ather M N Sultankhodzhaev and F Torrens Bioorg Med Chem Lett 2006 16 324 330 G M Casanola Martin Y Marrero Ponce M T Khan A Ather K M Khan F Torrens and R Rotondo Eur J Med Chem 2007 42 1370 1381 G M Casanola Martin Y Marrero Ponce M T Khan A Ather S Sultan F Torrens and R Rotondo Bioorg Med Chem 2007 15 1483 1503 G M Casanola Martin Y Marrero Ponce M Tareq Hassan Khan F Torrens F Perez Gimenez and A Rescigno J Biomol Screening 2008 13 1014 1024 Y Marrero Ponce R Medina Marrero A E Castro R Ramos de Armas H Gonzalez Diaz V Romero Zaldivar and F Torrens Molecules 2004 9 1124 1147 R Ramos de Armas H Gonzalez Diaz R Molina and E Uriarte Proteins Struct Funct Genet 2004 56 715 723 R Ramos de
113. S H Ed An Introduction to Gerontology Cambridge University Press Cambridge UK pp 21 47 de Magalhaes J P Wuttke D Wood S H Plank M Vora C 2012 Genome environment interactions that modulate aging powerful targets for drug discovery Pharmacol Rev 64 88 101 Devillers J Balaban A T 1999 Topological Indices and Related Descriptors in QSAR and QSPR Gordon and Breach The Netherlands Diao Y Li M Feng Z Yin J Pan Y 2007 The community structure of human cellular signaling network J Theor Biol 247 608 615 Freitas A A de Magalh es J P 2012 A review and appraisal of the DNA damage theory of ageing Mutat Res 728 12 22 Freitas A A Vasieva O de Magalh es J P 2011 A data mining approach for classifying DNA repair genes into ageing related or non ageing related BMC Genomics 12 27 E Fern ndez Blanco et al Journal of Theoretical Biology 317 2013 331 337 337 Gomes N M Ryder O A Houck M L Charter SJ Walker W Forsyth N R Austad S N Venditti C Pagel M Shay J W Wright W E 2011 Comparative biology of mammalian telomeres hypotheses on ancestral states and the roles of telomeres in longevity determination Aging Cell 10 761 768 Gonz lez D az H Bonet I Ter n C de Clercq E Bello R Garc a M Santana L Uriarte E 2007a ANN QSAR model for selection of anticancer leads from structurally heterogeneous series of comp
114. ...SNet a character string.
Numbers to Sequence: transforms TAB-delimited numbers into character sequences. The options are the following:
o Parameters: the minimum and maximum values of the numerical data and the number of groups required (number of groups), up to a maximum of 80; the GET button can be used to take the minimum and maximum values calculated automatically from the data.
o Input files: file with the data as numbers.
o Output files: file with sequences, file with groups, and file with number intervals (the description of the numerical intervals for each group).
Note: this function can be used to transform the values of a protein mass spectrum into sequences, so that the topological indices of the star graph can be calculated.
> N to 1 Character Sequence: transforms sequences in which the information is coded in N characters into S2SNet-type sequences based on single characters. The options are the following:
o Input files: file with the sequences coded in N characters (N character file, the initial file) and file with the coding (code file), which gives the equivalence between N characters and 1 character (e.g. ALA = A).
o Output files: output files for typical S2SNet sequences (character file
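The two conversions described above can be sketched as follows. This is an illustration under explicit assumptions, not S2SNet's code: the groups are assumed to be equal-width intervals between the minimum and maximum, the 80-symbol alphabet is hypothetical (the symbols S2SNet assigns to groups are not given here), and the N-character input is assumed to be a run of fixed-width codes with no separators.

```python
import string

# Hypothetical alphabet with exactly 80 symbols, one per possible group.
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "!#$%&()*+-/:;<=>?@")


def numbers_to_sequence(values, n_groups, vmin=None, vmax=None):
    """Bin numbers into n_groups equal-width intervals (an assumption)."""
    if not 1 <= n_groups <= len(ALPHABET):
        raise ValueError("n_groups must be between 1 and 80")
    vmin = min(values) if vmin is None else vmin   # mimics the GET button
    vmax = max(values) if vmax is None else vmax
    width = (vmax - vmin) / n_groups
    intervals = [(vmin + i * width, vmin + (i + 1) * width)
                 for i in range(n_groups)]
    groups = []
    for v in values:
        i = int((v - vmin) / width) if width > 0 else 0
        groups.append(min(max(i, 0), n_groups - 1))  # clamp edge values
    sequence = "".join(ALPHABET[g] for g in groups)
    return sequence, groups, intervals


def n_to_1(sequence, code, n):
    """Replace each fixed-width N-character code by its 1-character code."""
    if len(sequence) % n:
        raise ValueError("sequence length is not a multiple of n")
    return "".join(code[sequence[i:i + n]] for i in range(0, len(sequence), n))


if __name__ == "__main__":
    print(numbers_to_sequence([0.0, 2.5, 5.0, 7.5, 10.0], 4)[0])
    print(n_to_1("ALAGLYLYS", {"ALA": "A", "GLY": "G", "LYS": "K"}, 3))
```

The three return values of `numbers_to_sequence` correspond loosely to the three output files named in the text (sequences, groups, intervals).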
115. Santiago de Compostela, Spain
Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071 A Coruña, Spain
ARTICLE INFO
Article history: Received 18 October 2009; received in revised form 7 November 2009; accepted 12 November 2009; available online 26 November 2009
Keywords: Protein-protein interactions; Plasmodium proteome; Protein 3D electrostatic interactions
ABSTRACT
We can define structural indices of polymer or biopolymer complex structures and use them in the prediction of new drug targets in parasites. For instance, Plasmodium falciparum causes the most severe form of malaria and kills up to 2.7 million people annually, whereas Plasmodium vivax is geographically the most widely distributed cause, with more than 80 million clinical cases. Due to drug resistance and toxicity, discovering novel drug targets is mandatory, such as Protein-Protein Complexes unique to this pathogen and not present in the human host (pPPCs). Additionally, the 3D structure of an increasing number of Plasmodium proteins is being reported in public databases, which makes it easier to develop bioinformatics models to predict pPPCs. In addition, some PPCs expressed in both parasite and human, such as DHFR synthase, play a significant role in drug resistance in both malaria and human cancer. However, there are no general models to predict pPPCs using indices of
116. Slavova I, Dobchev DA, Kuanar M, Fara DC, et al. Antimalarial activity: a QSAR modeling using CODESSA PRO software. Bioorg Med Chem. 2006;14(7):2333-57.
Katritzky AR, Dobchev DA, Tulp I, Karelson M, Carlson DA. QSAR study of mosquito repellents using Codessa Pro. Bioorg Med Chem Lett. 2006;16(8):2306-11.
Katritzky AR, Kulshyn OV, Stoyanova-Slavova I, Dobchev DA, Kuanar M, Fara DC, et al. Antimalarial activity: a QSAR modeling using CODESSA PRO software. Bioorganic & Medicinal Chemistry. 2006;14(7):2333-57.
Prado-Prado FJ, Borges F, Uriarte E, Perez-Montoto LG, Gonzalez-Diaz H. Multi-target spectral moment: QSAR for antiviral drugs vs. different viral species. Anal Chim Acta. 2009;651(2):159-64.
Prado-Prado FJ, Martinez de la Vega O, Uriarte E, Ubeira FM, Chou KC, Gonzalez-Diaz H. Unified QSAR approach to antimicrobials. 4. Multi-target QSAR modeling and comparative multi-distance study of the giant components of antiviral drug-drug complex networks. Bioorg Med Chem. 2009;17(2):569-75.
Prado-Prado FJ, Gonzalez-Diaz H, Santana L, Uriarte E. Unified QSAR approach to antimicrobials. Part 2: predicting activity against more than 90 different species in order to halt antibacterial resistance. Bioorg Med Chem. 2007;15(2):897-902.
Prado-Prado FJ, Uriarte E, Borges F, Gonzalez-Diaz H. Multi-target spectral moments for QSAR and Complex Networks stud
117. Tables 3 and 4. These results show that Random Forest can still be considered adequate to solve the problem proposed in this work, and that there is nearly no difference between using the X subset as input and using all of the attributes. Regarding classification scores, this technique correctly classifies 82.1% of the instances of the target class (the antioxidant class) with a precision of 80.4% when the 12 attributes of the X subset are used, compared with 84% of correctly classified instances and a precision of 82.9% when all 42 attributes are considered. It is therefore very likely that some of these attributes provide little extra information. Reducing the number of attributes used as input may be worthwhile, and may even improve the performance or precision of the model. After analysing the results shown above, Random Forest appears to be the best and most robust classification model. As previously mentioned, the subsets Sh, Tr and X contain the properties of both the embedded and the non-embedded graph. Therefore, in order to reduce the number of input attributes, the authors tested Random Forest in more depth, distinguishing between the properties of the two types of graph. The results are shown in Table 5, together with the number of attributes used as input to the method.
Table 4 Results obtained using combinations of the different subsets as input (considering 20 attributes)
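The scores discussed above measure different things, which a small worked example may make clearer. The sketch below assumes that "correctly classified instances for the target class" means the share of true antioxidants that the model recovers (recall), and that precision is the share of predicted antioxidants that are truly antioxidants. The function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def binary_scores(tp, fn, fp, tn):
    """Per-class rates, precision and global accuracy from a confusion matrix."""
    return {
        # share of true target-class instances that were recovered (recall)
        "target_rate": tp / (tp + fn),
        # share of true non-target instances that were recovered
        "non_target_rate": tn / (tn + fp),
        # share of predicted target-class instances that are correct
        "precision": tp / (tp + fp),
        # share of all instances classified correctly
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_scores(tp=80, fn=20, fp=10, tn=90).items():
        print(f"{name}: {value:.3f}")
```

Because the non-target class is usually much larger, a model can reach a high global score while its target-class rate or precision stays modest, which is why the text reports both figures for each attribute subset.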
118. The Netherlands 1999 F Torrens and G Castellano Curr Proteomics 2009 6 204 213 S Thomas and D Bonchev Hum Genomics 2010 4 353 360 D Bonchev S Thomas A Apte and L B Kier SAR QSAR Environ Res 2010 21 77 102 D Bonchev and G A Buck J Chem Inf Model 2007 47 909 917 L B Kier D Bonchev and G A Buck Chem Biodiversity 2005 2 233 243 D Bonchev and D H Rouvray Complexity in Chemistry Biology and Ecology Springer Science Business Media Inc New York 2005 D Bonchev Chem Biodiversity 2004 1 312 326 A Duardo Sanchez G Patlewicz and H Gonz lez D az Curr Bioinf 2011 6 53 70 P Riera Fern ndez C R Munteanu N Pedreira Souto R Mart n Romalde A Duardo Sanchez and H Gonz lez D az Curr Bioinf 2011 6 94 121 H Gonzalez Diaz Curr Pharm Des 2010 16 2598 2600 H Gonzalez Diaz F Romaris A Duardo Sanchez L G Perez Montoto F Prado Prado G Patlewicz and F M Ubeira Curr Pharm Des 2010 16 2737 2764 R Concu G Podda F M Ubeira and H Gonzalez Diaz Curr Pharm Des 2010 16 2710 2723 J Chen and B Shen Curr Proteomics 2009 6 228 234 H B Shen and K C Chou Anal Biochem 2008 373 386 388 H B Shen and K C Chou Protein Eng Des Sel 2007 20 561 567 K C Chou and H B Shen Biochem Biophys Res Commun 2007 360 339 345 C Chou and H B Shen Nat Protocols 2008 3 153 162
119. ...the biological activity of chemical compounds; iii) carry out a cluster analysis of experimental data and molecular descriptors; iv) interpret the models developed; and v) predict property values for any chemical compound with a known molecular structure. CODESSA PRO includes 116 molecular descriptors divided into 8 groups: constitutional, topological, geometrical, CPSA, electrostatic, quantum-chemical, molecular-orbital related and thermodynamic. Some examples of the use of this program in research can be found in refs. 68-71.
[Screenshot of the CODESSA PRO main window: menu bar (File, Edit, View, Lists, Dimensions, Calculate, Option, Window, Help), tool bar, workspace, property window, log window and status bar.]
Figure 8. The visual interface of the CODESSA PRO application.
1.2 Artificial intelligence models for drugs and molecular targets
The experimental search for new drugs and molecular targets to fight microbes and parasites demands considerable financial and human effort. For this reason, scientists need extremely fast and cheap theoretical methods to predict the biological activity of new candidate drugs or to propose possible molecular targets. Such methods are therefore used as an initial form of screeni
120. ...the right with dot. Figure 17 presents the case of embedded graphs (the results change because the initial connectivity within the sequence is included in the calculations): the graph on the left is drawn with twopi and the graph on the right with circo.
[Screenshot of results.txt opened in Notepad. Columns: Pos, Chain, Seq, Sh0-Sh5, Tr0-Tr5, H, W, S6, J, X0, X1, R, X2-X5. Row shown: 7ODC, chain A, sequence SSFTKDEFDCHILDEGFTAKDILDQKINDKDAFYVADLGDILKKHLRWLKAL PRVTPFYAVKCND SRA IVSTLAAIGTGFDCASKTEIQLVQGLGVPAERVIYANPCKQVSQIKYAASNGVQMMTFD SEIELMKVA RAHPKAKLVLRIATKFGATLKTSRLLLERAKELNIDVIGVSFHVGSGCTDPDTFVQAVSDARCVFDMA TEVGFSMHLLDIGGGFPGSEDTKLKFEEITSVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNI IAKKTVWEQTFMYYVNDGVYGSFNCILYDHAHVKALLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCN LPEMHVGDWMLFENMGAYTVAAASTFNGFQRPNIYYVMSRPMWQLMK; Sh0-Sh5 = 5.91683715007, 6.80911581588, 6.41546759481, 6.87788629113, 6.55167891758, 6.90893894584.]
Figure 16. Example of non-embedded results with S2SNet: topological indices and drawings of the star graphs.
[Screenshot of results.txt opened in Notepad. Columns: Chain, Seq, Sh0-Sh4, Tr2-Tr5, H, W, S6, J, X0, X1, R, X2-X5. Row shown: chain A, sequence SSFTKDEFDCHILDEGFTAKD ILDQKINDKDAFYVADL GD ILKKHLRWLKAL PRVTPFYAVK
121. ...thesis, section 2.2, "New Bio-AIMS online servers based on computer engineering and artificial intelligence techniques", presents a collection of 7 implementations of QSAR models for drugs and proteins, with applications to at least 9 types of microbes and parasites: Ascaris, Entamoeba, Fasciola, Giardia, Leishmania, Plasmodium, Trichomonas, Trypanosoma and Toxoplasma.
1.4 Objectives
Application design: develop new software tools (computer programs), using computer engineering techniques, for calculating TIs that are useful in the development of QSAR models.
QSAR/QSPR models: find new QSAR/QSPR models, using artificial intelligence techniques, that are applicable to the prediction of the biological activity of compounds of interest in Pharmaceutical Chemistry, Microbiology and Parasitology, employing the newly developed programs.
Online server design: implement the four QSAR/QSPR models found as new network software tools (web servers) for the online prediction of drugs and molecular targets in Pharmaceutical Chemistry, Microbiology and Parasitology.
Publications: protection of intellectual property (software registrations), communication (publication of articles, books, chapters, etc.) and application of the tools developed.
2 RESULTS AND DISCUSSION
This section will present all
122. ...acid synthetic enzymes in Plasmodium sp., Toxoplasma sp. and Eimeria sp. apicomplexans are absent in C. parvum, suggesting that this parasite is unable to synthesize fatty acids de novo. However, C. parvum possesses other important LIBPs/enzymes involved in fatty acid metabolism. In addition, molecular cloning of components of protective antigenic preparations has suggested that related parasite LIBPs could form the basis of the protective immune cross-reactivity between the parasitic trematode worms Fasciola hepatica and Schistosoma mansoni. Tendler and Brito discussed that these results suggest a single vaccine effective against at least two parasites, F. hepatica and S. mansoni, of veterinary and human importance, respectively. In fact, schistosomes are the causative agents of schistosomiasis, one of the most prevalent and serious parasitic diseases, which currently affects approximately 200 million people worldwide. Schistosome excretory/secretory (ES) proteins have been shown to play important roles in modulating mammalian host immune systems. In parallel, Liu et al. performed a global proteomics identification of the ES proteins from adult worms of Schistosoma japonicum, one of the three major schistosome species. They revealed that LIBPs are major constituents of the in vitro ES proteome. Actually, in the 1990s WHO/TDR created a product development programme and initiated collaborations with other major international donors to
123. ...actions. Mol. Biotechnol. 2008, 38 (1), 1-17.
Najafabadi, H. S.; Salavati, R. Sequence-based prediction of protein-protein interactions by means of codon usage. Genome Biol. 2008, 9 (5), R87.
Kim, S.; Shin, S. Y.; Lee, I. H.; Kim, S. J.; Sriram, R.; Zhang, B. T. PIE: an online prediction system for protein-protein interactions from text. Nucleic Acids Res. 2008, 36 (Web Server issue), W411-5.
Jaeger, S.; Gaudan, S.; Leser, U.; Rebholz-Schuhmann, D. Integrating protein-protein interactions and text mining for protein function prediction. BMC Bioinf. 2008, 9 (Suppl. 8), S2.
Burger, L.; van Nimwegen, E. Accurate prediction of protein-protein interactions from sequence alignments using a Bayesian method. Mol. Syst. Biol. 2008, 4, 165.
Scott, M. S.; Barton, G. J. Probabilistic prediction and ranking of human protein-protein interactions. BMC Bioinf. 2007, 8, 239.
Ivanciuc, O.; Schein, C. H.; Braun, W. Data mining of sequences and 3D structures of allergenic proteins. Bioinformatics 2002, 18 (10), 1358-64.
Fernández, M.; Caballero, J.; Fernández, L.; Abreu, J. I.; Garriga, M. Protein radial distribution function (P-RDF) and Bayesian-Regularized Genetic Neural Networks for modeling protein conformational stability: Chymotrypsin inhibitor 2 mutants. J. Mol. Graph. Model. 2007, 26 (4), 748-759.
Fernández, L.; Caballero, J.; Abreu, J. I.; Fernández, M. Amino Acid Sequence Autocorrelation Vectors and Bayesian-Regularized Genetic Neural Networks fo
124. ...name and the chain label (no empty new line); the results will print the pairs between the chain from list 1 and the chain from list 2, not the combinations of the list items.
[Screenshot of the server page: Plasmod-PPI, Plasmodium Protein-Protein Interactions (PPPI) Tool; MARCH-INSIDE, Python version; data: RCSB PDB; Classification Tree model, test accuracy 96.8%.]
Figure 31. The online tool PlasmodPPI.
We can define structural indices of complex polymers or biopolymers and use them to predict new drugs and their corresponding targets in parasites. For example, Plasmodium falciparum causes the most severe form of malaria and kills up to 2.7 million people annually, while Plasmodium vivax is geographically the most widely distributed cause, with more than 80 million clinical cases. Because of drug resistance and toxicity, the discovery of new drug targets is mandatory, such as the protein-protein complexes that are unique to this pathogen and absent from the human host (pPPCs) [62]. In addition, the 3D structure of a growing number of Plasmodium proteins is being deposited in public databases, which facilitates the development of bioinformatic models to predict pPPCs. Moreover, some PPCs expressed in both parasites and humans, such as DHFR synthetase, play an important role in
125. ...amic Markov model. Bioorg. Med. Chem. 2005, 13 (4), 1119-29.
González-Díaz, H.; Molina, R.; Uriarte, E. Recognition of stable protein mutants with 3D stochastic average electrostatic potentials. FEBS Lett. 2005, 579 (20), 4297-301.
González-Díaz, H.; Pérez-Bello, A.; Uriarte, E. Stochastic molecular descriptors for polymers. 3. Markov electrostatic moments as polymer 2D-folding descriptors: RNA-QSAR for mycobacterial promoters. Polymer 2005, 46, 6461-73.
Freund, J. A.; Pöschel, T. Stochastic Processes in Physics, Chemistry, and Biology. In Lecture Notes in Physics; Springer-Verlag: Berlin, Germany, 2000.
González-Díaz, H.; Uriarte, E.; Ramos de Armas, R. Predicting stability of Arc repressor mutants with protein stochastic moments. Bioorg. Med. Chem. 2005, 13 (2), 323-31.
Gasmi, G.; Singer, A.; Forman-Kay, J.; Sarkar, B. NMR structure of neuromedin C, a neurotransmitter with an amino terminal Cu(II)-, Ni(II)-binding (ATCUN) motif. J. Pept. Res. 1997, 49 (6), 500-9.
Gokhale, N. H.; Cowan, J. A. Inactivation of human angiotensin converting enzyme by copper peptide complexes containing ATCUN motifs. Chem. Commun. (Camb.) 2005, (47), 5916-8.
Robertson, L. S.; Iwanowicz, L. R.; Marranca, J. M. Identification of centrarchid hepcidins and evidence that 17beta-estradiol disrupts constitutive expression of hepcidin-1 and inducible expression of hepcidin-2 in largemouth bass (Micropterus salmoides). Fish Shellfish Immunol. 2009, 26 (6), 898-907.
Saiz Urra
126. ...and Drug Targets in Parasites and Bacteria. Molecular BioSystems 8 (3), 851-862, 2012. Humberto González-Díaz, Cristian R. Munteanu, Lucian Postelnicu, Francisco Prado-Prado, Marcos Gestal, Alejandro Pazos.
Link: http goo gl c TNcP
Tool: http://bio-aims.udc.es/LIBPpred.php
[Screenshot of the server page: LIBPpred, LIpid Binding Proteins Prediction Tool; MARCH-INSIDE, Python version; data: RCSB PDB; LDA classification model, accuracy 89.11%, based on 9 spectral moments of the proteins. Mode 1, Standard PDBs: paste the IDs of the PDBs or PDB chains as a list, maximum 10 items. Mode 2, LOMETS PDB: upload and evaluate one PDB from LOMETS, max 2 MB. Note: the LIBP prediction is calculated using $100 \times (\mathrm{LIBPscore} - \mathrm{Min\ score}) / (\mathrm{Max\ score} - \mathrm{Min\ score})$, where LIBPscore is the result of the LDA equation for the current protein and Min and Max score are the minimum and maximum values of the LIBPscore for our dataset.]
Figure 34. The online tool LIBPpred.
Lipid-binding proteins (LIBPs), or fatty acid binding proteins (FABPs), play an important role in many diseases
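The note on the LIBPpred page reads as a min-max rescaling of the raw LDA score to a 0-100 scale. A minimal Python sketch of that reading follows; the function name and the example numbers are illustrative assumptions, not values taken from the server.

```python
def libp_percent(score: float, min_score: float, max_score: float) -> float:
    """Rescale a raw LDA score to 0-100 using the dataset's min and max scores."""
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    return 100.0 * (score - min_score) / (max_score - min_score)


# Hypothetical numbers, for illustration only.
print(libp_percent(1.5, -2.0, 5.0))  # 50.0
```

A score equal to the dataset minimum maps to 0 and one equal to the maximum maps to 100; whether the server clips scores outside that range is not stated in the source.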
127. ...antiviral, antibacterial, antiparasitic and antifungal. Next, some examples of Web tools based on artificial intelligence models are presented. After the objectives of this thesis are stated, the second section, RESULTS AND DISCUSSION, begins. It is divided into three parts: new computer programs for calculating molecular descriptors, new online tools in Bio-AIMS based on QSAR classification models, and the presentation of review papers and book chapters devoted to the applications of graphs in the biosciences. Each part of this section contains the summary of the corresponding publications. The thesis continues with the CONCLUSIONS, the REFERENCES cited in the text up to that point, and a section that includes the six PUBLICATIONS (ANNEXES) with a JCR impact factor, in their original language, which correspond to the summaries presented earlier in the RESULTS AND DISCUSSION part.
1.1 Programs for molecular graph parameters
Many phenomena can be modelled as a complex network. Network theory can therefore be used in studies on drug discovery, metabolic pathways, diseases, the search for molecular targets, interactions between macromolecules, etc. In this thesis we focus both on molecular systems such as drugs and proteins
128. ...to extend the validation of the model for the construction of the network. The predicted network has 59 nodes (compounds), 648 edges (pairs of compounds with similar activity), a low coverage density (d = 37.8%) and a distribution closer to normal than to exponential. The model equation is as follows:
Actv 0 49 7 S C ao 2 57 m S X 143 m S H Het 0 90 R 0 75 A 2044 p 0 001 (9)
where Rc is the canonical correlation coefficient, λ is Wilks' statistic and p the error level. In this equation, the calculated absolute probabilities refer to: (1) Csp & sp2, all unsaturated carbon atoms (sp and sp2 atoms) and all atoms placed at a distance of five or fewer atoms from them; (2) X, all halogen atoms; (3) H-Het, all hydrogen atoms bonded to a heteroatom (N, O or S). Prado-Prado et al. [81] used Markov chain theory to calculate new multi-target spectral moments in order to fit an mt-QSAR model that predicts the antifungal activity of more than 280 drugs against 90 fungal species. LDA was used to classify the drugs as active or inactive against different fungal species. The model correctly classified 12434 of the 12566 inactive compounds (98.95%) and 421 of the 468 active compounds (89.96%). The overall predictability
129. ...art 1: prediction of protein stability effects of a complete set of alanine substitutions in Arc repressor. Bioorg. Med. Chem. 2005, 13 (8), 3003-15.
Marrero-Ponce, Y.; Medina-Marrero, R.; Castro, A. E.; Ramos de Armas, R.; González-Díaz, H.; Romero-Zaldivar, V.; Torrens, F. Protein Quadratic Indices of the "Macromolecular Pseudograph's α-Carbon Atom Adjacency Matrix". 1. Prediction of Arc Repressor Alanine-mutant's Stability. Molecules 2004, 9, 1124-1147.
Estrada, E.; Uriarte, E.; Vilar, S. Effect of Protein Backbone Folding on the Stability of Protein-Ligand Complexes. J. Proteome Res. 2006, 5, 105-111.
Ivanciuc, O.; Braun, W. Robust quantitative modeling of peptide binding affinities for MHC molecules using physical-chemical descriptors. Protein Pept. Lett. 2007, 14 (9), 903-16.
Ivanciuc, O.; Oezguen, N.; Mathura, V. S.; Schein, C. H.; Xu, Y.; Braun, W. Using property based sequence motifs and 3D modeling to determine structure and functional regions of proteins. Curr. Med. Chem. 2004, 11 (5), 583-93.
von Grotthuss, M.; Plewczynski, D.; Ginalski, K.; Rychlewski, L.; Shakhnovich, E. I. PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics
130. ...art-type FABP (H-FABP) and the role of sterol regulatory element binding protein-1c in the regulation of fatty acid synthase expression in breast cancer. Li and Huang et al. carried out a prognostic evaluation of epidermal FABP and calcyphosine, two proteins implicated in endometrial cancer, using a proteomic approach. Cutaneous FABP (C-FABP), expressed in prostate cancer, is a potential prognostic marker and target for tumourigenicity suppression, and adipocyte FABP (A-FABP) induces apoptosis in DU145 prostate cancer cells. In addition, Hammamieh et al. evaluated the in vitro molecular impacts of antisense complementary to the FABP mRNA in DU145 prostate cancer cells. On the other hand, LIBPs or FABPs are also very important in parasites. Three different classes of small LBPs are found in helminth parasites. The parasites that produce these proteins are unable to synthesize their own complex lipids and instead rely entirely upon their hosts for supply. Zhu has reviewed fatty acid metabolism in Cryptosporidium parvum, one of the apicomplexans that can cause severe diarrhea in humans and animals. The slow development of anti-cryptosporidiosis chemotherapy is primarily due to the poor understanding of the basic metabolic pathways in this parasite. Many well-defined or promising drug targets found in other apicomplexans are either absent or highly divergent in C. parvum. The recently discovered apicoplast and its associated Type II fatty
131. ...ase, 41.3% for hydrolases, 39.2% for transferases, 34.5% for cell adhesion proteins, 33.5% for metal binders, 25.0% for translation proteins, 16.7% for transporters, 9.1% of the structural proteins and 8.2% for isomerases. From among these candidates, several chains are pointed out: 2FFL chains A, B, C, D as a specialized ribonuclease (Dicer) that initiates RNA interference by cleaving double-stranded RNA substrates; 2A0U chains A and B as a translation initial factor in Leishmania major; 2112 chain A, 3CHJ chain A and 3CHL chain A as members of the alpha-giardin family of annexins localized to the flagella of the intestinal protozoan parasite Giardia lamblia; and 3CS1 chain A as the flagellar calcium-binding protein (FCaBP) of the protozoan Trypanosoma cruzi. In addition, a protein with unknown biological function is predicted to have DNA cleavage activity (1N81, 186 amino acids, Plasmodium falciparum). For more detailed information, Table 2 presents the top ten of the best predicted ATCUN proteins in eight important parasites. We can observe different protein functions of the predicted protein chains, such as oxidoreductase for Ascaris suum and Entamoeba histolytica, transferase for Toxoplasma gondii, Trypanosoma brucei and Fasciola hepatica, hydrolase for Giardia intestinalis and Leishmania major, and lyase for Plasmodium falciparum. In general
[Journal of Proteome Research, Vol. 8, No. 11, 2009, p. 5225, research articles]
132. ...asi, A. L.; Oltvai, Z. N. Lethality and centrality in protein networks. Nature 2001, 411 (6833), 41-2.
Carmi, S.; Levanon, E. Y.; Havlin, S.; Eisenberg, E. Connectivity and expression in protein networks: proteins in a complex are uniformly expressed. Phys. Rev. E: Stat., Nonlin., Soft Matter Phys. 2006, 73 (3 Pt 1), 031909.
Bornholdt, S.; Schuster, H. G. Handbook of Graphs and Complex Networks: From the Genome to the Internet; WILEY-VCH GmbH & CO. KGaA: Weinheim, 2003.
Estrada, E. Protein bipartivity and essentiality in the yeast protein-protein interaction network. J. Proteome Res. 2006, 5 (9), 2177-84.
Estrada, E. Virtual identification of essential proteins within the protein interaction network of yeast. Proteomics 2006, 6 (1), 35-40.
Sharon, L.; Davis, J. V.; Yona, G. Prediction of protein-protein interactions: a study of the co-evolution model. Methods Mol. Biol. 2009, 541, 61-88.
Liu, L.; Cai, Y.; Lu, W.; Feng, K.; Peng, C.; Niu, B. Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection. Biochem. Biophys. Res. Commun. 2009, 380 (2), 318-22.
Skrabanek, L.; Saini, H. K.; Bader, G. D.; Enright, A. J. Computational prediction of protein-protein inter
133. ...that maps input data onto a set of appropriate outputs. It consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (also known as a processing element) with a nonlinear activation function. This ANN uses a supervised learning technique called back-propagation to train the network. Like the MLP, Support Vector Machines (SVM) are nonlinear classifiers. SVM induce linear separators, or hyperplanes, in the space of characteristics. This type of classifier has proved to be very useful when dealing with high-dimensionality problems. Bayesian methods have also been applied to this type of problem. These methods are based on Bayes' theory of probability. Not only do they allow classification, but they also allow finding relationships among attributes. Among them is Naive Bayes, which assumes that the attributes are independent. Finally, DTNB allows obtaining classification models based on IF-THEN-ELSE rules or on hierarchical structures such as trees. Among the independent dataset test, the sub-sampling or k-fold (e.g. 5- or 10-fold) cross-over test, and the jackknife test, which are often used for examining the accuracy of a statistical prediction method, the jackknife test was deemed the least arbitrary, since it always yields a unique result for a given benchmark dataset, as elucidated and demonstrated in ref. 78. Therefore the
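The jackknife test mentioned above holds out each sample once, trains on the remainder and scores the held-out sample, which is why it gives a single, repeatable result for a fixed dataset. The sketch below illustrates the procedure in Python with a nearest-centroid classifier as a deliberately simple stand-in; the classifier, the function names and the toy data are assumptions for illustration and are not the methods or data used in the paper.

```python
from math import dist


def nearest_centroid_predict(train, query):
    """Predict the label whose class centroid is closest to `query`."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: dist(centroids[lab], query))


def jackknife_accuracy(samples):
    """Leave each sample out once, train on the rest, score the held-out one."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_centroid_predict(train, features) == label:
            hits += 1
    return hits / len(samples)


# Toy, well-separated data (hypothetical).
toy = [((0.0, 0.1), "neg"), ((0.2, 0.0), "neg"), ((0.1, 0.2), "neg"),
       ((1.0, 0.9), "pos"), ((0.9, 1.1), "pos"), ((1.1, 1.0), "pos")]
print(jackknife_accuracy(toy))  # 1.0
```

Unlike k-fold sub-sampling, no random partition is involved, so repeated runs on the same benchmark return the same accuracy.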
134. ...that the new type of parameters introduced herein is useful to numerically characterize the structure of PPCs formed after PPIs in protein structure-function studies. We also demonstrate that it is possible to distinguish between PPCs (pPPCs) cases formed according to unique PPIs in Plasmodium sp. (pPPIs), and not present in other parasites or host organisms, using these parameters. We generate and compare linear and non-linear classifiers. We show that it is possible to predict PPIs that undergo pPPC formation with a simple linear classifier based on the absolute difference between the 3D protein surface electrostatic entropies of the pair proteins. The model was implemented in a public web server, available free of charge to the scientific community.
Acknowledgments
We thank the kind and professional attention of Prof. J. E. Mark, Computational & Theoretical Polymer Science editor for Polymer, as well as the opinion of the reviewers. González-Díaz H. and Munteanu C. R. acknowledge research contracts financed by the Isidro Parga Pondal Program, Xunta de Galicia. The authors thank the partial financial support from grants 2007/127 and 2007/144 from the General Directorate of Scientific and Technological Promotion of the Galician University System of the Xunta de Galicia, and from grants Ref. PIO52048 and RD07/0067/0005 funded by the Carlos III Health Institute.
Appendix. Supplementary data
Supplementary
135. ...ation only between neighbouring aa ($\alpha_{ij} = 1$ if $r_{ij} < r_{\text{cut-off}}$). Otherwise, the interaction is banished ($\alpha_{ij} = 0$). The relationship $\alpha_{ij}$ may be visualized in the form of a protein structure complex network (see Fig. 2). In this network the nodes are the Cα atoms of the amino acids and the edges connect pairs of amino acids with $\alpha_{ij} = 1$. The Euclidean 3D space coordinates $r^3 = (x, y, z)$ of the Cα atoms of the amino acids are listed in protein PDB files. For calculation, all water molecules and metal ions were removed. All calculations were carried out with our in-house MARCH-INSIDE 2.0 software. For calculation, the MARCH-INSIDE software always uses the full matrix, never a sub-matrix, but may run the last summation term either for all amino acids or only for some specific groups called regions or orbitals (R). These regions are often defined in geometric terms and called core, inner, middle or surface region. The protein is virtually divided into the following regions: c corresponds to core, i to inner, m to middle and s to surface regions, respectively. The diameters of the regions, as a percentage of the longest distance $r_{\max}$ with respect to the centre of charge, are 0 to 25 for region c, 25 to 50 for region i, 50 to 75 for region m and
Fig. 2. Representations of a LIBP with PDB ID 1ZHG, an FABP from P. falciparum: (A) 3D structure model for the full complex and (B) complex network graph for chain A.
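The two geometric steps described above, the cut-off function that defines the network edges and the split into c, i, m and s regions, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MARCH-INSIDE code: the function names are hypothetical, the reference point is passed in by the caller (the text uses the centre of charge), and the inclusive handling of the cut-off and of the region boundaries is a choice made here because the source gives only the ranges.

```python
from math import dist


def contact_edges(coords, cutoff):
    """Return index pairs (i, j), i < j, within the cut-off distance (alpha_ij = 1)."""
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dist(coords[i], coords[j]) <= cutoff]


def assign_regions(coords, centre):
    """Label each point c, i, m or s by its distance to `centre` as a fraction of r_max."""
    radii = [dist(p, centre) for p in coords]
    r_max = max(radii)
    labels = []
    for r in radii:
        frac = r / r_max if r_max > 0 else 0.0
        if frac <= 0.25:
            labels.append("c")   # core: 0 to 25 % of r_max
        elif frac <= 0.50:
            labels.append("i")   # inner: 25 to 50 %
        elif frac <= 0.75:
            labels.append("m")   # middle: 50 to 75 %
        else:
            labels.append("s")   # surface: 75 to 100 %
    return labels


# Five hypothetical C-alpha positions on a line (arbitrary units).
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0),
          (7.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(contact_edges(coords, cutoff=3.5))               # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(assign_regions(coords, centre=(0.0, 0.0, 0.0)))  # ['c', 'c', 'i', 'm', 's']
```

In a real calculation the coordinates would come from the Cα records of a PDB file after removing water molecules and metal ions, as the text describes.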
136. ...ation series carried out with an external series of pPPI and npPPI that were never used to train the model. Interestingly, four variables (entropy differences for the m, s and t regions), out of more than 30 parameters calculated, appear in many models. These parameters have the general formula $\Delta\theta_k(R) = |\theta_k(R)_{\text{prot1}} - \theta_k(R)_{\text{prot2}}|$, which is the absolute difference between the electrostatic entropy values $\theta_k(R)$ for amino
Table 1. Summary of results for LDA, CT and ANN analysis.
Technique (profile)       Parameter    Group   Training sub-set        Validation sub-set
                                               %     npPPI  pPPI       %     npPPI  pPPI
LDA (forward stepwise)    Specificity  npPPI   85.0  2886   509        82.4  897    191
                          Sensitivity  pPPI    94.8  30     551        92.7  14     179
                          Accuracy     Total   86.4                    84.0
CT (LC)                   Specificity  npPPI   98.5  3343   52         98.0  1066   22
                          Sensitivity  pPPI    91.2  51     530        90.2  19     174
                          Accuracy     Total   97.4                    96.8
CT (US)                   Specificity  npPPI   95.6  3247   148        96.5  1050   38
                          Sensitivity  pPPI    83.8  94     487        84.5  30     163
                          Accuracy     Total   93.9                    94.7
CRT (Gini measure)        Specificity  npPPI   97.6  3315   80         97.8  1064   24
                          Sensitivity  pPPI    84.7  89     492        83.4  32     161
                          Accuracy     Total   95.7                    95.6
CRT (Chi-square)          Specificity  npPPI   97.6  3315   80         97.8  1064   24
                          Sensitivity  pPPI    84.7  89     492        83.4  32     161
                          Accuracy     Total   95.7                    95.6
CRT (G-square)            Specificity  npPPI   98.6  3348   47         98.4  1071   17
                          Sensitivity  pPPI    81.8  106    475        80.3  38     155
                          Accuracy     Total   96.2                    95.7
MLP 4:4-7-1:1             Sensitivity  pPPI    83.3  97     484        82.9  33     160
                          Specificity  npPPI   84.0  2851   544        82.9  902    186
                          Acc
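The percentages in Table 1 follow directly from the confusion counts listed beside them. As a worked check in Python (the function name is illustrative), the LDA training row gives:

```python
def classification_rates(true_neg, false_pos, false_neg, true_pos):
    """Return (specificity, sensitivity, accuracy) from confusion-matrix counts."""
    specificity = true_neg / (true_neg + false_pos)
    sensitivity = true_pos / (true_pos + false_neg)
    total = true_neg + false_pos + false_neg + true_pos
    accuracy = (true_neg + true_pos) / total
    return specificity, sensitivity, accuracy


# LDA, training sub-set: npPPI row 2886 / 509, pPPI row 30 / 551.
spec, sens, acc = classification_rates(2886, 509, 30, 551)
print(f"{100 * spec:.1f} {100 * sens:.1f} {100 * acc:.1f}")  # 85.0 94.8 86.4
```

Here npPPI is treated as the negative class and pPPI as the positive class; the four counts sum to 3976, the number of training entries.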
137. ...based classification of amino acid mutations. Protein Eng. Des. Sel. 2008, 21 (1), 37-44.
Krishnan, A.; Zbilut, J. P.; Tomita, M.; Giuliani, A. Proteins as networks: usefulness of graph theory in protein science. Curr. Protein Pept. Sci. 2008, 9 (1), 28-38.
Krishnan, A.; Giuliani, A.; Zbilut, J. P.; Tomita, M. Implications from a network-based topological analysis of ubiquitin unfolding simulations. PLoS ONE 2008, 3 (5), e2149.
Palumbo, M. C.; Colosimo, A.; Giuliani, A.; Farina, L. Essentiality is an emergent property of metabolic network wiring. FEBS Lett. 2007, 581 (13), 2485-9.
Krishnan, A.; Giuliani, A.; Tomita, M. Indeterminacy of reverse engineering of Gene Regulatory Networks: the curse of gene elasticity. PLoS ONE 2007, 2 (6), e562.
Tun, K.; Dhar, P. K.; Palumbo, M. C.; Giuliani, A. Metabolic pathways variability and sequence/networks comparisons. BMC Bioinformatics 2006, 7, 24.
Nandy, A.; Ghosh, A.; Nandy, P. Numerical characterization of protein sequences and application to voltage-gated sodium channel α subunit phylogeny. In Silico Biol. 2009, 9, 8.
Randic, M.; Vracko, M.; Nandy, A.; Basak, S. C. On 3-D graphical representation of D
138. ...biological activity of more than 70 drugs from the literature against 96 species of bacteria. LDA was applied to classify the drugs as active or inactive against the different bacterial species analysed. The model correctly classified 199 of the 237 active compounds (83.9%) and 168 of the 200 inactive compounds (84%). The overall predictability in the training group was 84% (367 of the 437 cases). The model was validated using the external prediction series, in which 202 of the 243 cases (83.13%) were correctly classified. To show how the model works in practice, a virtual screening was carried out, in which the model recognised as active 480 of the 568 (84.5%) antibacterial compounds that were not used in the training or prediction series. The model equation is as follows:
Actv 1 12 C T 1 342C T 1 84 C Ca 0 90 C C 0 887 C X 1 27 C H Het 0 90 C H Het 0 698 420 49 Rce 0 715 p 0 001 (3)
where λ is Wilks' statistic, Rc is the canonical correlation and p the error level. In the equation, C is the molecular index of a given species after k steps. It was calculated for the total (T) of atoms in the molecule or for specific collections of atoms. Another model, proposed by Prado-Prado et al. [75], correctly classified 202 of the 241 active compounds (83.8%) and 169 of the 200 ina
139. [Y. Rodriguez-Soca et al. / Polymer 51 (2010) 264-273, p. 269]
...c entropy indices, the corresponding observed classification and the predicted classification for each pPPI or npPPI pair are given in the Supporting information.
3. Results and discussion
Several researchers have demonstrated the high performance of different types of computational classifiers in protein or PPI structure-function relationship studies based on different algorithms, as is the case, for instance, of the works carried out by Chou et al. [84-90] and Fernandez and Caballero [91-93]. In particular, the LDA algorithm, a simpler type of the classifier used herein, was employed to train linear models based on different combinations of parameters [94].
3.1. Linear discriminant analysis (LDA) models
A simple Linear Discriminant Analysis (LDA) with only four variables was developed to assign each protein pair as pPPI or npPPI. The best equation found was:
S pPPC 0 09506 463 m 0 02219 40 s 0 62697 405 t 0 51126 40 t 0 30646; N = 3976, χ² = 947.95, p < 0.00 (8)
The statistical parameters for the above equation are the number of protein entries in training (N), the Chi-square statistic (χ²) and the error level (p-level), which has to be < 0.05 [95]. All the statistical data of this model are summed up in Table 1. The discriminant function reported in the results section presented statistically significant results of goodness-of-fit for both training and valid
140. ...c matrices $^{1}\Pi_e$, built up as square matrices (n × n), where n is the number of aa in the protein. The subscript e points to the electrostatic type of molecular force field. The method considers a hypothetical situation in which every j-th aa has a general potential $\xi_j$, isolated in space. All these potentials can be listed as elements of a vector. It can be supposed that, after this initial situation, all the amino acids interact with the energy $E_{ij}$ with every other aa in the protein. For the sake of simplicity, a truncation function $\alpha_{ij}$ is applied in such a way that a short-term interaction takes place, in a first approximation, only between neighboring amino acids ($\alpha_{ij} = 1$ if $d_{ij}$ < cutoff distance). Otherwise, the interaction is banished ($\alpha_{ij} = 0$). Neglecting direct interactions between distant aa in $^{1}\Pi_e$ does not avoid the possibility that potential interactions propagate between those aa within the protein backbone in an indirect manner. Consequently, in the present model long-range electrostatic interactions are allowed (not forbidden) but estimated indirectly using the natural powers of $^{1}\Pi_e$. The use of MCM theory allows a simple and fast model to calculate the average values of $\xi$, considering indirect interaction between any aa and the other aa after previous interaction of aa with other k neighbor amino acids. As follows, we give the general formula for any potential and specific formulas as well.
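The point that truncating direct interactions still lets effects reach distant residues through the powers of the matrix can be shown with a small Python sketch. It is not the MARCH-INSIDE implementation: the row normalisation of the masked weights, the function names and the three-node example are assumptions chosen only to make the propagation visible.

```python
def stochastic_matrix(weights, alpha):
    """Row-normalise alpha[i][j] * weights[i][j] so that each row sums to 1."""
    matrix = []
    for w_row, a_row in zip(weights, alpha):
        masked = [w * a for w, a in zip(w_row, a_row)]
        total = sum(masked)
        if total == 0:
            raise ValueError("every node needs at least one allowed interaction")
        matrix.append([m / total for m in masked])
    return matrix


def propagate(p0, matrix, k):
    """Return the absolute probabilities after k steps, i.e. p0 times matrix^k."""
    p = list(p0)
    n = len(p)
    for _ in range(k):
        p = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return p


# Hypothetical chain of three residues: 0-1 and 1-2 are neighbours, 0-2 are not.
alpha = [[1, 1, 0],
         [1, 1, 1],
         [0, 1, 1]]
weights = [[1.0] * 3 for _ in range(3)]  # placeholder interaction weights
matrix = stochastic_matrix(weights, alpha)
p2 = propagate([1.0, 0.0, 0.0], matrix, 2)
print(matrix[0][2], p2[2] > 0)  # 0.0 True
```

The direct term between residues 0 and 2 is zero, yet after two steps some probability has reached residue 2 through residue 1, which is the indirect, long-range effect the text attributes to the natural powers of the matrix.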
141. ...called Orbits or Regions (R). These regions are often defined in geometric terms and called core, inner, middle or surface region. In Fig. 3 we represented the orbits of the protein: c corresponds to core, i to inner, m to middle and s to surface orbits, respectively. The diameters of the orbits are 0 < orbit c < 25, 25 < orbit i < 50, 50 < orbit m < 75 and 76 < orbit s < 100, expressed in terms of percentage of the longest distance $r_{\max}$ with respect to the center of charge. Additionally, we take into consideration the total orbit (t), which contains all the amino acids in the protein (orbit diameter 0-100% of $r_{\max}$). Consequently, we can calculate different $\theta_k(R)$ for the amino acids contained in an orbit (c, i, m, s or t) and placed at a topological distance k within this orbit (k is the order named) [72-75]. In this work we calculated altogether 5 (types of region) × 6 (orders considered) = 30 $\theta_k(R)$ indices for each protein. In order to carry out the calculations referred to in equation (1) for any kind of entropy, and detailed in the previous equations for electrostatic entropy, the elements $p_{ij}$ of $^{1}\Pi$ and the absolute initial probabilities $p_0(j)$ were calculated as follows [67]:
aj EE lp aij Eij 7 dj 5 y 941 a E 1 y didm m 1 im im m 1
Fig. 3. Flowchart for all the steps given in the construction of the classifiers and server.
142. ...cos, physical interactions, metabolic pathways, pharmacological action, recurrence of the law or social behaviour [54]. For quantitative study, complex networks can be characterised numerically by parameters unique to the network, usually known as topological indices (TIs). The TIs of known networks, molecular or not, are used as inputs in statistical analysis to build QSAR/QSPR-type models. Several programs have been developed to calculate these parameters. Consequently, the following elements of complex network theory, used throughout the thesis, can be defined.
Network: an interconnected group or system of elements that shares information.
Graph: a symbolic representation of a network and its connectivity; it implies an abstraction of reality by which the network can be simplified as a set of nodes (vertices) connected by lines (edges) that represent relationships or common properties.
Topological indices: any invariant numerical parameter of a graph that characterises its topology, geometry or structure; they encode information about the functions of the real network.
The general scheme of work with QSAR techniques and complex network theory is presented in Figure 1: the molecules of proteins or drugs (real networks of amino acids and atoms) are transformed into
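To make the definitions above concrete, the following Python sketch stores a graph as an edge list and computes one classical topological index, the Randić connectivity index (the sum over edges of the inverse square root of the product of the end-node degrees). The function name and the two toy graphs are illustrative assumptions; this is not code from S2SNet or the other programs described in the thesis.

```python
from math import sqrt


def randic_index(edges):
    """Sum 1 / sqrt(deg(u) * deg(v)) over all edges (u, v) of an undirected graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(1.0 / sqrt(degree[u] * degree[v]) for u, v in edges)


# Two hypothetical graphs with four nodes each.
path = [("a", "b"), ("b", "c"), ("c", "d")]                 # linear chain
star = [("centre", "x"), ("centre", "y"), ("centre", "z")]  # star with three rays
print(round(randic_index(path), 4))  # 1.9142
print(round(randic_index(star), 4))  # 1.7321
```

Both graphs have the same numbers of nodes and edges, yet the index differs, which is the sense in which a TI encodes connectivity rather than size alone.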
143. ctive cases (84.5%). The overall predictability in the training series was 84.13% (371 of the 441 cases). The model was validated using the external prediction series, in which 197 of the 221 cases (89.4%) were correctly classified. The model equation is as follows:
Actv 3 5 z Ca 3 Cy 1 76 7 C 71 77 m Het 2 54 7 H Het 2 4 m Het Het 5 42 zz H Het 0 74 sat uns 0 49 Rc 0 718 p < 0.001 (4)
where λ is Wilks' statistic, Rc the canonical index and p the error level. In the equation, $\pi_k$ is the spectral moment of a given species after k steps. It was calculated for the total (T) of atoms in the molecule or for specific collections of atoms. The results of this QSAR model were used as inputs to build a network. The observed network has 1242 nodes (drugs and bacteria) and 772736 edges (drug-bacterium pairs with similar activity). The predicted network has 1031 nodes and 641377 edges. After an edge-by-edge comparison, the predicted network was shown to be significantly similar to the observed one, and both have a distribution closer to exponential than to normal.
1.2.3 Classification models for antiparasitic compounds
Prado-Prado et al. [76] proposed an mt-QSAR for more than 500 drugs analysed in the literature against different parasites. The data were
144. ...of which 324 are antioxidant proteins. With these data as input, the topological indices of the star graphs were calculated with the S2SNet tool. These indices are then used as input to several classification techniques. Among the techniques used, Random Forest showed the best performance, reaching 94% of total cases correctly classified. The proposed model reaches 81.8% of correctly classified cases for the group of antioxidant proteins, with a precision of 81.3%.
2.1.2.2 S2SNet program manual
Language of S2SNet
S2SNet (Sequence to Star Network) is a free application in the field of complex networks (applied mathematics), programmed in Python, using wxPython to create the graphical environment and the Graphviz executables to draw the graphs (http://www.graphviz.org). The help is provided as an HTML page. S2SNet runs on the Microsoft XP/Vista operating system. Notepad is used to edit the calculation files. Note: in both cases Graphviz must be installed beforehand in order to visualise the graphs.
S2SNet: application for complex network studies
✓ programming language: Python, wxPython, HTML
✓ operating system: Microsoft XP and Vista
✓ external applications: Graphviz executables (dot, circo
145. ...any sequence of letters into its corresponding U graph, connecting the nodes that belong to the same class (have the same letter).
✓ Calculate two families of TIs using the generated U graphs and show their values in a table.
✓ Plot and display the U graph of the selected sequence.
✓ Export the connectivity information of the U graphs to CT or NET files.
✓ Save the calculated TIs to TXT or CSV files.
How to use CULSPIN
CULSPIN is an interactive application created with Python/wxPython, in notebook format, with a main menu bar offering the following options.
File menu
Open file: lets the user browse for, select and open the file from which the input data are taken (letter sequences, numerical sequences or series, etc.). Once the data are loaded, the letter sequences are shown in a list.
Reload sequences: lets the user work again with the initially loaded sequences (original sequences). This option is only enabled if the spiral has not been built for all the original sequences. Once reloading finishes, all the original sequences are available in the list again.
Make a copy of: makes a copy, in a TXT file, of the original letter sequences or of the studied letter sequences, in the format in which they are shown in the list (name <space> sequence). This option is availab
146. when the distance is n.

✓ Balaban distance connectivity index (J):

$$J = \frac{edges}{edges - nodes + 2} \sum_{i<j} \frac{m_{ij}}{\sqrt{\sum_k d_{ik} \cdot \sum_k d_{kj}}} \qquad (18)$$

where nodes and edges are the numbers of nodes and edges in the star network.

✓ Kier-Hall connectivity indices:

$$^{0}X = \sum_i \frac{1}{\sqrt{deg_i}}, \qquad ^{2}X = \sum_{i<j<k} \frac{m_{ij}\, m_{jk}}{\sqrt{deg_i \cdot deg_j \cdot deg_k}} \qquad (19)$$

$$^{3}X = \sum_{i<j<k<m} \frac{m_{ij}\, m_{jk}\, m_{km}}{\sqrt{deg_i \cdot deg_j \cdot deg_k \cdot deg_m}}, \qquad ^{4}X = \sum_{i<j<k<m<o} \frac{m_{ij}\, m_{jk}\, m_{km}\, m_{mo}}{\sqrt{deg_i \cdot deg_j \cdot deg_k \cdot deg_m \cdot deg_o}} \qquad (20)$$

$$^{5}X = \sum_{i<j<k<m<o<q} \frac{m_{ij}\, m_{jk}\, m_{km}\, m_{mo}\, m_{oq}}{\sqrt{deg_i \cdot deg_j \cdot deg_k \cdot deg_m \cdot deg_o \cdot deg_q}} \qquad (21)$$

✓ Randic connectivity index:

$$^{1}X = \sum_{i<j} \frac{m_{ij}}{\sqrt{deg_i \cdot deg_j}} \qquad (22)$$

GENERAL REGISTRY OF INTELLECTUAL PROPERTY

In accordance with the Intellectual Property Law (Royal Legislative Decree 1/1996, of 12 April), the intellectual property rights are entered in this Registry in the manner set out below.

REGISTRY ENTRY NUMBER: 03/2008/1338
Title: S2SNet - Sequence to Star Network
Object of intellectual property: computer program
Class of work: computer program

FIRST REGISTRATION
Author(s) and original rights holder(s)
Surname and name: MUNTEANU, Cristian Robert. Nationality: ROM. D.N.I./N.I.F./Passport: X 4541639 J
Surname and name: GONZÁLEZ DÍAZ, Humberto. Nationality: CUB. D.N.I./N.I.F./Passport: X 6672910 N

Application data
Application no.: SC 309 08
Filing date
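To make the degree-based connectivity indices listed above concrete, here is a minimal sketch in Python that evaluates the zero-order Kier-Hall index and the Randic index from a 0/1 connectivity matrix. The function names and the toy star graph are illustrative assumptions; this is not the S2SNet source code.

```python
import math


def degrees(adj):
    """Node degrees from a symmetric 0/1 connectivity matrix m_ij."""
    return [sum(row) for row in adj]


def kier_hall_zero(adj):
    """Zero-order index: sum over nodes of 1/sqrt(deg_i)."""
    return sum(1.0 / math.sqrt(d) for d in degrees(adj) if d > 0)


def randic_index(adj):
    """First-order (Randic) index: sum over edges i<j of m_ij/sqrt(deg_i*deg_j)."""
    deg = degrees(adj)
    n = len(adj)
    return sum(
        adj[i][j] / math.sqrt(deg[i] * deg[j])
        for i in range(n)
        for j in range(i + 1, n)
        if adj[i][j]
    )


# Toy example (assumption): a star with centre 0 and three leaves.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(kier_hall_zero(star), randic_index(star))
```

For this toy star every edge joins a degree-3 node to a degree-1 node, so the Randic index is three times 1/sqrt(3). The higher-order Kier-Hall indices follow the same pattern over longer paths.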
147. culated [symbol lost in extraction](R) and $\theta_k(R)$ values only for E and HINT potentials. We have omitted the vdW term for a simple reason: the HINT potential includes a vdW component. The values have been used here as inputs to construct the QSAR model. The detailed explanation has been published before. Below we give the formulas for $\xi_k(R)$, $\theta_k(R)$ and $\pi_k(R)$, together with some general explanations:

$$\xi_k(R) = \sum_{j \in R} p_k(j) \cdot \xi(j) \qquad (1)$$

$$\theta_k(R) = -\sum_{j \in R} p_k(j) \cdot \log p_k(j) \qquad (2)$$

$$\pi_k(R) = \sum_{i = j \in R} {}^{k}p_{ij} \qquad (3)$$

It is remarkable that the spectral moments depend on the probability $^{k}p_{ij}$ with which the effect of the interaction f propagates from the i-th amino acid to other neighbouring j-th amino acids and returns to the i-th amino acid after k steps. On the other hand, both the average electrostatic potential and the entropy measures depend on the absolute probabilities $p_k(j)$ with which the j-th amino acid has an interaction of type f with the rest of the amino acids. In any case, both probabilities refer to a first (k = 1) direct interaction of type f between amino acids placed at a distance equal to k times the cut-off distance ($r_{ij} = k \cdot r_{cut-off}$). The method uses a Markov Chain Model (MCM) to calculate these probabilities, which also depend on the 3D interactions between all pairs of amino acids placed at a distance $r_{ij}$ in $\mathbb{R}^3$ in the protein structure. However, for the sake of simplicity, a truncation or cut-off function $\alpha_{ij}$ is applied in such a way that a short-term interaction takes place in a first approxim
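The description of the spectral moments (the probability of leaving amino acid i and returning to it after k steps, summed over the amino acids of a region) can be sketched as the partial trace of the k-th power of a stochastic matrix. The sketch below assumes the Markov matrix is already available as a list of rows; how MARCH-INSIDE fills that matrix is not given in this excerpt, and the function names are hypothetical.

```python
def mat_mul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [
        [sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
        for i in range(n)
    ]


def spectral_moments(pi, k_max, region=None):
    """Return [pi_0, ..., pi_k_max]: diagonal sums of pi**k over `region`.

    `region` is a list of amino acid indices; None means the whole protein.
    """
    n = len(pi)
    idx = list(range(n)) if region is None else list(region)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    moments = []
    for _ in range(k_max + 1):
        moments.append(sum(power[i][i] for i in idx))
        power = mat_mul(power, pi)
    return moments


# Toy two-residue chain (assumption): each residue always passes the
# interaction to the other one, so it can only return after an even k.
toy = [[0.0, 1.0], [1.0, 0.0]]
print(spectral_moments(toy, 3))
```

Restricting `region` to a subset of indices gives a local moment for that region, which is how region-specific values such as core or outer indices could be obtained under these assumptions.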
148. cyclopedia of Complexity and Systems Science; Meyers, R. A., Ed.; Springer-Verlag: Berlin, 2009; pp 2159-96.
Ivanciuc, O. Drug Design with Artificial Neural Networks. In Encyclopedia of Complexity and Systems Science; Meyers, R. A., Ed.; Springer-Verlag: Berlin, 2009; pp 2139-59.
Ivanciuc, O. Drug Design with Artificial Intelligence Methods. In Encyclopedia of Complexity and Systems Science; Meyers, R. A., Ed.; Springer-Verlag: Berlin, 2009; pp 2113-39.
Cai, Y. D.; Chou, K. C. Using functional domain composition to predict enzyme family classes. J. Proteome Res. 2005, 4 (1), 109-11.
Cai, Y. D.; Chou, K. C. Predicting enzyme subclass by functional domain composition and pseudo amino acid composition. J. Proteome Res. 2005, 4 (3), 967-71.
Chou, K. C.; Shen, H. B. Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. J. Proteome Res. 2007, 6, 1728-1734.
Rabow, A. A.; Scheraga, H. A. Lattice neural network minimization. Application of neural network optimization for locating the global-minimum conformations of proteins. J. Mol. Biol. 1993, 232 (4), 1157-68.
Hill, T.; Lewicki, P. STATISTICS Methods and Applications. A Comprehensive Reference for Science, Industry and Data Mining; StatSoft: Tulsa, 2006; Vol. 1, p 813.
Fernandez, M.; Caballero, J.; Tundidor-Camba, A. Linear and nonlinear QSAR study of N-hydroxy-2-[(phenylsulfonyl)amino]acetamide derivativ
149. d information about the PDB ID, the values of the electrostatic potential indices, the corresponding observed classification and the predicted classification for each protein is given in the ESI. To avoid homology bias and remove redundant sequences from the benchmark dataset, a cutoff threshold of 25% was recommended, so as to exclude from the benchmark datasets those proteins that have 25% or more sequence identity with any other, as done in ref. 94 and 106-108. However, in this study we have not used such a stringent criterion because the currently available data do not allow us to do so. Otherwise, the number of proteins in some subsets would be too low to have statistical significance.

Results and discussion

Alignment-free LDA model for LIBPs

Multiple experimental approaches have shown that individual LIBPs possess both unique and overlapping functions, some of which are based on specific elements in the protein structure. Although FA binding affinities for all LIBPs tend to correlate directly with FA hydrophobicity, structure-function studies indicate that subtle three-dimensional (3D) changes that occur upon ligand binding may promote specific protein-protein or protein-membrane interactions that ultimately determine the function of each LIBP. The conformational changes are focused on the LIBP helical/portal domain, a region that was identified by in vitro studies to be vital for the FA transport properties
150. d takes into consideration specific 3D structural features and is not simply a burden, M-biased predictor. The value S(LBP) = 93.87 was the highest value predicted by LIBP-Pred for a protein with unknown function selected out of the 2693 mentioned before. This value corresponds to chain A of the protein with PDB ID 2RJB. The protein, deposited in PDB with unknown function, is expressed by Shigella flexneri, a bacterium that causes severe dysentery in human beings. This result is very interesting because of the importance of the lipid (i.e., phosphoinositide) metabolic pathway in the regulation of cellular processes implicated in survival, motility and trafficking, which is often subverted by bacterial pathogens. In fact, S. flexneri infection has been demonstrated recently to generate the lipid PI5P to alter endocytosis and prevent termination of EGFR signaling. This property is used by S. flexneri to favour survival of host cells in the infection process. In this sense, if it is finally confirmed as a LIBP, the present results may point to chain A of 2RJB as a possible target for anti-bacterial drugs effective against this human pathogen.

Mol. BioSyst., 2012, 8, 851-862. Published on 10 January 2012 on http://pubs.rsc.org, doi:10.1039/C2MB05432A.

Mining of parasite proteins in PDB with LIBP-Pred

LIBPs, including FABPs, are being studied as important actors in hos
151. data associated with this article can be found in the online version at doi:10.1016/j.polymer.2009.11.029.

References
[1] Verra F, Mangano VD, Modiano D. Parasite Immunol 2009;31(5):234-53.
[2] Mueller I, Galinski MR, Baird JK, Carlton JM, Kochar DK, Alonso PL, et al. Lancet Infect Dis 2009;9(9):555-66.
[3] Bonilla JA, Bonilla TD, Yowell CA, Fujioka H, Dame JB. Mol Microbiol 2007;65(1):64-75.
[4] Turschner S, Efferth T. Mini Rev Med Chem 2009;9(2):206-214.
[5] Sanchez CP, Rotmann A, Stein WD, Lanzer M. Mol Microbiol 2008;70(4):786-98.
[6] Sanchez CP, Rohrbach P, McLean JE, Fidock DA, Stein WD, Lanzer M. Mol Microbiol 2007;64(2):407-20.
[7] Nunes MC, Goldring JP, Doerig C, Scherf A. Mol Microbiol 2007;63(2):391-403.
[8] Siden-Kiamos I, Ecker A, Nyback S, Louis C, Sinden RE, Billker O. Mol Microbiol 2006;60(6):1355-63.
[9] Sam-Yellowe TY. Exp Parasitol 1993;77(2):179-94.
[10] Volpato JP, Pelletier JN. Drug Resist Updat 2009;12(1-2):28-41.
[11] Carucci DJ, Yates 3rd JR, Florens L. Int J Parasitol 2002;32(13):1539-42.
[12] Coppel RL, Black CG. Int J Parasitol 2005;35(5):465-79.
[13] Bender A, van Dooren GG, Ralph SA, McFadden GI, Schneider G. Mol Biochem Parasitol 2003;132(2):59-66.
[14] Carlton JM, Muller R, Yowell CA, Fluegge MR, Sturrock KA, Pritt JR, et al. Mol Biochem Parasitol 2001;118(2):201-10.
[15] Coppel RL. Mol Biochem Parasitol 2001;118(2):139-45.
[16] Cui L, Fan Q, Hu Y, Karamycheva SA, Quackenbush J, Khuntirat B, et al. Mol Bioc
152. db.org
o File with the simple results of the protein calculation
o Local folder with the PDB files of the input proteins (if a PDB file does not exist, it is downloaded automatically from the Web)

Parameters for the network of amino acid alpha carbons
o The Cutoff, Roff and Ron parameters, which define the conditions under which two amino acid alpha carbon atoms are considered connected (definition of the complex network for each protein)
o Limits for the protein orbitals in % (core, inner, middle, outer)
o By Chain attribute: if enabled, the calculation processes every chain of each protein; if disabled, the calculation treats the protein as a whole

Averaged indices (Averaged Indices: PROT_ClassAvgs.txt)
o Using the PDB header fields: head, expression_system, expression_system_taxid, name, chain, organism_scientific, molecule, expression_system_vector_type, ec, organism_common, expression_system_plasmid, engineered, expression_system_strain, cell_line, cellular_location, gen, organism_taxid
o Using the classes from the input file (Input classes)

Output with the full header (Full header information output: PROT_FullHeaderRes.txt)
o Retrieve the complete PDB header information and append it to the simple result, one column per header field

Protein pairs (Protein PAIRS)
o Using the similarity of the chain
153. of the PDB headers for each protein next to the indices
o create protein-protein interactions using the chains of the same protein, and generate negative pairs at random
o create protein-protein interactions using the protein classes from the input, and generate negative pairs at random
o calculate mixed indices for protein pairs
✓ for drugs
o calculate average indices for each drug class type in the input
o create drug-drug pairs using the biological activities from the input
o calculate mixed indices for drug pairs
✓ for proteins and drugs
o create protein-drug pairs using the interactions between them, and generate negative pairs at random
o calculate mixed protein-drug indices (averages per protein orbital and per index k for the drugs)
✓ the protein-protein, drug-drug and protein-drug pairs form complex interaction networks that are very useful for discovering new drugs and their corresponding molecular targets

2.1.2 S2SNet: topological indices of the star graph

2.1.2.1 Publications with S2SNet

2.1.2.1.1 Random Forest classification based on the star graph topological indices of antioxidant proteins (Random Forest Classification based on Star Graph Topological Indices for An
154. dicted the protein function based on $\theta_k(R)$ values for different types of interactions or molecular fields. The main types of molecular fields used are the following: electrostatic, vdW and HINT entropies. In this paper we calculated $\theta_k(R)$ values only for electrostatic entropies. These values have been used herein to calculate PPI invariants and then as inputs to generate the QSAR model (see the description of PPI invariants above). The detailed explanation of the calculation of $\theta_k(R)$ values has been published before. Below we give the formula for the $\theta_k(R)$ values and some general explanations [41, 67, 70]:

$$\theta_k(R) = -\sum_{j=1}^{n} {}^{k}p_j(R) \cdot \log {}^{k}p_j(R) \qquad (4)$$

It is remarkable that the average entropy measures depend on the absolute probabilities $^{k}p_j(R)$ with which the j-th amino acid has an electrostatic interaction with the rest of the amino acids that lie within the same protein region (R). These probabilities refer to amino acids placed at a distance equal to k times the cut-off distance ($r_{ij} = k \cdot r_{cut-off}$). The method uses a Markov Chain Model (MCM) to calculate these probabilities, which also depend on the 3D interactions between all pairs of amino acids placed at a distance $r_{ij}$ in $\mathbb{R}^3$ in the protein structure. However, for the sake of simplicity, a truncation or cut-off function $\alpha_{ij}$ is applied in such a way that a short-term interaction takes place, in a first approximation, only between neighboring aa ($\alpha_{ij} = 1$ if $r_{ij} < r_{cut-off}$). Otherwise the
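A minimal sketch of the cut-off and entropy steps described above, under simplifying assumptions that are not taken from the source: every neighbour inside the cut-off gets the same weight (the published method weights interactions with electrostatic terms that this excerpt does not give), and the initial absolute probabilities are uniform. All function names are hypothetical.

```python
import math


def markov_matrix(dist, r_cut):
    """Row-stochastic matrix from a distance matrix and a cut-off.

    alpha_ij = 1 if r_ij < r_cut, else 0 (equal weights are an assumption).
    A residue is always within the cut-off of itself, so no row is empty.
    """
    rows = []
    for row in dist:
        alpha = [1.0 if r < r_cut else 0.0 for r in row]
        total = sum(alpha)
        rows.append([a / total for a in alpha])
    return rows


def absolute_probabilities(pi, k):
    """Propagate a uniform start vector k steps through the chain."""
    n = len(pi)
    p = [1.0 / n] * n
    for _ in range(k):
        p = [sum(p[i] * pi[i][j] for i in range(n)) for j in range(n)]
    return p


def entropy(p, region=None):
    """Shannon entropy -sum p_j log p_j over the residues in `region`."""
    idx = range(len(p)) if region is None else region
    return -sum(p[j] * math.log(p[j]) for j in idx if p[j] > 0)


# Three residues on a line, 4 distance units apart, with a cut-off of 5.
dist = [[0, 4, 8], [4, 0, 4], [8, 4, 0]]
pi = markov_matrix(dist, r_cut=5)
thetas = [entropy(absolute_probabilities(pi, k)) for k in range(4)]
print(thetas)
```

Passing a list of residue indices as `region` restricts the sum to one protein region, which mirrors the role of R in the formula.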
155. dictor and is made freely available on the Web server called LIBP-Pred (http://bio-aims.udc.es/LIBPpred.php) (Figure 34). Users can automatically retrieve protein structures from the PDB Web site or upload their own protein structural models from their computer through the LOMETS server. The possibility of carrying out a predictive study of approximately 2000 proteins with unknown function has been demonstrated. Interesting results were obtained regarding the discovery of new cancer biomarkers in humans and of targets for antiparasitic drugs. An example of the results for the proteins / protein chains 1QGHK, 114M, 2QZTB and 1B0U is shown in Figure 35.

[Screenshot of the LIBP-Pred (Bio-AIMS) results page. Process ID: 17562517431e642d57. Model 1 (standard PDB input): Lipid Binding Proteins prediction using MARCH-INSIDE and LDA based on electrostatic spectral moments; accuracy of 89.11%. Result file: Results 17562517431e642d57 LIBPpred Model1.txt. Results dated 2013-04-21 20:37:27.]

PDBChain   LIBP Prediction
1QGHK      0.00
114M       29.66
2QZTB      100.00
1BOU       45.34

the input contains n
156. dies. The $\theta_k(R)$ parameters used represent the average electrostatic entropy ($\theta$) due to the interactions between all pairs of amino acids located inside a specific protein region (R) and placed at a distance k from each other. In this work we want to use the $\theta_k(R)$ values of two proteins, $^{1}\theta_{k_1}(R_1)$ for protein 1 and $^{2}\theta_{k_2}(R_2)$ for protein 2, in order to generate structural parameters describing the PPI between these proteins. To this end, we introduce herein for the first time a new type of PPI invariants, in the sense that they do not depend on the interchange of the proteins, so that we do not need to label and distinguish them for calculation. With this aim we introduce three types of invariants, ti(R): the PPI Average Entropy Invariant (ti-a), the PPI Entropy Difference Invariant (ti-d) and the PPI Entropy Product Invariant (ti-p):

$$^{a}\theta_k(R) = \frac{1}{2}\left[{}^{1}\theta_{k_1}(R_1) + {}^{2}\theta_{k_2}(R_2)\right] \qquad (1)$$

$$^{d}\theta_k(R) = \left|{}^{1}\theta_{k_1}(R_1) - {}^{2}\theta_{k_2}(R_2)\right| \qquad (2)$$

$$^{p}\theta_k(R) = {}^{1}\theta_{k_1}(R_1) \cdot {}^{2}\theta_{k_2}(R_2) \qquad (3)$$

Notably, in order to guarantee that these parameters are invariant to labeling the proteins as 1 or 2, we always have to use the same $R_1 = R_2 = R$ and $k_1 = k_2 = k$ values. In order to calculate the $\theta_k(R)$ values for each protein, the method uses as a source of protein macromolecular descriptors the stochastic matrices $^{1}\Pi_e$, built up as square matrices (n x n), where n is the number of amino acids (aa) in the protein. The subscript e points to the electrostatic type of molecular force field. In previous works we have pre
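The point of the three invariants is that swapping protein 1 and protein 2 leaves the value unchanged. A short sketch, assuming the average, absolute difference and product forms named in the text and using a hypothetical function name, makes that symmetry easy to check:

```python
def ppi_invariants(theta_1, theta_2):
    """Label-independent descriptors of a protein pair.

    theta_1 and theta_2 are the entropy values of the two proteins,
    computed for the same region R and the same step k.
    """
    return {
        "average": 0.5 * (theta_1 + theta_2),
        "difference": abs(theta_1 - theta_2),
        "product": theta_1 * theta_2,
    }


# Swapping the two proteins gives exactly the same descriptors.
forward = ppi_invariants(1.2, 0.8)
backward = ppi_invariants(0.8, 1.2)
print(forward == backward)
```

The absolute value in the difference term is what keeps it symmetric; a signed difference would change sign when the labels are swapped.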
157. dium sp. parasites (pPPIs), and not present in humans or other hosts, may be promising targets for the development of safe drugs with low toxicity. On the contrary, the prediction of non-pPPCs (not unique to Plasmodium sp. parasites but also present in humans) may become a source for the discovery of targets related to drug resistance, not only for the treatment of malaria but also of human cancer. For instance, human Dihydrofolate Reductase (DHFR) constitutes a primary target for antifolate drugs in cancer treatment, whereas the DHFRs from P. falciparum and P. vivax are primary targets in the treatment of malaria. A recent review [10] has discussed the structural and functional impact of active-site mutations with respect to enzyme activity and antifolate resistance of DHFRs from mammals, protozoa and bacteria. DHFR is a monomeric protein with only one chain in the structures deposited in PDB. However, DHFR synthase is a non-pPPC polymeric protein which is directly involved in DHFR synthesis and, consequently, in drug resistance. For instance, the structure of DHFR synthase reported in the file with PDB ID 3HBB is a PPC with four different protein chains. In this regard, a computational model able to predict non-pPPCs such as DHFRs may be interesting for the prediction of protein targets involved in drug resistance in both parasites and mammals, which may be useful in the design of chemo-protective agents. In any case, the high number of possible genes/proteins d
158. drogenases function, targeted cleavage of HIV Rev response element RNA, and calmodulin-peptide complexes. In addition, these motifs are important for new chemical nuclease design in biotechnology and also as therapeutic agents. The N-terminus region of ATCUN-containing proteins is highly disordered and the geometrical features cannot be easily extracted from the protein structures. The motif participates in the metal interaction with the free N-terminal NH2 group from residue aa1, the next two peptide nitrogen atoms from residues aa2 and His3, and a nitrogen from the imidazole group of His3. In the case of the simulated copper-binding peptide Gly-Gly-His-N-methyl amide, the four nitrogen atoms form a distorted square planar arrangement. Sankararamakrishnan, Verma and Kumar reported a list of ATCUN-like motifs from 1949 polypeptide chains and found that only 1.9% and 0.3% of histidines are associated with partial and full ATCUN-like geometric features, respectively. They observed that the ATCUN-like motifs are not present in the middle of an α-helix or β-strand.

Journal of Proteome Research 2009, 8, 5219-5228. Published on Web 09/18/2009.

The present work uses the protein Quantitative Structure-Activity Relationship (QSAR) method for predicting the antitumor activity of ATCUN proteins. We can use many physicochemical parameters, such as charges or hydrophilicity parameters [9], to characterize pro
159. ds such as protein modeling and molecular dynamics may predict the protein geometry and, therefore, some geometry criteria may predict the ATCUN activity of the proteins. These methods have two limitations for the present problem: they are time-consuming and incomplete. This work therefore presents a better alternative: a general, fast and accurate method for evaluating the ATCUN activity of new proteins using only the PDB geometry.

Databases

We used a total of 415 proteins to develop the model. The nonactive proteins were randomly selected from the PDB server, and the list of the potential ATCUN-feature antitumor proteins was obtained from the literature. The PDB database was also used to select 721 proteins from 47 parasite species (only predicted, not used to train or validate the model). The corresponding 1751 protein chains were tested for the DNA cleavage anticancer property using the best QSAR model obtained.

Results and Discussion

Model for ATCUN Activity

The biological activity of proteins in organic and inorganic biochemistry can be predicted by using protein QSAR models combined with simplified, truncated electrostatics. The present work is based on the electrostatic spectral moments ($\pi_k$) for a protein QSAR study of interest in bioinorganic chemistry. LDA was used to find the best QSAR model that can classify new proteins into two groups in the absence of prior information: nonactive or po
160. • QSAR • electrostatic potential • Plasmodium • Fasciola • Leishmania

To whom correspondence should be addressed. Phone: +34 981 167 000, ext. 1302. Fax: +34 981 167 160. E-mail: muntisa@gmail.com; (H.G.-D.) humberto.gonzalez@usc.es. Affiliations: University of A Coruña; Department of Inorganic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela; Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela. 10.1021/pr900556g. © 2009 American Chemical Society.

Introduction

An important goal in bioinorganic chemistry is to find the function of a protein from its experimentally determined structure at minimum cost. The chemical databases contain numerous 3D metal-binding protein structures without any information about their biological function, which depends on the metal ion type. Inside these proteins, specific amino acid sequences with high affinity for different metals can be found. The amino terminal Cu(II)- and Ni(II)-binding (ATCUN) motif is a small metal-binding site that was discovered for the first time in serum albumin. It was proven to have antitumor activity by participating in DNA cleavage with the NH2-aa1-aa2-His3 sequence, and to be involved in central nervous system function and cancer growth, Alzheimer's disease, cation-π electron interactions in proteins (e.g., Cu against the tryptophan indole ring), E. coli hy
161. e. Since fatty acids are essential components of all bio-membranes, molecular and functional studies on LIBPs point to new directions for drug target discovery, vaccine design or biomarker prediction for many human metabolic and other diseases, as well as against parasitic diseases. In any case, the number of proteins of different organisms to be experimentally assayed is so vast that computational techniques may help to speed up the process. For instance, very recently Kuang, Colgrave et al. revealed the complexity of the secreted NPA and FAR FABP families of Haemonchus contortus by an iterative proteomics-bioinformatics approach. The parasite H. contortus, also known as red stomach worm, wire worm or Barber's pole worm, is a very common parasite and one of the most pathogenic nematodes of ruminants. Using the human genome database, the recently developed G protein-coupled receptor (GPCR) deorphanization strategy has successfully identified multiple LIBP receptors for fatty acids. On the other hand, we can use, in principle, structure-dependent physicochemical parameters such as charges or hydrophilicity parameters to characterize proteins in quantitative structure-function relationship studies, also known as Quantitative Structure-Activity Relationships (QSAR). However, many of these QSAR models are based on simpler numerical parameters called Topological Indices (TIs), derived from a graph or networ
162. eins. The best quantitative structure-disease relationship model is based on eleven Shannon entropy indices. It is obtained with the Naive Bayes classifier and shows excellent predictive ability (90.92%) for new proteins linked to this type of cancer. The statistical analysis confirms that this model allows the diagnosis of human colon cancer with an AUROC of 0.91. The methodology presented can be used for any kind of sequential information, such as any protein or nucleic acid sequence.

2.1.3.2 CULSPIN program manual

What is CULSPIN?

CULSPIN (Compute ULam SPiral INdices) transforms any sequence of letters into a graphical representation that uses the Ulam spiral as a template (an arrangement of the natural numbers in the form of a spiral), in which the nodes that belong to the same class (those with the same letter) are connected. The interface is shown in Figure 18.

Figure 18. Interface of the CULSPIN program (welcome screen of CULSPIN v1.0).

An example is the spiral graph in Figure 19 for the following sequence, Cha 01:
GDDGGDGGGGGGGGDGGGDGDDGGGDGGGDGDGGDGDDDDGGGGGDGGDDGGGGGG GGGGGGGGGGKKKKKAAAKKAKKKKKKAAA KKKKAKKKKKAAKKKKKKKKKAAKKAAAAAK

Figure 19. Spiral graph for the sequence Cha 01.

In addition, based on this graph, CULSPIN calculates two families of Topological Indices (TIs). These indices p
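The Ulam spiral template can be illustrated with a short sketch that places the letters of a sequence on successive spiral positions and groups the positions by letter class. The spiral orientation (starting at the origin, first step to the right, turning counter-clockwise) and the function names are assumptions made for illustration. How CULSPIN draws the edges between same-class nodes is not specified in this excerpt, so the sketch stops at the grouping step.

```python
from collections import defaultdict


def ulam_coordinates(n):
    """Grid coordinates of the first n positions of a square spiral."""
    coords = []
    x = y = 0
    dx, dy = 1, 0        # first move is to the right
    arm_length = 1       # length of the current straight arm
    steps_on_arm = 0
    arms_done = 0
    for _ in range(n):
        coords.append((x, y))
        x, y = x + dx, y + dy
        steps_on_arm += 1
        if steps_on_arm == arm_length:
            steps_on_arm = 0
            dx, dy = -dy, dx          # turn 90 degrees counter-clockwise
            arms_done += 1
            if arms_done % 2 == 0:    # the arm grows after every second turn
                arm_length += 1
    return coords


def letter_classes(sequence):
    """Map each letter to the spiral coordinates of its occurrences."""
    classes = defaultdict(list)
    for letter, xy in zip(sequence, ulam_coordinates(len(sequence))):
        classes[letter].append(xy)
    return dict(classes)


print(letter_classes("GDDGGDGGG"))
```

Each list returned by `letter_classes` holds the nodes of one class, which is the information needed to connect nodes that carry the same letter.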
163. e correlated to the biological property [104]. The automatic selection of variables (features) was activated for all models. In particular, the Linear Neural Network (LNN) algorithm and other types of Artificial Neural Network (ANN) were used herein to train different linear and non-linear models based on different combinations of parameters. Table 1 also depicts the results for the best models found. The profile of each ANN model is specified with a simple notation: ANN type N_iv:N_in-N_h1-N_h2-N_on:N_ov. The ANN types presented besides LNN are Multi-Layer Perceptron (MLP), Probabilistic Neural Network (PNN) and Radial Basis Function (RBF) [105]. N_iv is the number of input variables, N_in is the number of input neurons (one per input variable), N_h1 is the number of neurons in the first hidden layer (H1), N_h2 is the number of neurons in the second hidden layer (H2), N_on is the number of output neurons and N_ov is the number of output variables.

Fig. 4. Illustrative example of the topology used for different ANNs trained in this work (panels: LNN 4:4-1:1 and MLP 4:4-6-6-1:1).

Y. Rodriguez-Soca et al. / Polymer 51 (2010) 264-273

[Figure: ROC curves plotted against 1 - Specificity. Training set, ROC area = 0.96; validation set, ROC area = 0.94; random classifier, ROC area = 0.50.]
164. in South America through Chagas disease, which can cause severe illness and death, especially in young children. In this context, the discovery of new therapeutic targets in the Trypanosoma proteome is very important for the scientific community. Recently, many researchers have devoted considerable effort to the study of protein-protein interactions (PPIs) in pathogenic Trypanosoma species and concluded that the low identity between some parasite proteins and their human host makes these PPIs very promising drug targets. There are no known general models for predicting PPIs unique to Trypanosoma (TPPIs). On the other hand, the 3D structures of a growing number of Trypanosoma proteins are available in the databases. In this sense, it is very important to introduce a new model to predict TPPIs from the 3D structure of the proteins involved in the PPIs. For this reason, we have introduced new invariants of protein-protein complexes based on the Markov average electrostatic potential $\xi_k(R_i)$ for the amino acids located in different regions (R_i) of the i-th protein and placed at a distance k from each other. More than 30 different types of parameters were calculated for 7866 protein pairs (1023 TPPIs and 6823 non-TPPIs) from more than 20 organisms, including parasites and human or bovine hosts. We have
165. e parasite. This pair of protein chains is evaluated as making up a complex unique to Plasmodium, which can be a target for new anti-parasite drugs. The second example pair is formed by chain E of the 2F6I hydrolase [114], an ATP-dependent CLP protease (serine-type endopeptidase) from Plasmodium falciparum expressed in E. coli, and chain A of the 2GHU hydrolase, Falcipain-2 (FP-2) of P. falciparum [115]. FP-2 is a papain-family (C1A) cysteine protease that plays an important role in the parasite life cycle by degrading erythrocyte proteins, most notably hemoglobin. Inhibition of FP-2 and its paralogues prevents parasite maturation. These two hydrolase chains are not evaluated by our tool as forming a unique complex. This can be explained by the different targets of these hydrolases and their different cellular localizations: 2F6I in the cytoplasm, and 2GHU in the food vacuole for hemoglobin degradation and cleavage of cytoskeletal elements. The last example is formed by chains C and F of the 1SYR protein, a Plasmodium falciparum thioredoxin in the genetic structure, with an unknown function [116]. These chains are evaluated as forming a unique complex, in agreement with the localization of both chains in the same protein. The PlasmodPPI tool can become important for the discovery of new anti-plasmodium drug targets and can serve as a template for building similar models for other types of parasites or other organisms.

4. Conclusions

The overall findings suggest th
166. e reduced due to the low number of samples used for training. The simplest method to determine the DA of our QSAR model is the visual inspection of the leverage plot (residuals vs leverages of the training instances) [9997]. The leverage (h) of a sample in the original variable space measures its influence on the model, and it is defined as follows:

$$h_i = x_i^{T} (X^{T} X)^{-1} x_i \qquad (i = 1, \ldots, n) \qquad (7)$$

where $x_i$ are the indices or descriptor vectors of the considered instance ($\pi_k$ in this work) and X is the model matrix derived

Table 2. DNA Cleavage Evaluation for Parasite Protein Chains

PDB   chain  function           prob (%)   | PDB   chain  function       prob (%)
Ascaris suum                               | Plasmodium falciparum
1008  A      Oxidoreductase     100.000    | 3EBG  A      Hydrolase      100.000
1LLQ  A      Oxidoreductase     100.000    | 3EBH  A      Hydrolase      100.000
1LLQ  B      Oxidoreductase     100.000    | 1ZRO  A      Cell invasion  100.000
1008  B      Oxidoreductase     100.000    | 1ZRO  B      Cell invasion  100.000
1F34  A      Hydrolase          99.172     | 3EBI  A      Hydrolase      100.000
2BJR  A      Motility           98.982     | 1ZRL  A      Cell invasion  100.000
2BJR  B      Motility           98.931     | 2EPH  G      Lyase          100.000
2BJQ  A      Motility           98.640     | 2PC4  C      Lyase          100.000
1EAI  A      Serine proteinase  95.212     | 2PC4  A      Lyase          100.000
1EAI  B      Serine proteinase  95.083     | 2W40  A      Transferase    100.000
Entamoeba histolytica                      | Toxoplasma gondii
20UI  B      Oxidoreductase     99.9998    | 3GG8  C      Transferase    100.0000
20UI  C      Oxidoreductase     99.9997    | 3GG8  A      Transferase    100.0000
20UI  D      Oxidoreductase     99.9990    | 2ABS  A      Signaling      100.0
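The leverage of each training instance is the corresponding diagonal element of the hat matrix. The sketch below computes it in plain Python by solving a small linear system instead of inverting the matrix explicitly. The helper names, the intercept column in the toy model matrix and the toy descriptor values are assumptions made for illustration, not data from the study.

```python
def solve(a, b):
    """Solve a*z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(X):
    """h_i = x_i^T (X^T X)^-1 x_i for every row x_i of the model matrix X."""
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    result = []
    for x in X:
        z = solve(xtx, x)  # z = (X^T X)^-1 x_i
        result.append(sum(xi * zi for xi, zi in zip(x, z)))
    return result


# Toy model matrix (assumption): an intercept column plus one descriptor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 10.0]]
h = leverages(X)
print(h)
```

In this toy example the last instance lies far from the others in descriptor space and therefore gets the largest leverage. The leverages always add up to the number of columns of X, which is a convenient sanity check.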
167. eat amount of variables is available. Among the objectives of FS, some of the most important are: to avoid overfitting and improve model performance, to provide faster and more cost-effective models, and to gain a deeper insight into the underlying processes that generated the data. In the context of classification, feature selection techniques can be organized into three categories, depending on how they combine the feature selection search with the construction of the classification model: filter methods, wrapper methods and embedded methods. In this paper several FS techniques were applied, but the best results were obtained either by combining Correlation-based Feature Subset Selection (CfsSubsetEval), which is correlation-based and thus a filter method, with Best First, which uses hill climbing augmented with a backtracking facility, or by combining Consistency-based Feature Subset Selection (ConsistencySubsetEval), which is also a filter method, with Linear Forward Selection (LinearForwardSelection), which is an extension of Best First. Filter methods assess the relevance of features by looking only at the intrinsic properties of the data. Feature selection has been widely used in bioinformatics.

Artificial Neural Networks (ANNs) have been extensively used for classification problems. In this paper the Multilayer Perceptron (MLP) has been utilized. An MLP is a feedforward artificial neural network model th
168. ectral moments were selected for this work in view of the high efficiency they have shown for protein QSAR models in biochemistry [9]. We propose the simplest QSAR equation reported to date for the ATCUN antitumor proteins. The average 3D electrostatic spectral moments ($\pi_k$) were calculated for 415 proteins, including 133 potential ATCUN antitumor proteins. The Linear Discriminant Analysis model used these TIs to assign proteins to two groups: the ATCUN DNA-cleavage proteins (metal-bound, active proteins) and the nonactive proteins (metal-nonbound, inactive proteins). Desirability analysis was used to predict the combined values of the electrostatic spectral moments in the inner region, with respect to the total structure, that ensure ATCUN-mediated anticancer action. In addition, we carried out a Receiver Operating Characteristic (ROC) curve analysis to demonstrate that the present model differs significantly from a random classifier. We demonstrated the robustness of the model by plotting the residuals, and explained the domain of applicability of the model by using the model leverage. The results were compared with a similar QSAR model based on the average 3D electrostatic potentials ($\xi_k$) for ATCUN proteins. The ATCUN motifs have been reported to be important for humans [995], fish [8] or viruses, but there is no link to parasites. Thus, the Protein Data Bank (PDB) proteins from different parasites were predicted for the DNA cleavage anticance
169. ed QSAR models for the prediction of enzyme function. Two additional servers based on MARCH-INSIDE are Trypano-PPI and Plasmod-PPI (ref. 56). These are the first servers that predict self protein-protein complexes in Trypanosome sp. or Plasmodium sp. proteomes, opening new opportunities for anti-trypanosome or anti-malarial drug target discovery. For all these reasons, we use the MARCH-INSIDE approach in this work to solve the problem of predicting LIBPs from the 3D structure of proteins. In the present work, we have developed the first 3D-QSAR method useful to discriminate between LIBPs and non-LIBPs (nLIBPs). Using MARCH-INSIDE 2.0, we have calculated different local and global parameters for a large series of LIBPs and nLIBPs (see Fig. 1).
This journal is The Royal Society of Chemistry 2012
Fig. 1 Flowchart for all the steps necessary to construct/use the classifiers and server.
The parameters calculated are of three different classes: average electrostatic potentials, spectral moments, and entropy measures of the electrostatic field of amino acids placed at distance k from each other within different regions (R) of the protein 3D structure. Next, we have carried out a statistical analysis in order to seek a linear equation (3D-QSAR model) that links t
170. star (menu Calculations > Sequence to Star Network, or the S2SNet button in the main panel of the program): the Shannon entropies of the n Markov matrices (Sh), the traces of the same matrices (Tr), the Harary number (H), the Wiener index (W), the Gutman topological index (S6), the Schultz topological index, non-trivial part (S), the Moreau-Broto indices, the Balaban distance connectivity index (J), and the Kier-Hall and Randic connectivity indices.
- Transform numerical data into character sequences (menu Calculations > Numbers to Sequence).
- Transform sequences of groups of n characters into simple sequences, as a change of encoding (menu Calculations > N to 1 Character Sequence).
- Edit/view the text input and output files.
- Create files that describe graphs in the DOT language; these files are used as input for the Graphviz executables to visualize the graphs.
- Create PNG images of the graphs and display them.
Description of S2SNet
S2SNet is an interactive program with two panels: the main panel and the DOS console (Figure 14).
[Figure 14 screenshot: S2SNet - Sequence to Star Network, ver. 1.0 (2008), by Cristian R. Munteanu and Humberto Gonzalez-Diaz; menus File, Calculations, Help]
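The DOT export mentioned in the list above can be sketched in a few lines of Python. This is an illustrative sketch, not S2SNet's actual code: it assumes a non-embedded star graph in which a central node is joined to one branch per character type, with the positions of that character chained along the branch. The function name `sequence_to_star_dot` and the node naming scheme (character plus position) are hypothetical.

```python
def sequence_to_star_dot(sequence, name="star"):
    """Return DOT text for a star graph: one branch per distinct character,
    nodes of a branch chained in sequence order (assumed layout)."""
    branches = {}
    for position, symbol in enumerate(sequence, start=1):
        branches.setdefault(symbol, []).append(f"{symbol}{position}")
    lines = [f"graph {name} {{", '  center [label="0"];']
    for nodes in branches.values():
        previous = "center"
        for node in nodes:
            lines.append(f"  {previous} -- {node};")
            previous = node
    lines.append("}")
    return "\n".join(lines)

dot_text = sequence_to_star_dot("ACAAD")
```

The resulting text can be saved to a file and passed to a Graphviz executable (for example `dot -Tpng`) to obtain a PNG image.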
171. ented by the amino acid sequences (primary structure) of the proteins, related or not with HCC. By using new software programmed by our group, CULSPIN, the sequences of amino acids are transformed into spiral graphs and the corresponding topological indices. The resulting numbers, which characterize each graph (that is, a protein graphical representation), are then used in Weka to find the best QSDR classification model. The final model is used to predict whether a new protein is linked with HCC using only its amino acid sequence.
Protein set
This work is based on the same datasets used in the previous studies with lattice and star-type graphs for proteins linked with HCC. The sets of protein primary sequences are represented by a set of 69 HCC cancer proteins and 276 non-cancer proteins. To avoid homology bias and remove the redundant sequences from the benchmark dataset, a cut-off threshold of 25% was imposed to exclude those proteins from the benchmark datasets that have equal to or greater than 25% sequence identity to any other one in a same subset. However, in this study we did not use such a stringent criterion.
[Figure: protein HIST1H1B, related with HCC; protein amino acid sequence (primary structure), related or not with HCC]
MSETAPAETATPAPVEKSPAKKKATKKAAGAGAAKRKATGPPVSELITKAVAASKERN
GLSLAALKKALAAGGYDVEKNINSRIKLGLKSLVSKGTLVQTKGTGASGSFKLNKKAASG
EAKPKAKKAGAAKAKKPAGAT
172. era P rez MA Bermejo Sanz M Ramos Torres L Grau valos R P rez Gonz lez M Gonz lez D az H A topological sub structural approach for predicting human intestinal absorption of drugs Eur J Med Chem 2004 39 905 16 Molina Ruiz R Saiz Urra L Rodriguez Borges JE Perez Castillo Y Gonzalez MP Garcia Mera X et al A TOPological Sub structural Molecular Design TOPS MODE QSAR approach for modeling the antiproliferative activity against murine leukemia tumor cell line L1210 Bioorg Med Chem 2009 17 2 537 47 Casanola Martin GM Marrero Ponce Y Tareq Hassan Khan M Torrens F Perez Gimenez F Rescigno A Atom and bond based 2D TOMOCOMD CARDD approach and ligand based virtual screening for the drug discovery of new tyrosinase inhibitors J Biomol Screen 2008 13 10 1014 24 Gonzalez Diaz H Duardo Sanchez A Ubeira FM Prado Prado F Perez Montoto LG Concu R et al Review of MARCH INSIDE amp Complex Networks Prediction of Drugs ADMET Anti parasite Activity Metabolizing Enzymes and Cardiotoxicity Proteome Biomarkers Curr Drug Metab 2010 11 379 406 Kier LB Hall LH Molecular Structure Description The Electrotopological State Academic Press 1999 Katritzky AR Oliferenko A Lomaka A Karelson M Six membered cyclic ureas as HIV 1 protease inhibitors a QSAR study based on CODESSA PRO approach Quantitative structure activity relationships Bioorg Med Chem Lett 2002 12 23 3453 7 Katritzky AR Kulshyn OV Stoyanova
173. es as matrix metalloproteinase inhibitors Bioorg Med Chem 2006 14 12 4137 50 Schlessinger A Yachdav G Rost B PROFbval predict flexible and rigid residues in proteins Bioinformatics 2006 22 7 891 3 Mewes H W Frishman D Mayer K F Munsterkotter M Noubibou O Pagel P Rattei T Oesterheld M Ruepp A Stumpflen V MIPS analysis and annotation of proteins from whole genomes in 2005 Nucleic Acids Res 2006 34 Database issue D169 72 Xie D Li A Wang M Fan Z Feng H LOCSVMPSI a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI BLAST Nucleic Acids Res 2005 33 Web Server issue W105 10 McDermott J Guerquin M Frazier Z Chang A N Samudrala R BIOVERSE enhancements to the framework for structural functional and contextual modeling of proteins and proteomes Nucleic Acids Res 2005 33 Web Server issue W324 5 PR900827B Polymer 51 2010 264 273 ELSEVIER journal homepage www elsevier com locate polymer Contents lists available at ScienceDirect polymer Polymer Plasmod PPI A web server predicting complex biopolymer targets in plasmodium with entropy measures of protein protein interactions Yamilet Rodriguez Soca Cristian R Munteanu e Julian Dorado E Juan Rabu al b Alejandro Pazos Humberto Gonz lez D az Department of Microbiology Er Parasitology Faculty of Pharmacy USC 15782
174. from scratch; that is, the graphs and the calculated TIs will be lost if they have not been saved to files.
IV. Indices levels: this control box is only active if at least one spiral has been built, and it lets the user select the level at which the two families of TIs implemented in this version of CULSPIN are calculated:
- by classes in gnomons: if this option is selected, the two families of TIs are calculated for each of the classes in each of the gnomons. If a class is not present in a given gnomon, its Frequency and its Shannon Entropy in that gnomon are zero. This option is most useful when the sequences do not have many classes and are not very long; otherwise the number of indices obtained would be too high, and their subsequent statistical processing very cumbersome.
- by classes in global graph: with this option the TIs are calculated for each of the classes, but over the whole graph. In other words, the TIs of a given class in the whole graph are the sum of its values over all the gnomons. This option reduces the number of TIs for very long sequences, so it is a good choice in such cases.
- by gnomons: if this option is selected, the TIs are calculated at the gnomon level, independently of the classes. In other words, the indices for a given gnomon are the sum of the TIs of all the classes
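The three levels described above differ only in how the per-class, per-gnomon values are summed. A minimal Python sketch of that bookkeeping, assuming the values are already available in a dictionary keyed by (class, gnomon); the function name `aggregate_indices` and the data layout are hypothetical and do not reflect CULSPIN's internal format:

```python
from collections import defaultdict

def aggregate_indices(values):
    """values maps (class_label, gnomon_number) -> index value
    (the 'by classes in gnomons' level). Returns the sums per class
    ('by classes in global graph') and per gnomon ('by gnomons')."""
    by_class = defaultdict(float)
    by_gnomon = defaultdict(float)
    for (label, gnomon), value in values.items():
        by_class[label] += value
        by_gnomon[gnomon] += value
    return dict(by_class), dict(by_gnomon)

by_class, by_gnomon = aggregate_indices({("A", 1): 0.2, ("A", 2): 0.1, ("B", 2): 0.3})
```

A class that is missing from a gnomon simply has no entry, which is equivalent to the zero value the text assigns to it.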
175. est and the subsampling test can be avoided, because the outcome obtained by the jackknife cross-validation is always unique for a given benchmark dataset. Accordingly, the jackknife test has been increasingly and widely used by investigators with a strong mathematical background to examine the quality of various predictors (see, e.g., ref. 94-103). However, to reduce the computational time, in this study we have adopted the independent testing dataset cross-validation, as many investigators have done with SVM as the prediction engine.
Dataset
The protein structures were downloaded from PDB using the following schemes for PDB database search: (i) introducing as input parameter the text "fatty acid binding" in the search item called "function", for positive cases; (ii) introducing the PDB IDs for all the proteins contained in the list reported in the article of Dobson and Doig, for negative cases. The positive cases are those proteins with function annotation as LIBPs in the PDB. The list of negative cases (nLIBPs) from search scheme (ii) contains enzymes and other proteins present in humans and many other organisms, including other parasites (see ESI). The nLIBPs have known functions different from LIBPs. The dataset was made up of 1801 proteins (801 LIBPs and 1000 nLIBPs) from more than 20 organisms, including parasites and human or cattle hosts. Detaile
176. etely different conclusions. For instance, a predictor achieving a higher success rate than another predictor for a given independent testing dataset might fail to do so when tested on another independent testing dataset. (ii) For the subsampling test, the concrete procedure usually used in the literature is 5-fold, 7-fold, or 10-fold cross-validation. The problem with this kind of subsampling test is that the number of possible selections in dividing a benchmark dataset is astronomical, even for a very simple dataset, as demonstrated by eqn (28)-(30) in ref. 49. Therefore, in any actual subsampling cross-validation test, only an extremely small fraction of the possible selections is taken into account. Since different selections will always lead to different results, even for the same benchmark dataset and the same predictor, the subsampling test cannot avoid arbitrariness either. A test method unable to yield a unique outcome cannot be deemed a good one. (iii) In the jackknife test, all the proteins in the benchmark dataset are singled out one by one and tested by the predictor trained on the remaining protein samples. During the process of jackknifing, both the training dataset and the testing dataset are actually open, and each protein sample is in turn moved between the two. The jackknife test can exclude the memory effect. Also, the arbitrariness problem mentioned above for the independent dataset t
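The jackknife (leave-one-out) procedure described in point (iii) can be sketched as follows. The nearest-centroid classifier is only a stand-in chosen to keep the example short; it is an assumption of this sketch, not the predictor used in the paper, and the function names are hypothetical.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Toy predictor: assign x to the class whose training centroid is closest."""
    labels = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]

def jackknife_accuracy(X, y, predict=nearest_centroid_predict):
    """Single out each sample in turn, train on the rest, test on the one left out."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        hits += predict(X[keep], y[keep], X[i]) == y[i]
    return hits / len(y)
```

Since every sample is left out exactly once and no random partition is involved, repeated runs on the same benchmark dataset give the same score, which is the uniqueness property the text emphasizes.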
177. ets in parasites and bacteria. Molecular BioSystems 8, 851-862, 2012, http goo gl C T NcP. Tool: http://bio-aims.udc.es/LIBPpred.php
1 INTRODUCTION
Ever since medical interest in microbes and parasites arose, scientists have sought the most effective methods to combat their negative effects on human health. In this fight, the target organisms are continually learning to develop resistance against current drugs and to adapt to new environmental conditions. For this reason, fast, accessible, and inexpensive methods are needed to discover new drugs and molecular targets against microbes and parasites. Theoretical methods are an excellent option for finding new treatments faster and with fewer material and human resources, in order to improve people's quality of life. This thesis proposes the development of new applications and software for the discovery of drugs and molecular targets, using computer engineering and artificial intelligence techniques for classification. Theoretical models are developed based on complex network (graph) theory and on quantitative structure-activity or structure-property relationship (QSAR/QSPR) techniques, and the best models are implemented in free online tools accessible from anywhere in the world. In order to develop this type of solution
178. ez Diaz E Quezada E Uriarte M Yanez D Vina and F Orallo J Med Chem 2008 51 6740 6751 D Vina E Uriarte F Orallo and H Gonzalez Diaz Mol Pharmacol 2009 6 825 835 H Gonzalez Diaz F Prado Prado and F M Ubeira Curr Top Med Chem 2008 8 1676 1690 H Gonz lez D az Y Gonz lez D az L Santana F M Ubeira and E Uriarte Proteomics 2008 8 750 778 H Gonz lez D az S Vilar L Santana and E Uriarte Curr Top Med Chem 2007 7 1025 1039 R Concu M A Dea Ayuela L G Perez Montoto F J Prado Prado E Uriarte F Bolas Fernandez G Podda A Pazos C R Munteanu F M Ubeira and H Gonzalez Diaz Biochim Biophys Acta 2009 1794 1784 1794 S Vilar H Gonzalez Diaz L Santana and E Uriarte J Theor Biol 2009 261 449 458 C R Munteanu J M Vazquez J Dorado A P Sierra A Sanchez Gonzalez F J Prado Prado and H Gonzalez Diaz J Proteome Res 2009 8 5219 5228 R Concu M A Dea Ayuela L G Perez Montoto F J Prado Prado E Uriarte F Bolas Fernandez G Podda A Pazos C R Munteanu F M Ubeira and H Gonzalez Diaz Biochim Biophys Acta 2009 1794 1784 1794 R Concu M A Dea Ayuela L G Perez Montoto F Bolas Fernandez F J Prado Prado G Podda E Uriarte F M Ubeira and H Gonzalez Diaz J Proteome Res 2009 8 4372 4382 Y Rodriguez Soca C R Munteanu J Dorado A Pazos F J Prado Prado and H Gonzalez Diaz J
179. fernandez udc es (E. Fernández-Blanco), vaguiar@udc.es (V. Aguiar-Pulido), muntisa@gmail.com (C.R. Munteanu), julian@udc.es (J. Dorado).
0022-5193 - see front matter. 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.jtbi.2012.10.006
E. Fernández-Blanco et al. / Journal of Theoretical Biology 317 (2013) 331-337
(de Magalhaes, 2010, 2011, 2012; Freitas and de Magalhaes, 2012; Harman, 1981; Hayflick, 2000) is necessary. Several important works have proposed specific relationships between genes or proteins and aging (Aledo et al., 2011, 2012; de Magalhães et al., 2009; Freitas et al., 2011; Gomes et al., 2011; Li et al., 2010). More research focused on antioxidant molecules may be useful for this purpose since, for example, oxidative stress is one of the risk factors of colorectal carcinogenesis. In inflammatory reactions, the activated leucocytes produce mutagenic and mitogenic free radicals, thereby promoting tumour formation. In addition, obesity, hyperlipidemia, and hyperinsulinemia increase the energy supply of epithelial cells, thus leading to deregulation of the mitochondrial electron transport chain. Finally, the latter leads to increased free radical production, causing troubles in cell cycle regulation, mutations, and unrestricted proliferation of damaged cells (Regöly-Mérei et al., 2007). Unfortunately, the number of molecules that have antioxidant properties in nature is quite low. Therefore, developing m
180. g HIV
[Figure 11 screenshot: HIVcleave, "Predicting HIV protease cleavage sites in proteins"; options: HIV protease type (HIV-1, HIV-2), cutoff threshold, input protein sequence. References shown: Hong-Bin Shen, Kuo-Chen Chou, HIVcleave: a web server for predicting HIV protease cleavage sites in proteins, Analytical Biochemistry, 2008, 375, 388-390; Kuo-Chen Chou, Prediction of HIV protease cleavage sites in proteins, Analytical Biochemistry, 1996, 233, 1-14.]
Figure 11. The HIVcleave server for predicting HIV protease cleavage sites in proteins.
A collection of QSAR models for various target organisms is presented on the Open QSAR website (http://www.openqsar.org). Here one can find examples of validated, stable models built with linear techniques, artificial neural networks (ANN), and partial least squares (PLS) regression, for organisms such as the viruses Human herpesvirus, Hepatitis C virus, and HIV-1, as well as Entamoeba histolytica, Leishmania donovani, Plasmodium falciparum, and Toxoplasma gondii. The drawback of these models is the very small number of cases used to train and validate them. The small number of online tools with QSAR models for the discovery of drugs and their corresponding protein targets has created the need for new public servers. In this
181. specific graphs: in the case of proteins, the nodes are the alpha carbons of the amino acids taken from the 3D structure, and in the case of drugs, the nodes are all the atoms of the chemical formula (SMILES codes); for this purpose, three software programs were developed that can calculate molecular descriptors using different types of graphs: MInD-Prot, S2SNet, and CULSPIN; these graphs are characterized by topological indices (molecular descriptors) based on connectivity matrices, distances between nodes, node degrees, and transition probabilities; these numbers, specific to each molecule with a specific biological activity, can be used to create QSAR classification models by means of general discriminant analysis, artificial neural networks, machine learning, evolutionary computation, etc.; with these models, new drugs and protein targets can be evaluated for a specific biological function; the best models are implemented in a collection of four online tools on the Bio-AIMS server (http://bio-aims.udc.es): Trypano-PPI, for the study of protein-protein interactions in Trypanosoma; Plasmod-PPI, for protein-protein interactions in Plasmodium; ATCUNpred, for the ATCUN activity of proteins, with application to parasites such as Trypanosoma, Plasmodium, Leishmania, or Toxoplasma; and LIBPpred, for the prediction of pr
182. h model: the classification scores obtained for the different classes, as well as the global classification percentages, the precision values for the target class (antioxidant proteins), the ROC values, and the number of attributes that were considered. The Random Forest technique seems to be the best option because it achieves 94.6% correctly classified instances. In addition, it is interesting to note that for the antioxidant class it achieves 84% correctly classified instances. This model achieves a precision of 82.9%, which is the highest among the tested machine learning methods.

Table 2. Attribute subsets for the tests
Subset name  Non-embedded graph            Embedded graph
Sh           Sh0 Sh1 Sh2 Sh3 Sh4 Sh5       eSh0 eSh1 eSh2 eSh3 eSh4 eSh5
Tr           Tr0 Tr2 Tr4                   eTr0 eTr2 eTr3 eTr4 eTr5
X            X0 X1R X2 X3 X4 X5            eX0 eX1R eX2 eX3 eX4 eX5
Remaining    H W S6 S J                    eH eW eS6 eS eJ

Table 3. Results obtained using the different subsets as input (considering 12 attributes)
Technique      antiox (%)  non-antiox (%)  global (%)  Precision antiox (%)  Global precision (%)  ROC
Naive Bayes    95.7        56.3            62.7        29.8                  87.4                  0.79
MLP            38.6        95.5            86.2        62.2                  84.6                  0.851
K star         51.5        95.2            88.1        67.3                  87.2                  0.926
JRip           47.2        98.6            90.0        86.4                  89.9                  0.726
Random tree    80.9        94.2            92.0        73.0                  92.5                  0.875
Random Forest  79.3        94.4            91.9        73.2                  92.3                  0.913
Naive 740 57 3 6031 74 7 60 1 0 797 bayes MLP 0 100 838 0 83 8 0 644 K star 8
183. h uses a Markov chain model (MCM) of the intramolecular movement of electrons to calculate structural parameters of drugs. In subsequent studies, we have extended this method to perform a fast calculation of 2D and 3D alignment-free structural parameters based on molecular vibrations in RNA secondary structures, or on electrostatic potential and van der Waals interactions in proteins. The method has since been renamed Markov chains invariants for networks simulation and design (MARCH-INSIDE 2.0). This name describes more adequately the broad uses of the method, which describes the structure of drugs, RNA, and proteins, as well as drug-drug networks and drug-protein interactions. MARCH-INSIDE may also be used to study PPIs, bacteria-bacteria coaggregation, parasite-host interactions, and other systems with an MCM associated to a network. In very recent reviews, we have discussed the latest applications of this method. For all these reasons, in this work we use the MARCH-INSIDE approach to solve the problem of predicting specific TPPIs from the 3D structure of the two proteins involved. Last, we implement the first public server for prediction of TPPIs.
Journal of Proteome Research, Vol. 9, No. 2, 2010, 1183 (technical notes)
Methods
Electrostatic Parameters of Protein-Protein Interaction. In previous works, we used 3D electrostatic potential invariants derived with an MCM to describe the 3D structure of one protein bac
184. he 3D electrostatic parameters of the protein structural network with S(LIBP) values. The S(LIBP) output is a real-valued variable that scores the propensity of a protein to act as a LIBP. In addition, we have implemented the model in a public web server for the prediction of these proteins, called LIBP-Pred. Last, we have illustrated the use of LIBP-Pred to carry out online data mining of the PDB. We have predicted S(LIBP) values for 2000 proteins in humans and parasites with known structure but unknown function. This type of study may help us to discover new LIBPs useful as human cancer biomarkers or drug targets in parasites.
Mol. BioSyst., 2012, 8, 851-862. Published on 10 January 2012 on http://pubs.rsc.org, doi:10.1039/C2MB05432A
Materials and methods
Computational methods
MARCH-INSIDE method. In this work, the information about the molecular structure of the proteins is codified by using the MM method with the Π matrix (the short-term electrostatic interaction matrix). The matrix Π is constructed as a square n x n matrix, where n is the number of amino acids (aa) in the protein. In previous works, we have predicted protein function based on values of 3D potentials for different types of interactions or molecular fields derived from Π. The main types of molecular fields used are E, vdW, and HINT potentials. In this paper we have cal
185. he degrees of the nodes present in this gnomon. Keeping in mind all of the above, the indices calculated by CULSPIN by classes in the global graph are defined by the following formulas.
Frequencies:
$$Fr(c) = \frac{\sum_{n_c} \deg(n_c)}{\sum_{i} \deg(i)} \quad (1)$$
where c = class, n_c = node with class c in the spiral graph, and i runs over all nodes of the graph.
Shannon entropies:
$$Sh(c) = -Fr(c) \log Fr(c) \quad (2)$$
The data for the current work were formatted as a text file by rows, and the topological indices have been calculated by classes in the global graph.
Data analysis
Several machine learning techniques have been used in order to find the best mathematical model that links the protein structure with the HCC disease (QSDR models), in order to evaluate the relationship between new proteins and HCC: Naive Bayes, Logistic regression (Logistic), Radial Basis Function Network (RBFNetwork), Decision Table/Naive Bayes hybrid classifier (DTNB), Support Vector Machines (SVM), and Multilayer Perceptron (MLP). The input of these methods is represented by the calculated topological indices of the protein spiral graphs: the frequencies, the Shannon entropies, or both. Two strategies have been used: one considering all the available TIs, and the other including a subset of the TIs after performing feature selection. In recent years, feature selection (FS) has become the focus of much research in areas of application for which a gr
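The frequency and Shannon entropy indices by classes in the global graph can be computed directly from the node classes and node degrees. The following Python sketch assumes the usual negative sign in the entropy term and the natural logarithm, neither of which is stated explicitly in the text; the function name `class_indices` and the input layout are hypothetical.

```python
import math
from collections import defaultdict

def class_indices(node_classes, node_degrees):
    """Fr(c): share of the total node degree held by nodes of class c.
    Sh(c): -Fr(c) * log(Fr(c)) (sign and log base are assumptions)."""
    total = sum(node_degrees)
    degree_by_class = defaultdict(int)
    for label, degree in zip(node_classes, node_degrees):
        degree_by_class[label] += degree
    fr = {c: d / total for c, d in degree_by_class.items()}
    sh = {c: (-f * math.log(f) if f > 0 else 0.0) for c, f in fr.items()}
    return fr, sh

fr, sh = class_indices(["A", "B", "A"], [1, 2, 1])
```

In this toy graph both classes hold half of the total degree, so both frequencies are 0.5 and both entropies are equal.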
186. he model has shown good Accuracy, Specificity, and Sensitivity values in both the training series and the external validation series. The classification matrices for the training series, the validation series, and both together are presented in Table 1. The PDB ID and S(LIBP) values for all proteins used to train or validate (cv) the model are given in the ESI, available online or upon request to the authors. This result confirms a statistically significant relationship between MARCH-INSIDE parameters and LIBP activity. Taking into consideration that this classifier is a simple linear equation with only nine input parameters, we can conclude that it may become a very useful model.

Table 1. Results of the 3D-QSAR study of LIBPs with LDA
Data subset                 Group   Parameter    %     nLIBPs  LIBPs
Training                    nLIBPs  Specificity  90.0  675     75
                            LIBPs   Sensitivity  87.4  76      525
                            Total   Accuracy     88.8
Validation                  nLIBPs  Specificity  91.6  229     21
                            LIBPs   Sensitivity  88.0  24      176
                            Total   Accuracy     90.0
Both (training+validation)  nLIBPs  Specificity  90.4  904     96
                            LIBPs   Sensitivity  87.5  100     701
                            Total   Accuracy     89.1

LIBP-Pred web server
In the Internet era, training and validation of a QSAR and/or computational model should be considered the first step towards the development of a valuable tool for bioinformatics application in proteome research. At the present time, seeking a fast and accurate predictive model is not enough; it should also be implemented
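The percentages reported in Table 1 follow directly from the classification matrix counts. A short Python check, treating nLIBPs as the negative class and LIBPs as the positive class (the function name `classification_scores` is hypothetical):

```python
def classification_scores(tn, fp, fn, tp):
    """Specificity, sensitivity, and accuracy (in %) from a 2x2 classification matrix.
    tn/fp: nLIBPs predicted as nLIBPs/LIBPs; fn/tp: LIBPs predicted as nLIBPs/LIBPs."""
    specificity = 100.0 * tn / (tn + fp)
    sensitivity = 100.0 * tp / (tp + fn)
    accuracy = 100.0 * (tn + tp) / (tn + fp + fn + tp)
    return round(specificity, 1), round(sensitivity, 1), round(accuracy, 1)

training = classification_scores(675, 75, 76, 525)
validation = classification_scores(229, 21, 24, 176)
```

For the training counts this gives 90.0, 87.4, and 88.8, and for the validation counts 91.6, 88.0, and 90.0, matching the values in the table.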
187. hem Parasitol 2005 144 1 1 9 17 Gunasekera AM Patankar S Schug J Eisen G Kissinger J Roos D et al Mol Biochem Parasitol 2004 136 1 35 42 18 Huestis R Fischer K Mol Biochem Parasitol 2001 118 2 187 99 19 Sharon I Davis JV Yona G Methods Mol Biol 2009 541 61 88 20 Liu L Cai Y Lu W Feng K Peng C Niu B Biochem Biophys Res Commun 2009 380 2 318 22 21 Skrabanek L Saini HK Bader GD Enright AJ Mol Biotechnol 2008 38 1 1 17 22 Najafabadi HS Salavati R Genome Biol 2008 9 5 R87 23 Kim S Shin SY Lee IH Kim SJ Sriram R Zhang BT Nucleic Acids Res 2008 36 Web Server issue W411 5 Jaeger S Gaudan S Leser U Rebholz Schuhmann D BMC Bioinformatics 2008 8 9 Suppl S2 25 Burger L van Nimwegen E Mol Syst Biol 2008 4 165 26 Scott MS Barton GJ BMC Bioinformatics 2007 8 239 27 Zvelebil MJ Tang L Cookson E Selkirk ME Thornton JM Mol Biochem Par asitol 1993 58 1 145 53 28 von Grotthuss M Plewczynski D Ginalski K Rychlewski L Shakhnovich EI BMC Bioinformatics 2006 7 53 29 Lappalainen I Thusberg J Shen B Vihinen M Proteins 2008 72 2 779 92 30 Shen B Bai J Vihinen M Protein Eng Des Sel 2008 21 1 37 44 31 Shen B Vihinen M Protein Eng Des Sel 2004 17 3 267 76 32 Liu ML Shen BW Nakaya S Pratt KP Fujikawa K Davie EW et al Blood 2000 96 3 979 87 33 Shen B Nolan JP Sklar LA Park MS Nucleic Acids Res 1997 25 16 3332 8 34 Chua HN Ning K Sung WK Leong
188. her hand, we should note that the model assigns effects of different sign and intensity on PABP action to different amino acids placed at different distances within different regions of the protein backbone. Remember that parameter k accounts for the topological distance between the amino acids considered, and R refers to the protein region. We can then conclude that, according to our model, fatty acid binding seems to be modulated by region-specific propagation of electrostatic interactions within the protein. This effect should be correlated to the physico-chemical mechanism of LIBP action. However, the explanation of this mechanism is beyond the scope of this work, which is oriented to the development of a LIBP predictor and not to unraveling the mechanism of action of LIBPs. Consequently, we have focused more on the statistical quality of the model. The statistical parameters of the model are the Canonical Regression Coefficient (Rc), Chi-square (χ²), and the model significance level (p-level); N represents only the number of proteins used to train the model. We split the dataset at random into a training series (75%), used for model construction, and a prediction series (25%), used for model validation. The high Rc (above 0.8) indicates a strong linear correlation between input and output. The p-level < 0.05 for the Chi-square test indicates a statistically significant discrimination between the two groups of proteins. In addition t
189. hod to numerically characterize the structure of drugs [46], RNA [40], and proteins [41,47,48], as well as drug-drug networks [49], drug-protein interactions [50], PPIs, and other systems with an MCM associated to a graph. In this regard, MARCH-INSIDE uses networks similar to others known in proteomics, molecular biology, and molecular microbiology, where the nodes (connected by links) are atoms (bonds), amino acids (electrostatic interactions), proteins (PPIs), genes (co-expression), or organisms and microorganisms (parasite-host interactions) [51-58]. In Fig. 1 we depict the 3D structure and the van der Waals surface for Thioredoxin (PDB ID SYRC), a pPPC present in P. falciparum clone 3D7 (A), and the respective protein structure complex network (graph) for one of the proteins of the pPPC (B). At this structural level, the nodes are amino acids, and we link two nodes with an edge if the distance between them is lower than 15 Å; this type of network is also known as a contact map or protein residue network [59-66].
Fig. 1. 3D structure and van der Waals surface for a P. falciparum protein (A) and complex network (B).
In a very recent review we have discussed the details and many applications of the MARCH-INSIDE method to Molecular Microbiology [67]. The last upgrade of MARCH-INSIDE, carried out by Munteanu and González-Díaz, was the implementation of the Internet portal Bio-AIMS ht
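The residue network rule stated above (an edge between two amino acids whose distance is below the cutoff) can be sketched with NumPy. This is an illustrative sketch, not MARCH-INSIDE code: it assumes one 3D coordinate per residue (for example the alpha carbon) expressed in the same length unit as the cutoff, and the function name `contact_map` is hypothetical.

```python
import numpy as np

def contact_map(coords, cutoff=15.0):
    """Boolean adjacency matrix: True where two distinct residues are closer than cutoff."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adjacency = dist < cutoff
    np.fill_diagonal(adjacency, False)  # no self-loops
    return adjacency

adjacency = contact_map([[0, 0, 0], [10, 0, 0], [30, 0, 0]])
node_degrees = adjacency.sum(axis=1)
```

Node degrees and other topological indices can then be derived from the adjacency matrix.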
190. hou (http www csbio sjtu edu cn index eng htm) proposes three online servers for predicting the location of proteins in viruses, Gram-negative bacteria, and Gram-positive bacteria.
[Figure 9 screenshot: Virus-mPLoc, "Predicting the subcellular localization of viral proteins within host and virus-infected cells"; input: viral protein sequence in FASTA format. References shown: Hong-Bin Shen and Kuo-Chen Chou, Virus-mPLoc: a fusion classifier for viral protein subcellular location prediction by incorporating multiple sites, Journal of Biomolecular Structure & Dynamics, 2010, 28, 175-86; Hong-Bin Shen and Kuo-Chen Chou, Virus-PLoc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells, Biopolymers, 2007, 85, 233-240. The Virus-PLoc server has been updated to Virus-mPLoc.]
Figure 9. The Virus-mPLoc server for predicting the subcellular location of viral proteins.
The first server is Virus-mPLoc [82], which predicts the subcellular location of viral proteins using information from various Web sites (Figure 9). Knowledge of the subcellular location of viral proteins in a host cell or in virus-infected cells is very important because it is related
191. i Measure, Chi-Square, and G-Square. As in LDA, we always set prior probabilities of p(pPPI) = p(npPPI) = 0.5 unless we specify a different value. Last, we used a FACT-style direct stopping rule with a value of 0.01 to control the length of the CT. All the CTs have been trained with the software STATISTICA 6.0, for which our laboratory holds rights of use [76].
2.4. Dataset
The protein structures were downloaded from PDB [82] using the following schemes for PDB database search: (i) introducing the name of the parasite species (Plasmodium) as input parameter in the search item called "source organism", for positive cases; or (ii) introducing the PDB IDs for all the proteins contained in the list reported in the article of Dobson and Doig [83]. The positive cases (pPPI) are those protein-protein pairs that make up a stable complex that has been structurally characterized (3D structure) in Plasmodium species (Plasmodium sp.). The list of negative cases (npPPI; search scheme ii) contains enzymes and other proteins present in humans and many other organisms, including other parasites, that are not present in Plasmodium sp. The dataset consisted of 5257 pairs of proteins (774 pPPIs and 4483 npPPIs) from more than 20 organisms, including parasites and human or cattle hosts. Altogether, 581 pPPIs and 3395 npPPIs were used in training, and 193 pPPIs and 1088 npPPIs were used in validation. Detailed information about the PDB ID, the values of the electrostati
192. ication model that can predict whether a protein is HCC related was created with the Naive Bayes method, based only on 11 Shannon entropies of the spiral graph. The Naive Bayes classifier estimates the class-conditional probability assuming that the attributes are conditionally independent given the class Y. This assumption can be described as follows:
$$P(Sh \mid Y = HCC) = \prod_{i=1}^{d} P(Sh_i \mid Y = HCC) \quad (3)$$
where each set of attributes $Sh = \{Sh_1, Sh_2, Sh_3, \ldots, Sh_d\}$ contains d attributes. Instead of computing the class-conditional probability for each combination of Sh, it is only necessary to estimate the conditional probability of each $Sh_i$ given an output Y.
Mol. BioSyst., 2012, 8, 1716-1722

Table 1. Classification scores and AUROCs for test data
             Fr                    Sh                    Both
Method       Accuracy (%)  AUROC   Accuracy (%)  AUROC   Accuracy (%)  AUROC
Naive Bayes  88.99         0.89    90.92         0.91    89.80         0.90
Logistic     82.40         0.86    83.41         0.87    86.95         0.89
RBFNetwork   88.99         0.88    89.29         0.90    88.92         0.90
DTNB         85.10         0.88    85.74         0.87    84.29         0.88
SVM          85.85         0.89    86.03         0.89    86.89         0.90
MLP          86.77         0.88    87.07         0.87    86.29         0.89

This approach does not require a large training set in order to obtain a good estimation of the probability. To classify each test sample, the Naive Bayes classifier calculates the posterior probability of each class Y:
$$P(HCC \mid Sh) = \frac{P(HCC) \prod_{i=1}^{d} P(Sh_i \mid HCC)}{P(Sh)}$$
Since P(Sh) is the same for each output Y, selecting the class that maximizes the numerator is e
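The posterior calculation described above reduces to multiplying a class prior by the per-attribute conditional probabilities and normalizing over the classes. A small Python sketch with made-up numbers; the priors and likelihoods below are illustrative only and are not taken from the paper, all probabilities are assumed to be strictly positive, and the function name `naive_bayes_posterior` is hypothetical.

```python
import math

def naive_bayes_posterior(priors, likelihoods):
    """priors: {class: P(class)}; likelihoods: {class: [P(Sh_i | class), ...]}.
    Works in log space to avoid underflow when there are many attributes."""
    log_num = {c: math.log(priors[c]) + sum(math.log(p) for p in likelihoods[c])
               for c in priors}
    peak = max(log_num.values())
    unnorm = {c: math.exp(v - peak) for c, v in log_num.items()}
    total = sum(unnorm.values())  # plays the role of P(Sh) up to a constant factor
    return {c: u / total for c, u in unnorm.items()}

posterior = naive_bayes_posterior(
    {"HCC": 0.2, "non-HCC": 0.8},
    {"HCC": [0.5, 0.4], "non-HCC": [0.1, 0.2]},
)
predicted = max(posterior, key=posterior.get)
```

Because the normalizing term is shared by all classes, the predicted class is the one with the largest numerator, as the text notes.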
193. ies studied were Candida albicans (43 compounds analyzed, 100% predictability, LSO A 3.49), Candida parapsilosis (23, 100%, A 0.86), Aspergillus fumigatus (21, 95.20%, A 0.05%), Microsporum canis (12 17, 91.60%, A 2.84), Trichophyton mentagrophytes (11, 100%, A 0.51) and Cryptococcus neoformans (10, 90%, A 0.90). The model equation is the following: Actv 2 88 C X 1 26 C X 1 01 C T 0 78 C C 0 94 C X 0 76 C T zy 20 53 F 6484 71 93 p 0 001 (8) where the Wilks statistic (Wilks' lambda) is the statistic of the overall discrimination, F is the Fisher ratio and p is the error level. In this equation, C is calculated for the totality (T) of atoms in the molecule or for specific associations of atoms. These associations are atoms with a common characteristic: X, halogens, and Cuns, unsaturated carbon atoms. González-Díaz and Prado-Prado [80] selected pairs of antifungal drugs with similar/different species profiles of predicted activity and represented them as a large network. They then developed an mt-QSAR classification model in which the outputs were the inputs of this network. The overall classification accuracy of the model was 87.0% (161 of the 185 compounds) in training, 83.4% (50 of the 61) in validation, and 83.7% for 288 additional antifungal compounds used p
194. ighest S(LIBP) values predicted for all proteins studied with unknown function that are expressed in parasites correspond to one protein of C. parvum (see Table 2). The PDB ID and score for this protein are PDB ID 2O1OA (2O1O, chain A) and S(LIBP) = 85.63. This is a very high value according to our web server and may support a closer inspection of this protein as a probable LIBP. 2O1O is a complex protein

Table 2 Top hits of LIBPs predicted in H. sapiens, parasites and other organisms
Species/Organism                  PDB ID   S(LIBP)
Top LIBP predicted hits for different organisms
Shigella flexneri                 2RJBA    93.87
Thermus thermophilus              1WDTA    92.03
Arthrobacter aurescens            3IUKA    91.01
Neisseria meningitidis            1VGYA    88.96
Thermus thermophilus              1WDIA    88.73
Shewanella oneidensis             1ZEEA    87.28
Haemophilus influenzae            3M73A    87.2
Arabidopsis thaliana              1YDUA    85.72
Cryptosporidium parvum            2O1OA    85.63
Chlamydophila abortus             3CE2A    85.01
Methanocaldococcus jannaschii     2AEUA    79.24
Staphylococcus aureus             1QYIA    78.99
Oleispira antarctica              3IRUA    78.96
Aquifex aeolicus                  2HEKB    78.95
Plasmodium sp.
vivax                             2GUUA    65.9
berghei                           2FDSA    65.71
falciparum                        2QU8A    58.77
falciparum                        1Z40A    55.17
falciparum                        1Z40E    54.63
vivax                             2B30A    52.98
falciparum                        1XQ9A    51.33
knowlesi                          1TXJA    49.94
falciparum                        1Y6ZA    49.79
falciparum                        1N81A    49.78
falciparum                        2FBNA    47.09
falciparum                        3NI8A    45.99
vivax                             2FO3A    45.19
falciparum                        1TQXA    45.08
falciparum                        2P65A    44.78
falciparum 3D7                    2H2YA    40.78
falciparum                        2VWAA    38.07
falciparum                        1SYRA
195. igned a QSAR model for alignment-free prediction of HBC biomarkers based on electrostatic potentials of protein pseudofolding HP lattice networks. Prediction models for HCC using two different types of protein graphs were previously published: an HP lattice type and a star graph type. The current work proposes an improved cancer/non-cancer classification model for HCC based on protein square Randić spiral graph TIs 6 obtained from protein primary sequences and Naive Bayes classifiers. Similar studies based on the spiral graph have been published: QSDR models for prostate cancer using mass spectra input data, Quantitative Proteome-Property Relationships (QPPRs) for finding biomarkers of organic drugs using blood mass spectra, or chemical research in toxicology. Naive Bayes classifiers have recently been used for different problems, such as the protein quaternary structure, protein subcellular location, classification of DNA repair genes into ageing-related or non-ageing-related, genomic data integration to reduce the misclassification rate in predicting protein-protein interactions, prediction of human protein-protein interactions to explore underlying cancer-related pathway crosstalk, prediction of Alzheimer's disease from genome-wide data, or virtual screening and chemical biology. Materials and methods. The description of the methodology followed in this work is presented in Fig. 1. The input data are repres
196. Figure 10. The Gneg-mPLoc server for predicting the location of proteins in Gram-negative bacteria. The third server, Gpos-mPLoc [84], is similar to Gneg-mPLoc; it predicts the location of proteins in Gram-positive bacteria and is implemented at http://www.csbio.sjtu.edu.cn/bioinf/Gpos-multi/. Another example, a server for viruses, is HIVcleave [85], a tool for predicting the cleavage sites of HIV (human immunodeficiency virus) proteases in proteins. According to the distorted key theory [86], information about the sites where proteins are cleaved by the HIV protease is very useful for finding effective inhibitors against HIV, the cause of AIDS (acquired immunodeficiency syndrome). To meet the growing need in this regard, this web server has been implemented at http://chou.med.harvard.edu/bioinf/HIV/ (Figure 11). A step-by-step online guide is also offered on how to use HIVcleave to identify the cleavage sites in a query protein sequence by the HIV-1 and HIV-2 proteases.
197. ira L Santana and E De Clercq J Med Chem 2000 43 1975 1985 118 D Ramel F Lagarrigue V Pons J Mounier S Dupuis Coronas G Chicanne P J Sansonetti F Gaits Iacovoni H Tronchere and B Payrastre Sci Signaling 2011 4 ra61 119 E Mikiciuk Olasik E Zurek R Mikolajezak E Zakrzewska and K Blaszczak Swiatkiewicz Nucl Med Rev Cent East Eur 2000 3 149 152 120 J D Artz J E Dunford M J Arrowood A Dong M Chruszcz K L Kavanagh W Minor R G Russell 862 Mol BioSyst 2012 8 851 862 View Online F H Ebetino U Oppermann and R Hui Chem Biol 2008 15 1296 1306 121 A A Reszka and G A Rodan Mini Rev Med Chem 2004 4 711 719 122 A A Reszka and G A Rodan Curr Rheumatol Rep 2003 5 65 74 123 G A Rodan and A A Reszka Curr Mol Med 2002 2 571 577 124 F M Jordao A Y Saito D C Miguel V de Jesus Peres E A Kimura and A M Katzin Antimicrob Agents Chemother 2011 55 2026 2031 125 X Dai X Gu M Luo and X Zheng Protein Pept Lett 2006 13 955 957 126 K C Chou and H B Shen Nat Sci 2009 2 63 92 This journal is The Royal Society of Chemistry 2012
198. is gnomon are zero. This option is more useful when the sequences have few classes and are not very long; otherwise, too many indices would be obtained, which would complicate further statistical processing. By classes in the global graph: in this option, the TIs are calculated for each of the classes but over the whole graph. In other words, the TIs of a given class in the whole graph are the sum of their values in all the gnomons. This option reduces the number of TIs in the case of very long sequences and is thus a good option in such cases. By gnomons: if this option is selected, the TIs are calculated at gnomon level, independently of the classes. In other words, the indices for a certain gnomon are the sum of the TIs of all the classes in this gnomon. This option can be very useful if the sequences have a great number of classes and a moderate size. In the U-graph built using CULSPIN, each node belongs to a certain class, and the nodes are connected not only following the sequence of letters but also to those nodes that belong to the same class (they have the same letter). So, in our U-graph, each node will be connected to one or more nodes. By definition, the degree of a node is the number of nodes to which it is connected, and the total degree of a graph is the sum of the degrees of all the nodes that form the graph. Therefore, we can define gnomon degrees as the sum of t
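The degree definitions above can be made concrete with a small Python sketch. Several things here are assumptions, not CULSPIN's documented behaviour: same-class nodes are linked only to the previous occurrence of the same letter, the gnomon of each node is supplied by the caller (the spiral layout is not reproduced), and a gnomon degree is taken to be the sum of the degrees of the nodes in that gnomon. All function names are hypothetical.

```python
from collections import defaultdict

def u_graph_edges(sequence):
    """Edges along the sequence plus edges between same-letter nodes (assumed rule)."""
    edges = set()
    last_seen = {}
    for i, letter in enumerate(sequence):
        if i > 0:
            edges.add((i - 1, i))                # sequence connectivity
        if letter in last_seen:
            edges.add((last_seen[letter], i))    # same-class connectivity
        last_seen[letter] = i
    return edges

def node_degrees(n_nodes, edges):
    """Degree of a node = number of nodes it is connected to."""
    degrees = [0] * n_nodes
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees

def gnomon_degrees(degrees, gnomon_of_node):
    """Sum the node degrees inside each gnomon (assumed definition)."""
    totals = defaultdict(int)
    for degree, gnomon in zip(degrees, gnomon_of_node):
        totals[gnomon] += degree
    return dict(totals)

sequence = "ABAB"
edges = u_graph_edges(sequence)
degrees = node_degrees(len(sequence), edges)
total_degree = sum(degrees)                      # always twice the number of edges
per_gnomon = gnomon_degrees(degrees, [0, 1, 1, 1])  # hypothetical gnomon labels
```

Because every edge contributes to two nodes, the total degree equals twice the edge count, which is a quick sanity check on any U-graph built this way.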
199. iscovered in the genome/proteome of Plasmodium sp. determines a higher number of possible pPPC/non-pPPC structures derived from different PPIs in parasite and human hosts, which makes exhaustive experimental investigation difficult in terms of time and resources [11,12]. In fact, many researchers in the field of Molecular and Biochemical Parasitology have recognized the high importance of different computational tools (statistical models, servers, databases) to study the proteome and/or genome of P. falciparum and P. vivax [13-18]. Consequently, the development of predictive models for pPPIs/non-PPIs discrimination becomes a very useful tool aimed at discovering new drug targets. There are many theoretical methods for the prediction of PPIs in humans and other organisms. Many of them are based on the same approaches used for the study of protein structure-function relationships but extended to PPIs, such as sequence alignment techniques, phylogenetic techniques, or alignment-free parameters; other methods, like molecular modeling, incorporate knowledge about the 3D structure of the proteins involved in the PPIs. These methods often make use of complex tree representations as input or output of the analysis, to represent these interactions as PPI trees. Sequence-only methods are often faster than 3D ones and need less structural information. In contrast, 3D methods give a clearer idea of the structure of the protein and
200. itative Structure Activity Relationships QSAR for peptides binding to Human Amphiphysin 1 SH3 domain Curr Proteomics 2009 6 4 289 302 P rez Montoto LG Prado Prado F Ubeira FM Gonz lez D az H Study of Parasitic Infections Cancer and other Diseases with Mass Spectrometry and Quantitative Proteome Disease Relationships Curr Proteomics 2009 6 4 246 61 Torrens F Castellano G Topological Charge Transfer Indices From Small Molecules to Proteins Curr Proteomics 2009 6 4 204 13 V zquez JM Aguiar V Seoane JA Freire A Serantes JA Dorado J et al Star Graphs of Protein Sequences and Proteome Mass Spectra in Cancer Prediction Curr Proteomics 2009 6 4 275 88 70 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 Chou KC Graphic rule for drug metabolism systems Curr Drug Metab 2010 11 4 369 78 Garcia I Diop YF Gomez G QSAR amp complex network study of the HMGR inhibitors structural diversity Curr Drug Metab 2010 11 4 307 14 Gonzalez Diaz H Network topological indices drug metabolism and distribution Curr Drug Metab 2010 11 4 283 4 Gonzalez Diaz H Duardo Sanchez A Ubeira FM Prado Prado F Perez Montoto LG Concu R et al Review of MARCH INSIDE amp complex networks prediction of drugs ADMET anti parasite activity metabolizing enzymes and cardiotoxicity proteome biomarkers Curr Drug Metab 2010 11
201. ity to have DNA cleavage action, based on the best spectral moment QSAR model obtained from the training set descriptor values. Thus, the warning leverage h* is defined by eq 8:

$h^{*} = 3p/n$ (8)

In eq 8, n is the number of training instances and p is the number of model adjusting parameters. Figure 5 shows the applicability domain of the LDA model, which is determined by training instances with h values lower than h* = 0.058. New samples with an h value higher than h* and/or a standardized residual higher than 2 or lower than -2 are outside the DA bandwidth of the model and consequently cannot be reliably predicted [8,89]. Predicting ATCUN Proteins in Parasites. The lack of information about the ATCUN motifs in parasites leads to a necessity of testing the parasite proteins with the best model to evaluate possible DNA cleavage proteins 121490. Figure 6 presents the number of possible ATCUN-like proteins in 9 parasite families with a probability greater than 99%. A large number of protein chains in protozoa such as Trypanosoma, Plasmodium, Leishmania, or Toxoplasma have been predicted to present DNA cleavage activity (see Figure 1). The percentages of these highly predicted protein chains among the analyzed ones in all parasites, arranged according to the most important biological functions, are the following: 70.5% for oxidoreductases, 62.5% for signaling proteins, 58.2% for lyases, 45.5% for membrane proteins, 44.4% for lig
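The applicability-domain rule stated above (leverage below the warning value and standardized residual within plus or minus 2) can be written as a short Python check. The sample values of n and p below are hypothetical and chosen only so that the threshold lands near the reported 0.058; the source does not state them in this passage.

```python
def warning_leverage(n_parameters, n_training):
    """Warning leverage h* = 3p / n, as in eq 8."""
    return 3 * n_parameters / n_training

def in_applicability_domain(h, standardized_residual, h_star):
    """A sample is reliable only if h <= h* and |residual| <= 2."""
    return h <= h_star and abs(standardized_residual) <= 2

# Hypothetical values for illustration only.
h_star = warning_leverage(n_parameters=6, n_training=311)

inside = in_applicability_domain(h=0.02, standardized_residual=1.1, h_star=h_star)
high_leverage = in_applicability_domain(h=0.09, standardized_residual=0.3, h_star=h_star)
large_residual = in_applicability_domain(h=0.02, standardized_residual=-2.4, h_star=h_star)
```

Either condition alone is enough to place a sample outside the domain, which is why the check combines them with a logical AND on the "inside" side.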
202. final file and for the groups (group file). Note: this function can be used to transform sequences encoded in 3 letters, such as the codons for amino acids, into S2SNet-type sequences with each amino acid as a single character. Indices of star-type networks. Your data will be used to calculate the following indices:
• Shannon entropy of the n Markov matrices (Sh): $Sh_n = -\sum_i p_i \log(p_i)$ (11), where p_i are the n_i elements of the vector p that results from multiplying the normalized Markov matrix (n_i x n_j), raised to the power n, by the vector (n_j x 1) with each element equal to 1/n.
• Traces of the connectivity matrices (Tr): $Tr_n = \sum_i (M^n)_{ii}$ (12), where n = 0 - power, M = connectivity matrix (dimension i x i), ii = i-th diagonal element.
• Harary number (H): H mij wr W H digg W (13), where d_ij are the elements of the distance matrix, m_ij are the elements of the connectivity matrix M, w are the weights, and nw is 1 when the weights are selected and 0 otherwise.
• Wiener index (W): W nw (14)
• Gutman topological index (S6): S6 ij (15), where deg are the elements of the degree matrix.
• Schultz topological index (S): S S Y deg deg dij w7 (16)
• Moreau-Broto autocorrelation index of the topological structure (ATSn, n = 1 - power, only if the weights are included): ATS Yi dpi w w (17), where dp are the elements of the matrix of distances between pairs of nodes
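The first two indices in the list above (Shannon entropy of the powered Markov matrix and trace of the powered connectivity matrix) can be sketched in plain Python. This is a sketch under assumptions, not S2SNet's code: the Markov matrix is taken to be the connectivity matrix normalized by columns so that multiplying it by a uniform probability vector yields another probability vector, and the natural logarithm is used. Function names are hypothetical.

```python
import math

def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_pow(m, n):
    """Return m raised to the integer power n (n = 0 gives the identity)."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

def column_normalize(m):
    """Assumed normalization: every non-empty column sums to 1."""
    size = len(m)
    sums = [sum(m[i][j] for i in range(size)) for j in range(size)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(size)]
            for i in range(size)]

def shannon_entropy(connectivity, n):
    """Sh_n = -sum_i p_i log p_i with p = Markov^n times the uniform vector."""
    size = len(connectivity)
    powered = mat_pow(column_normalize(connectivity), n)
    p = [sum(powered[i][j] / size for j in range(size)) for i in range(size)]
    return -sum(v * math.log(v) for v in p if v > 0)

def trace_power(connectivity, n):
    """Tr_n = sum of the diagonal elements of M^n."""
    powered = mat_pow(connectivity, n)
    return sum(powered[i][i] for i in range(len(connectivity)))

# Three-node path graph used as a toy connectivity matrix.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

For the path graph, the power-0 entropy is simply log 3 (uniform vector), and the trace of the squared connectivity matrix equals the sum of the node degrees.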
203. jackknife test has been increasingly recognized and widely adopted by investigators to test the power of various prediction methods (see, e.g., ref. 79-87). However, to reduce the computational time, 10-fold cross-validation has been used to verify the accuracy of the models. Hence, the original dataset is partitioned into 10 subsets. Of the 10 subsets, a single subset is retained as the validation data for testing the model, and the remaining nine are used as training data. The cross-validation process is then repeated 10 times, with each of the 10 subsets used exactly once as the validation data. Thus, classification accuracy percentages were calculated for the test group, with the corresponding AUROCs. The AUROC (Area Under the Receiver Operating Characteristic) represents the goodness of a predictor in a binary classification task; values close to 1 show that the model has an excellent classification capacity. Statistics. In the case of the best classification model, additional statistical studies have been presented. For this model, we calculated the sensitivity (Se), specificity (Sp), positive predictive value (PPV) and negative predictive value (NPV) for each cut-off point to evaluate the diagnostic accuracy. We also calculated the diagnostic odds ratio (DOR), which expresses the strength of the association between test result and disease; it is the ratio of the odds of a positive re
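A minimal Python sketch of the two procedures just described: a 10-fold partition in which each sample is used exactly once for validation, and the diagnostic statistics computed from a 2 x 2 confusion matrix. The function names and the example counts are hypothetical; the partition below is deterministic (no shuffling) purely to keep the example reproducible, and the DOR is computed with the standard definition (TP/FN)/(FP/TN), which is undefined when FN or FP is zero.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (training, validation) index lists; each index validates exactly once."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

def diagnostic_stats(tp, fn, fp, tn):
    """Sensitivity, specificity, predictive values and diagnostic odds ratio."""
    return {
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "DOR": (tp / fn) / (fp / tn),  # requires fn > 0 and fp > 0
    }

splits = list(k_fold_indices(25, k=10))
stats = diagnostic_stats(tp=90, fn=10, fp=20, tn=80)  # hypothetical counts
```

In practice these statistics would be recomputed at every cut-off point of the classifier score, giving one row of Se, Sp, PPV, NPV and DOR per threshold.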
204. k representation of the molecular systems, including but not limited to protein structure, as in this case. In fact, there are many types of graph representations, but essentially they contain two elements: (1) the nodes, which are the parts of the system, represented by a dot (atoms, amino acids, nucleotides, codons, genes, proteins, metabolites, etc.), and (2) the links between these parts, represented as edges or arcs (chemical bonds, hydrogen bonds, metabolic reactions, co-expression, regulation, and other ties or relationships). In any case, with the generalization of the Internet, the development of new predictive methods has become the first step in the application of computational techniques to proteome research. Nowadays, it is not sufficient to develop a fast and accurate predictive model; we should also implement it in public servers, preferably of free access, for the use of the scientific community. [Downloaded by Universidad de Vigo on 18 October 2012. Published on 10 January 2012 on http://pubs.rsc.org, doi:10.1039/C2MB05432A] The server packages developed by Chou and Shen to predict the function of proteins from structural parameters or to explore protein structures are good examples in this sense. These may be used by proteome research scientists through interacting with user-friendly interfaces. It means that the user does not need to be an expert on the theoretical
205. kbone in structure-property relationship studies. The parameters used, $\xi_k(R)$, represent the average electrostatic potential due to the interactions between all pairs of amino acids (aa). The chosen amino acids are those with electrostatic charges $q_i$ and $q_j$ that are located inside a specific protein region (R) and placed at a distance $d_{ij}$ from each other equal to or shorter than k times the cutoff distance (see details in previous works [8]). In this work, we want to use the $\xi_k(R)$ values of two proteins, ${}^{1}\xi_k(R)$ for protein 1 and ${}^{2}\xi_k(R)$ for protein 2, to generate structural parameters describing the PPI between these proteins. To this end, we introduce here for the first time a new type of PPI invariants, in the sense that they do not depend on the interchange between proteins, so that we do not need to label and distinguish them for calculation. With this objective, we introduce three types of invariants: the PPI electrostatic average invariant, the PPI electrostatic absolute difference invariant, and the PPI electrostatic product invariant:

average: $\tfrac{1}{2}\,[{}^{1}\xi_k(R) + {}^{2}\xi_k(R)]$ (1)
absolute difference: $|{}^{1}\xi_k(R) - {}^{2}\xi_k(R)|$ (2)
product: ${}^{1}\xi_k(R) \cdot {}^{2}\xi_k(R)$ (3)

Notably, to guarantee that these parameters are invariant to protein labeling as 1 or 2, we always have to use the same ${}^{1}R = {}^{2}R = R$ and ${}^{1}k = {}^{2}k = k$ values. To calculate the $\xi_k(R)$ values for each protein, the method uses as a source of protein macromolecular descriptors the stochasti
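The key property of the three invariants named above is that swapping the two proteins leaves every value unchanged. A minimal Python sketch follows; it assumes the average is the arithmetic mean of the two potentials, and the function name and numeric inputs are hypothetical.

```python
def ppi_invariants(xi_1, xi_2):
    """Return (average, absolute difference, product) of two electrostatic potentials.

    xi_1 and xi_2 must be computed for the same region R and the same k,
    otherwise swapping the proteins would change the result.
    """
    average = 0.5 * (xi_1 + xi_2)
    absolute_difference = abs(xi_1 - xi_2)
    product = xi_1 * xi_2
    return average, absolute_difference, product

# Hypothetical potential values for protein 1 and protein 2.
forward = ppi_invariants(0.3, -1.2)
swapped = ppi_invariants(-1.2, 0.3)
```

All three operations are symmetric in their arguments, so a protein pair needs only one row in the dataset regardless of which chain is listed first.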
206. l for training was 98.63%. The model was validated with external prediction series, correctly classifying 6216 of the 6277 inactive compounds and 215 of the 239 active compounds. The total predictability in these external series was 98.7%. The model equation is the following: Actv 3 442 u Het 3 18 u H Het 3 85 14 Ca 44 76 u C 4 615 u Ca 28 26 u T 29 26 2 m E 4 0 33 14367 94 p 0 001 (10) where χ² is the chi-square and p the error level. In this equation, the μ values were calculated for the total (T) of atoms in the molecule or for specific associations of atoms. These associations are atoms with a common characteristic: Het, heteroatom; H-Het, hydrogen bonded to heteroatoms; Csat, saturated carbon atoms. 1.3 Online tools for molecular classification. In the previous section we presented QSAR models for antiviral, antibacterial, antiparasitic and antifungal compounds. These models, like most QSAR models in the literature, are not implemented in Web servers. In the current section we present some examples of Web pages with QSAR-type models with applications in Microbiology and Parasitology. The location of proteins in viruses and bacteria is very important for the development of new drugs and in the search for molecular targets. For this reason, the group of Kuo-Chen C
207. drug resistance in both malaria and human cancer. However, there are no general models to predict pPPCs using indices of the PPC biopolymer structure. Therefore, in this work we present new Markov-chain numerical descriptors for protein-protein interactions (PPIs) based on electrostatic entropy, and these parameters are calculated for 5257 pairs of proteins (774 pPPCs and 4483 non-pPPCs) from more than 20 organisms, including parasites and human hosts. A simple classification tree was found with high accuracy, sensitivity and specificity (90.2-98.5%) in both training and validation, and it was implemented in the easy-to-use PlasmodPPI server, freely available at http://bio-aims.udc.es/PlasmodPPI.php (Figure 31). An example of the result for the pairs between the protein chain lists 3C5IA, 2F6IE, 1SYRC and 3CSIE, 2GHUA, 1SYRF is presented in Figure 32.
208. lar location prediction by incorporating multiple sites. J Biomol Struct Dyn 2010;28(2):175-86. Shen HB, Chou KC. Gneg-mPLoc: a top-down strategy to enhance the quality of predicting subcellular localization of Gram-negative bacterial proteins. J Theor Biol 2010;264(2):326-33. Shen HB, Chou KC. Gpos-mPLoc: a top-down approach to improve the quality of predicting subcellular localization of Gram-positive bacterial proteins. Protein Pept Lett 2009;16(12):1478-84. Shen HB, Chou KC. HIVcleave: a web server for predicting HIV protease cleavage sites in proteins. Anal Biochem 2008;375:388-90. Chou KC. Prediction of HIV protease cleavage sites in proteins. Anal Biochem 1996;233(1):1-14. 5. PUBLICATIONS (APPENDICES). An APPENDIX follows with the publications included in the Thesis, in the order established therein. Journal of Theoretical Biology 317 (2013) 331-337. Random Forest classification based on star graph topological indices for antioxidant proteins. Enrique Fernández-Blanco, Vanessa Aguiar-Pulido, Cristian Robert Munteanu, Julian Dorado. University of A Coruña, ICT Dept., Facultad de Informática, Campus de Elviña s/n, 15071 A Coruña, Spain. HIGHLIGHTS: This work presents an automatic antioxidant protein detection method
209. le orbit by 50 ≤ r < 75, and the outer orbit by r ≥ 75. Thus, five sets or orbits of amino acids (core, inner, middle, surface, total) and six ranges for the electrostatic interactions (0, 1, 2, 3, 4, 5) were considered for the calculation of a total of thirty (5 x 6 = 30) spectral moments [71,72] with the BIOMARKS tool to characterize each of the 415 proteins. Our formalism is the metal-free model. All the analyzed proteins have a metal in the PDB file, but we used only the protein geometry for our calculations. Thus, the current QSAR model may predict that a new protein has ATCUN DNA cleavage activity only if the protein can bind a metal. Statistical Analysis. The methodology flowchart in Figure 1 gives details about each step of the present work. The 3D electrostatic moments of all the database proteins, obtained by using the PDB files and the BIOMARKS tool, are the basis of the next step: the design of a classification model by statistical analysis. Linear discriminant analysis (LDA) was chosen as the simplest and fastest method. To decide whether a protein is classified as having ATCUN activity or not, we added a variable named ATCUNactiv, with values of 1 for active or -1 for inactive, and a cross-validation variable, Sel. The independent data test is performed by splitting the data at random into a training series (train, 75%) used for model construction and a prediction series (val, 25%) for model validation. The ATCUN activity of these proteins has
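The random 75/25 split encoded in the Sel variable can be sketched as follows. This is an illustration only: the identifiers, the fixed seed and the function name are hypothetical, and the source does not describe how the random assignment was actually generated.

```python
import random

def assign_sel(protein_ids, train_fraction=0.75, seed=0):
    """Return a dict mapping each protein ID to 'train' or 'val' at random."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(protein_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    train_ids = set(shuffled[:n_train])
    return {pid: ("train" if pid in train_ids else "val") for pid in protein_ids}

# Hypothetical identifiers standing in for the 415 proteins of the dataset.
protein_ids = [f"P{i:03d}" for i in range(415)]
sel = assign_sel(protein_ids)
n_train = sum(1 for label in sel.values() if label == "train")
n_val = sum(1 for label in sel.values() if label == "val")
```

With 415 proteins the split gives 311 training and 104 validation cases; the validation cases are never seen while the discriminant function is fitted.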
210. le or active only when the sequences shown on opening the file have required some transformation, that is, when the data were organized in columns, were numbers, were in FASTA format, etc.
• Export graph: export the connectivity of each of the U-graphs built to independent CT or NET files, so that they can be used in other programs for further calculations.
• Save Indices: save the indices calculated by the application in TXT or CSV files for later statistical study.
• Quit: exit the application.
Submit menu
• Build Spiral: place the selected sequences in the spiral representation and build the U-graph by connecting the nodes that belong to the same class (those with the same letter).
• Calculate Indices: calculate the TIs of the selected sequences from their respective U-graphs. Once this operation is finished, the results are shown on a new page.
View menu
• View a graph: plot and display the U-graph of a selected sequence in a separate window (one sequence at a time). It is only active after at least one U-graph has been built.
Help menu
• Help: shows the help contents in a separate window.
• About: shows the classic window with information about the application.
Initially, CULSPIN presents a single page titled Options in its main window
211. llular location by fusing optimized evidence theoretic K nearest neigh bor classifiers J Proteome Res 2006 5 1888 97 Chou K C Shen H B Large scale predictions of Gram negative bacterial protein subcellular locations J Proteome Res 2006 5 3420 8 Santana L Uriarte E Gonz lez D az H Zagotto G Soto Otero R Mendez Alvarez E A QSAR model for in silico screening of MAO A inhibitors Prediction synthesis and biological assay of novel coumarins J Med Chem 2006 49 3 1149 56 Gonz lez D az H de Armas R R Molina R Markovian negent ropies in bioinformatics 1 A picture of footprints after the interaction of the HIV 1 Psi RNA packaging region with drugs Bioinformatics 2003 19 16 2079 87 Aguero Chapin G Varona Santos J de la Riva G A Antunes A Gonzalez Villa T Uriarte E Gonzalez Diaz H Alignment Free Prediction of Polygalacturonases with Pseudofolding Topo logical Indices Experimental Isolation from Coffea arabica and Prediction of a New Sequence J Proteome Res 2009 8 4 2122 28 Gonz lez D az H Saiz Urra L Molina R Santana L Uriarte E A Model for the Recognition of Protein Kinases Based on the Entropy of 3D van der Waals Interactions J Proteome Res 2007 6 2 904 08 Concu R Dea Ayuela M A Perez Montoto L G Bolas Fernandez F Prado Prado F J Podda G Uriarte E Ubeira F M Gonzalez Diaz H Prediction of Enzyme Classe
212. ly distributed cause of malaria in people, with up to 2.5 billion people at risk and an estimated 80 million to 300 million clinical cases every year, including severe [Corresponding author. Tel.: 34 981 563100; fax: 34 981 594912. E-mail addresses: humberto.gonzalez@usc.es, gonzalezdiazh@yahoo.es (H. González-Díaz). 0032-3861, see front matter, 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.polymer.2009.11.029] by P. falciparum in Sub-Saharan Africa. Both technological advances enabling the sequencing of the P. vivax genome and a recent call for worldwide malaria eradication have placed a new emphasis on the importance of addressing P. vivax as a major public health problem. However, because of this parasite's biology, it is especially difficult to interrupt the transmission of P. vivax, and experts agree that the available methods for preventing and treating infections with both P. vivax and P. falciparum are inadequate [2]. Malaria, perhaps one of the most serious and widespread diseases encountered by mankind, continues to be a major threat to about 40% of the world's population, especially in the developing world. As malaria vaccines remain problematic, chemotherapy is still the most important weapon in the fight against the disease. However, almost all available drugs have been compromised by the highly adaptable parasite, and the increasing drug resistance of P. falciparum continues to be the main problem. Therefore, the li
213. more general. For example, the new models can be applied to predict the function of a protein with a given sequence or 3D structure, the function of an RNA secondary structure, or the interactions of specific drugs with multiple targets, such as the proteins present in the proteome of one organism or of several infectious or parasitic organisms [1,2]. In this regard, different works have been published to discuss both the classic applications of QSAR and new ones in different areas (journals): Current Topics in Medicinal Chemistry [2-11], Current Proteomics [12-19], Current Drug Metabolism [20-28], Current Pharmaceutical Design [29-38] and Current Bioinformatics [39-48]. All these review works show that the theory of graphs and complex networks is expanding to different levels of organization of matter, such as genome networks, protein-protein interaction networks, host-parasite networks, linguistic networks, social networks [49-54], electric power networks and the Internet [55]. A network is a set of elements, generally called nodes, with connections between them (edges). The nodes can be atoms, molecules, proteins, nucleic acids, drugs, cells, organisms, parasites, people, laws, computers or any other component of a real system. The edges are the relationships between the nodes, such as the chemi
214. mary of ANN Analysis Results for Some Models
(rows of the TPPI / non-TPPI columns are the observed groups; columns are the predicted groups)
ANN profile           set    Sensitivity (%)  Specificity (%)  Accuracy (%)  group      TPPI   non-TPPI
LNN 2:2-1:1           Train  88.2             89.7             89.5          TPPI       677    91
                                                                             non-TPPI   526    4578
                      Test   91.4             90.9             90.9          TPPI       233    22
                                                                             non-TPPI   159    1580
LNN 3:3-1:1           Train  88.3             89.1             89.0          TPPI       678    90
                                                                             non-TPPI   554    4550
                      Test   91.8             90.5             90.7          TPPI       234    21
                                                                             non-TPPI   165    1574
PNN 3:3-5872-2-2:1    Train  0.0              100.0            86.9          TPPI       0      768
                                                                             non-TPPI   0      5104
                      Test   0.0              100.0            87.2          TPPI       0      255
                                                                             non-TPPI   0      1739
MLP 1:1-6-5-1:1       Train  88.4             88.9             88.9          TPPI       679    89
                                                                             non-TPPI   564    4540
                      Test   91.8             90.3             90.5          TPPI       234    21
                                                                             non-TPPI   168    1571
MLP 1:1-4-1:1         Train  88.3             88.9             88.8          TPPI       678    90
                                                                             non-TPPI   567    4537
                      Test   91.8             90.5             90.6          TPPI       234    21
                                                                             non-TPPI   166    1573
MLP 1:1-6-1:1         Train  88.4             88.9             88.9          TPPI       679    89
                                                                             non-TPPI   564    4540
                      Test   91.8             90.3             90.5          TPPI       234    21
                                                                             non-TPPI   168    1571
RBF 1:1-1-1:1         Train  11.7             11.3             11.3          TPPI       90     678
                                                                             non-TPPI   4528   576
                      Test   8.2              9.8              9.6           TPPI       21     234
                                                                             non-TPPI   1569   170
function RBF. The parameter Ny is the number of input variables, Nin is the number of i
215. may be used to predict proteins with known spatial structure but unknown function [19-27]. The importance of these latter methods is that such functionally non-annotated structures are becoming common in the Protein Data Bank (PDB) with the development of powerful characterization techniques [28]. Another role of computational methods is the possibility to study not only wild-type proteins but also mutations [29-33]. Specifically, in this work we are interested in computational methods to predict pPPIs that determine the formation of non-covalent but physically stable PPCs between two proteins, which can be isolated and whose 3D structure can be chemically characterized as a potential drug target. Protein complexes are fundamental for understanding the principles of cellular organization. As the sizes of PPI trees are increasing, accurate and fast protein complex prediction from these PPI trees can be useful as a guide for biological experiments to discover novel protein complexes [34]. An alternative is the direct prediction of complexes by protein-protein docking, but it may become computationally expensive if we aim at screening large databases [35]. It is also of major importance to recall that nowadays it is not enough to develop a predictive model; we should also implement it in public servers, preferably of free access, for the use of the scientific community. The server packages developed by Chou and Shen
216. also contains 207 known compounds that are not as recent as the previous ones. These compounds have been reported in the Merck Index with other activities that do not include antiviral action against any virus species, and they have been used as inactive compounds. Linear Discriminant Analysis (LDA) was used to classify all these drugs into two classes: compounds active or inactive against the different viral species analyzed. The model correctly classified 5129 of the 5594 inactive compounds (sensitivity 91.69%) and 412 of the 422 active compounds (specificity 97.63%). The model equation is the following: Actv 0 95 u H Het 1 50 u H Het 3 23 y Cun 4 02 u Ca 20 47 u T 10 34 u T 0 74 u X 8 88 2 0 51 4 2402483 p 0 001 (1) where Λ is the Wilks statistic, χ² is the chi-square and p is the error level. In the equation, μ is the spectral moment of a given species after k steps. It was calculated for the total (T) of atoms in the molecule or for specific associations of atoms. These associations are atoms with a common characteristic: H-Het, hydrogen bonded to heteroatoms; Cuns, unsaturated carbon atoms; Csat, saturated carbon atoms; X, halogen atoms. Prado-Prado et al. [73] used LDA to fit an mt-QSAR model that classified 600 drugs as active
217. mited clinical repertoire of effective drugs and the emergence of multi-resistant strains substantiate the need for new proteins, or the discovery of new functions for known proteins, that may become targets of new anti-malarial compounds, or the discovery of proteins involved in multi-drug resistance [3-8]. [Y. Rodriguez-Soca et al., Polymer 51 (2010) 264-273] It is thus imperative that the development of new methods and strategies becomes a priority [2]. In this regard, stable protein-protein complexes formed by Protein-Protein Interactions (PPIs) may become interesting targets for new drugs and other treatment methods or strategies. For instance, there are high-molecular-weight rhoptry proteins of P. falciparum in a multi-protein complex consisting of proteins of 140, 130 and 110 kDa. The complex of rhoptry proteins binds to human and mouse erythrocyte membranes in association with a 120 kDa SERA protein. These proteins are believed to participate in the process of erythrocyte invasion. Sam-Yellowe et al. used six different antibodies, polyclonal and monoclonal, known to precipitate the high-molecular-weight rhoptry protein complex to analyze the structural relationship of proteins within the complex. The results provided insights into the mechanism of protein-protein interaction within the complex [9]. These types of results indicate that physically stable protein-protein biopolymer complexes (pPPC) made up of unique PPIs of Plasmo
218. mmunity for free-of-charge use. Acknowledgment. We sincerely thank the editor, Prof. Martin W. McIntosh, and the anonymous referee for their kind attention and valuable comments. H.G.D. and C.R.M. acknowledge research contracts sponsored by the Isidro Parga Pondal Program, Xunta de Galicia. We also thank partial financial support from the General Directorate of Scientific and Technologic Promotion of the Galician University System, Xunta de Galicia (grants 2007/127 and 2007/144), and the Carlos III Health Institute (grants PIO52048 and RD07/0067/0005). [Journal of Proteome Research, Vol. 9, No. 2, 2010, p. 1188] Supporting Information Available: Detailed information about the PDB ID, the values of the electrostatic potential indices, the corresponding observed classification, and the predicted classification for each TPPI or non-TPPI pair. This material is available free of charge via the Internet at http://pubs.acs.org. References: (1) Naula, C.; Parsons, M.; Mottram, J. C. Protein kinases as drug targets in trypanosomes and Leishmania. Biochim. Biophys. Acta 2005, 1754 (1-2), 151-9. (2) Cribb, P.; Serra, E. One- and two-hybrid analysis of the interactions between components of the Trypanosoma cruzi spliced leader RNA gene promoter binding complex. Int. J. Parasitol. 2009, 39 (5), 525-32. (3) Juri Ayub, M.; Smulski, C. R.; Nyambega, B.; Bercovich, N.; Masiga, D.; Vazquez, M. P.; Aguilar, C.
219. mplex. Cancer Res. 1983, 43 (2), 824-8. Jin, Y.; Lewis, M. A.; Gokhale, N. H.; Long, E. C.; Cowan, J. A. Influence of stereochemistry and redox potentials on the single- and double-strand DNA cleavage efficiency of Cu(II) and Ni(II) Lys-Gly-His-derived ATCUN metallopeptides. J. Am. Chem. Soc. 2007, 129 (26), 8353-61. Harford, C.; Sarkar, B. Neuromedin C binds Cu(II) and Ni(II) via the ATCUN motif: implications for the CNS and cancer growth. Biochem. Biophys. Res. Commun. 1995, 209 (3), 877-82. Drew, S. C.; Noble, C. J.; Masters, C. L.; Hanson, G. R.; Barnham, K. J. Pleomorphic copper coordination by Alzheimer's disease amyloid-beta peptide. J. Am. Chem. Soc. 2009, 131 (3), 1195-207. Yorita, H.; Otomo, K.; Hiramatsu, H.; Toyama, A.; Miura, T.; Takeuchi, H. Evidence for the cation-pi interaction between Cu and tryptophan. J. Am. Chem. Soc. 2008, 130 (46), 15266-7. Dias, A. V.; Mulvihill, C. M.; Leach, M. R.; Pickering, I. J.; George, G. N.; Zamble, D. B. Structural and biological analysis of the metal sites of Escherichia coli hydrogenase accessory protein HypB. Biochemistry 2008, 47 (46), 11981-91. Chung, K. C.; Cao, L.; Dias, A. V.; Pickering, I. J.; George, G. N.; Zamble, D. B. A high-affinity metal-binding peptide from Escherichia coli HypB. J. Am. Chem. Soc. 2008, 130 (43), 14056-7. Jin, Y.; Cowan, J. A. Targeted cleavage of HIV rev response element RNA by metallopeptide complexes. J. Am. Chem. Soc. 2006, 128 (2), 410-1. Mal, T. K.; Ikura
220. n of the electrostatic barrier for proton transport in aquaporin. FEBS Lett. 2004, 570 (1-3), 41-6. Norberg, J.; Nilsson, L. On the truncation of long-range electrostatic interactions in DNA. Biophys. J. 2000, 79 (3), 1537-53. Navarro, E.; Fenude, E.; Celda, B. Conformational and structural analysis of the equilibrium between single- and double-strand beta-helix of a D,L-alternating oligonorleucine. Biopolymers 2004, 73 (2), 229-41. Costa, L. A.; Rocha, W. R.; De Almeida, W. B.; Dos Santos, H. F. Linear free energy relationship for 4-substituted (o-phenylenediamine)platinum(II) dichloride derivatives using quantum mechanical descriptors. J. Inorg. Biochem. 2005, 99 (2), 575-83. Perez Gonzalez, M.; Morales Helguera, A. TOPS-MODE versus DRAGON descriptors to predict permeability coefficients through low-density polyethylene. J. Comput.-Aided Mol. Des. 2003, 17 (10), 665-72. Marrero-Ponce, Y.; Medina-Marrero, R.; Torrens, F.; Martinez, Y.; Romero-Zaldivar, V.; Castro, E. A. Atom, atom-type, and total nonstochastic and stochastic quadratic fingerprints: a promising approach for modeling of antibacterial activity. Bioorg. Med. Chem. 2005, 13 (8), 2881-99. Marrero-Ponce, Y.; Montero-Torres, A.; Zaldivar, C. R.; Veitia, M. I.; Perez, M. M.; Sanchez, R. N. Non-stochastic and stochastic linear indices of the molecular pseudograph's atom adjacency matrix: application to in silico studies for the rational discovery of new antimalarial compo
221. main, in notebook form. Options page. This page has four well-defined areas, whose functions are described below. I. Input file(s) format: this control box selects, among the input file formats accepted by CULSPIN, the option that matches the format of the data. An example of each format is shown below for clarity.
[Figure 20. Input types for CULSPIN.]
Description of CULSPIN Figure 20:
a) Text file by rows: in this format the sequences are organized so that each line of the TXT file corresponds to a different case or sequence.
Letter sequences:
Cha 01 GDDGGGGGDGGGDGDDGGGDGGGDGDGGDGDDDDGGGGGDGGDDGGGGGGGGGGGGKKKKKAAAKKAKKKKAAK
Cha 02 DDGGDGGGGGGGGDGGGDGDDDDDDGGGGGDGGDDGGGGGGGGGGGGGGGGKKKKKAAAKKAKKKKK
Cha 03 GDGGDGGGGGGGGDGGGDGDDGGGDGGGDGDGGDGDDDDGGGGGDGGDDGGGGGGGGGGKKKKKAAAKKAKKKKKKAAA
Numeric sequences:
Cha 01 7.86E-05 2.18E-07 9.60E-05 0.000366 0.000810 0.001428 0.002221 0.00318 0.004328
Cha 02 2.18E-07 9.60E-05 0.000366 0.000810 0.001428 0.002221 0.003187 0.00432 7.86E-05
Cha 03 9.60E-05 0.000366 0.000810 0.001428 0.002221 0.003187 0.004328 0.005643
b) Text file by columns: in this format the sequences are organized so that each column in the text file
222. found a simple linear model that predicts more than 90% of the TPPIs and non-TPPIs, in both the training and the validation groups, using only two parameters. The parameters are dξk(s) = |ξk(s1) - ξk(s2)|, the absolute difference between the ξk(s) values on the surface of the two proteins of each pair. We also tested non-linear ANN-type models for comparison, but the linear model gives better results. We implemented this model in the Web server named TrypanoPPI, freely available to the public at http://bio-aims.udc.es/TrypanoPPI.php (Figure 29). This is the first model that predicts whether protein-protein complexes in the Trypanosoma proteome are unique with respect to other parasites and hosts, opening new opportunities for the discovery of targets for anti-Trypanosoma drugs. An example of the result for the pairs between the protein chain lists 1HOZA, 1K3TB and 1HOZB, 1F2CA is presented in Figure 30.
[Figure 30. TrypanoPPI result page (Bio-AIMS): PDB List 1: 1HOZA, 1K3TB; PDB List 2: 1HOZB, 1F2CA.]
223. nd Trypanosoma cruzi is responsible in South America for Chagas disease, which can cause acute illness and death, especially in young children. More commonly, patients develop a chronic form of the disease that affects most organs of the body, often causing fatal damage to the heart and digestive tract. Transmission occurs via bloodsucking triatomine bugs and congenitally from mother to unborn child, but can also occur through contaminated blood transfusions (http://www.who.int/en). Control of HAT relies primarily on chemotherapy. Nevertheless, the arsenal of drugs is very limited, and the available drugs generally have shortcomings such as high toxicity and emerging resistance. The drugs currently used to treat HAT have been available for more than half a century. Early stages of HAT are treated with pentamidine and suramin. [10.1021/pr900827b, 2010 American Chemical Society] Side effects for both drugs are significant, and the failure rate is high, especially for suramin. Late stages of HAT can be treated with melarsoprol, a melaminophenyl arsenical compound that is able to cross the blood-brain barrier. Drug-induced side effects are severe, and up to 5% of treated patients die. The only alternative to melarsoprol is eflornithine, an analogue of ornithine that acts as an inhibitor of trypanosomal ornithine decarboxylase, leading to a block in polyamine synthesis. Side effects are significant, but eflornithine is much less t
224. nd X. Xiao, PLoS One, 2011, 6, e18258. K. C. Chou, Z. C. Wu and X. Xiao, Mol. BioSyst., 2012, 8, 629-641. X. Xiao, P. Wang and K. C. Chou, Mol. BioSyst., 2011, 7, 911-919. G. J. McLachlan, K. A. Do and C. Ambroise, Analyzing Microarray Gene Expression Data, Wiley-Interscience, Hoboken, New Jersey, 2004. R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, Montreal, Quebec, Canada, 1995. R. Picard and D. Cook, J. Am. Stat. Assoc., 1984, 79, 575-583. J. A. Hanley and B. J. McNeil, Radiology, 1982, 143, 29-36. K. Linnet, Clin. Chem., 1988, 34, 1379-1386. A. S. Glas, J. G. Lijmer, M. H. Prins, G. J. Bonsel and P. M. Bossuyt, J. Clin. Epidemiol., 2003, 56, 1129-1135. SPSS, SPSS, Chicago, 2009. Y. Marrero-Ponce, H. G. Diaz, V. R. Zaldivar, F. Torrens and E. A. Castro, Bioorg. Med. Chem., 2004, 12, 5331-5342. A. H. Morales, M. A. Cabrera Perez and M. P. Gonzalez, J. Mol. Model., 2006, 12, 769-780. E. Estrada and E. Molina, J. Chem. Inf. Comput. Sci., 2001, 41, 791-797. J. A. Castillo-Garit, Y. Marrero-Ponce, F. Torrens, R. Garcia-Domenech and V. Romero-Zaldivar, J. Comput. Chem., 2008, 29, 2500-2512. C. R. Munteanu and H. González-Díaz, S2SNet: Sequence to Star Network, Santiago de Compostela, Spain, 2008. K. C. Chou and H. B. Shen, Nat. Sci., 2009, 1, 63-92. This journal is © The Royal Society of Chemistry 2012. Journal of Proteome Research, technical notes: Trypano-PP
225. nd human colon cancer (HCC). The general discriminant analysis method generated the best model, with training/prediction set accuracies of 90.0% for the forward stepwise model type. The model was based on 5 pure and mixed star graph TIs obtained with the S2SNet software. The other study using the same protein dataset is based on lattice graphs: 69 proteins related to HCC and a control group of 200 proteins not related to HCC were represented through an HP lattice-type network. Starting from the generated graphs, a set of descriptors of the electrostatic potential type was calculated. Linear Discriminant Analysis (LDA) helped to establish a QSAR model with a relatively high percentage of good classification (between 80% and 90%) to differentiate between HCC and non-HCC proteins. Therefore, the current study proposes an alternative model with better prediction capacity, based on a different type of protein graph, on the Shannon entropy information of the graph, and on a simple statistical method such as Naive Bayes. This work can help in oncology proteomics or serve as a model for other studies of proteins linked with different diseases. In addition, the new CULSPIN application demonstrates its capacity to transform simple protein sequences into TIs and to serve as the basis of protein studies. Since user-friendly and publicly accessible web servers represent the future direction for developing p
226. corresponds to a different case or sequence.
Letter sequences:
Cha 01 Cha 02 Cha 03
DGG
DDD
GDG
GGG
DGD
GDG
GGG
GGG
GGG
GGG
GGG
GGG
Numeric sequences:
Cha 01 Cha 02 Cha 03
7.86E-05 2.18E-07 9.60E-05
2.18E-07 9.60E-05 0.00036601
9.60E-05 0.00036601 0.0008102
0.00036601 0.0008102 0.00142856
0.0008102 0.00142856 0.00222112
0.00142856 0.00222112 0.00318787
0.00222112 0.00318787 0.00432881
0.00318787 0.00432881 0.00564393
0.00432881 0.00564393 0.00713324
0.00564393 0.00713324 0.00879674
0.00713324 0.00879674 0.01063443
0.00879674 0.01063443 0.01264631
0.01063443 0.01264631 0.01483238
0.01264631 0.01483238 0.01719263
0.01483238 0.01719263 0.01972708
c) Text file in FASTA format:
>gi|221068402|ref|ZP_03544507.1| enzyme Comamonas testosteroni KF-1
MSEPVNQWPQTLEERIDRLESLDAIRQLAGKYSLSLDMRDMDAHVNLFAPDIKVGKEKVGRAHFMAWQDS
TLRDQFTGTSHHLGQHIIEFVDRDHATGVVYSKNEHECGAEWVIMQMLYWDDYERIDGQWYFRRRLPCYW
YATDLNKPPIGDMKMRWPGREPYHGAFHELFPSWKEFWAQRPGKDQLPQVAAPAPLEQFLRTMRRGTPAP
RMRVR
>gi|220713425|gb|EED68793.1| enzyme Comamonas testosteroni KF-1
MSEPVNQWPQTLEERIDRLESLDAIRQLAGKYSLSLDMRDMDAHVNLFAPDIKVGKEKVGRAHFMAWQDS
TLRDQFTGTSHHLGQHIIEFVDRDHATGVVYSKNEHECGAEWVIMQMLYWDDYERIDGQWYFRRRLPCYW
YATDLNKPPIGDMKMRWPGREPYHGAFHELFPSWKEFWAQRPGKDQLPQVAAPAPLEQFLRTMRRGTPAP
RMRVR
>gi|77360245|ref|YP_339820.1| enzyme Pseudoalteromonas haloplanktis TAC125
MQYLVISDIYGKTPCLQQLAKHFNAENQIVDPYNGVHQALENEEEYYKLFIKHCGHDEYAAKLEEYFNKL
SKPTICIAFSAGASAA W
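The column and FASTA layouts described above can be read with a few lines of Python. This is only an illustrative sketch, not CULSPIN's own parser: the function names `parse_fasta` and `parse_columns` are hypothetical, and the column reader assumes tab-separated fields (the excerpt does not state the separator), since labels such as "Cha 01" contain a space.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, List[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are wrapped; join them later.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def parse_columns(text: str) -> Dict[str, List[float]]:
    """Read a numeric 'text file by columns' (assumed tab-separated).

    The first non-empty line holds one label per column; every later
    line holds one value per column.
    """
    rows = [ln.split("\t") for ln in text.splitlines() if ln.strip()]
    labels, data = rows[0], rows[1:]
    return {
        label.strip(): [float(row[i]) for row in data]
        for i, label in enumerate(labels)
    }


if __name__ == "__main__":
    demo = "Cha 01\tCha 02\n7.86E-05\t2.18E-07\n2.18E-07\t9.60E-05\n"
    print(parse_columns(demo))
    print(parse_fasta(">seq1 example\nMSEP\nVNQW\n"))
```

Reading by columns yields the same per-case sequences as the row layout, only transposed, which is why both readers return a mapping from case label to sequence.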
227. ng QSAR can be found in the literature (González-Díaz et al., 2006, 2007a, 2010; Prado-Prado et al., 2008; Riera-Fernández et al., 2012) regarding protein folding kinetics (Chou, 1990), enzyme-catalyzed reactions (Chou, 1989; Chou and Forsen, 1980; Chou and Liu, 1981; Kuzmic et al., 1992), inhibition kinetics of processive nucleic acid polymerases and nucleases (Althaus et al., 1993a, 1993b, 1994, 1996; Chou et al., 1994), DNA sequence analysis (Qi et al., 2007), anti-sense strand base frequencies (Chou et al., 1996), analysis of codon usage (Chou and Zhang, 1992; Zhang and Chou, 1994), cancer prediction (Aguiar-Pulido et al., 2012), as well as complex network systems investigations (Diao et al., 2007; González-Díaz et al., 2007b, 2008). In this work, the authors propose the first non-antioxidant/antioxidant protein classification model based on embedded/non-embedded Star Graph TIs, including the trace of connectivity matrices, Harary number, Wiener index, Gutman index, Schultz index, Moreau-Broto indices, Balaban distance connectivity index, Kier-Hall connectivity indices and Randić connectivity index. This information is then used as input to several classification techniques, with the best results obtained using the Random Forest technique. 2. Materials and methods. The methodology followed in this work is presented in Fig. 1. The input data is represented by the amino acid sequences (primary structure) antioxidant and
228. ng QSAR models, which can establish a quantitative relationship between the chemical structure of drugs/molecular targets and the biological activity (specific capacity to interact). A limitation of almost all QSAR/QSPR models is that they predict the biological activity of drugs for only one biological system (organism, target, etc.). The solution comes with the development of multi-task QSAR/QSPR models (mt-QSAR/mt-QSPR) to predict the activity/properties of drugs against different biological systems. These mt-QSAR/mt-QSPR models also offer a good opportunity to build complex networks that can be used to explore large and complex databases of drugs/biological systems. In this section we review some of the mt-QSAR/QSPR models proposed in the literature and the networks derived from these studies. 1.2.1 Classification models for antiviral compounds. Prado-Prado et al. [72] used Markov chain theory to calculate new multi-target spectral moments in order to fit an mt-QSAR model for drugs active against 40 viral species. The model is based on 500 drugs, including active and inactive compounds, tested as antiviral agents in the recent literature; not all drugs were evaluated against all viruses, only those with experimental values. The database
229. nisms. The proposed mechanism of action is corroborated by crystal structures of the enzyme with risedronate and zoledronate bound, showing how this enzyme's unique chain-length determinant region enables it to accommodate larger substrates and products. N-BPs such as pamidronate, alendronate, risedronate, ibandronate and zoledronate seem to act as analogues of isoprenoid diphosphate lipids, thereby inhibiting FPP synthase, an enzyme in the mevalonate pathway. Interestingly, risedronate leads to an 88.9% inhibition of the rodent parasite Plasmodium berghei. This may indicate that the prediction by LIBP-Pred, as a potential drug target with LIBP function, is correct and may break new ground in the search for similar proteins in other parasites. However, the protein is still reported as predicted with this putative enzyme action but with function unknown. In any case, BLAST analysis also supports this idea, with the alignment finding high homology between this protein and similar proteins in other organisms (see Fig. 5). PDB mining of human proteome with LIBP-Pred. Considering that LIBPs/FABPs are very important cancer biomarkers in humans, we decided to carry out a prediction of S(LIBP) values for all human proteins with unknown function in PDB. In Table 2 we summarize the most promising results found for human proteins (see also full results in the ESI, available online or upon author's request).
230. nough P HCC T P shiJHCC
Table 2. Diagnostic accuracy and predictive values of Naive Bayes for HCC
Measure | Cut-off 0.1940 | Cut-off 0.5
AUC | 0.91 (0.86-0.96) | 0.91 (0.86-0.96)
TP | 60 | 51
FP | 35 | 10
TN | 241 | 266
FN | 9 | 18
Se (%) | 87.3 (83.4-91.2) | 96.4 (94.2-98.6)
Sp (%) | 87.0 (79.0-94.9) | 73.9 (63.6-84.3)
PPV (%) | 94.6 (94.1-98.7) | 93.7 (90.8-96.5)
NPV (%) | 63.2 (53.5-72.9) | 83.6 (74.3-92.9)
LR- | 0.1 | 0.3
LR+ | 6.9 | 20.4
DOR | 45.9 | 75.4
This output represents the probability of HCC, while Sh(c) are the Shannon entropy topological indices of class c for the protein spiral graphs. The model obtained a classification accuracy of 90 92, and it showed an AUROC of 0.91 (Fig. 4) for the test group. This AUROC value indicates that the model has excellent classification potential, providing a very good prediction for HCC-related proteins. Results of this kind are typically considered excellent in the literature on QSAR/QSDR models (refs. 5 and 95). Diagnostic performance. Table 2 shows the diagnostic accuracy and predictive values of Naive Bayes for two different cut-offs. These results were obtained for the HCC test group. Better values were obtained for a cut-off of 0.5: although the specificity is lower than the one obtained for a cut-off of 0.1940, the sensitivity is higher. In addition, the NPV for a cut-off of 0.5 is 83.6%, compared to 63.2% for a cut-off of 0.1940.
[Figure: ROC curve.]
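The likelihood ratios and diagnostic odds ratio in Table 2 follow directly from the four confusion-matrix counts. The sketch below recomputes them under the standard definitions, taking HCC as the positive class; the function name `diagnostic_summary` is hypothetical. The percentage rows of the table may follow a different positive-class convention, so only LR+, LR- and DOR are compared against the table here.

```python
from typing import Dict


def diagnostic_summary(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard diagnostic measures from confusion-matrix counts."""
    se = tp / (tp + fn)          # sensitivity: true positive rate
    sp = tn / (tn + fp)          # specificity: true negative rate
    return {
        "sensitivity": se,
        "specificity": sp,
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
        "lr_plus": se / (1.0 - sp),
        "lr_minus": (1.0 - se) / sp,
        "dor": (tp * tn) / (fp * fn),  # diagnostic odds ratio
    }


if __name__ == "__main__":
    # Counts taken from Table 2 for the two cut-offs.
    for cutoff, counts in {"0.1940": (60, 35, 241, 9),
                           "0.5": (51, 10, 266, 18)}.items():
        summary = diagnostic_summary(*counts)
        print(cutoff, {k: round(v, 3) for k, v in summary.items()})
```

Rounded to one decimal, the counts give LR+ of 6.9 and 20.4, LR- of 0.1 and 0.3, and DOR of 45.9 and 75.4 for the two cut-offs, in agreement with the corresponding rows of the table.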
231. nput variables to two input neurons, which perform a weighted sum and pass the result to one output neuron; this neuron gives the final classification of the case according to an optimized threshold value. In addition, the model LNN 2:2-1:1 also presented high levels of sensitivity (91.4%), specificity (90.9%) and accuracy (90.9%) in the external test set (see Table 1). We also validated the model by means of a ROC curve analysis (see Figure 5). The values of the area under the ROC curve for this model are 0.95 and 0.96, very close to 1, the highest possible value, and notably different from 0.5, the value typical of a random classifier. The comparison of linear and nonlinear models is essential to test how directly our parameters are correlated with the biological property. This first search points to a linear instead of a nonlinear relationship between TPPI prediction and dξk(s) values, giving additional proof of the validity of our methodology.
[Figure 5. ROC curve for the TPPI predictor with profile LNN 2:2-1:1 (axes: Sensitivity vs. 1 - Specificity).]
For instance, in Table 1 we can see that more complicated models with very nonlinear profiles do not imp
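The two ideas in this passage, a weighted sum compared with a threshold and the area under the ROC curve, can be illustrated as follows. This is a minimal sketch under stated assumptions: the weights, bias and threshold are hypothetical placeholders (the fitted values are not given in the excerpt), and the AUC is computed with the rank-based (Mann-Whitney) estimator rather than by whatever software the authors used.

```python
from typing import List, Sequence, Tuple


def linear_unit(descriptors: Sequence[float], weights: Sequence[float],
                bias: float, threshold: float = 0.0) -> Tuple[float, bool]:
    """Weighted sum of the input descriptors followed by a threshold decision."""
    score = bias + sum(w * d for w, d in zip(weights, descriptors))
    return score, score >= threshold


def roc_auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as one half. 1.0 is a perfect ranking, 0.5 a random one.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical weights and descriptor values, for illustration only.
    weights, bias = [-1.0, 2.0], 0.5
    print(linear_unit([3.0, 1.0], weights, bias))
    print(roc_auc([0.9, 0.4], [0.5, 0.1]))
```

An AUC close to 1 means that almost every positive case receives a higher score than almost every negative case, which is the sense in which 0.95 and 0.96 are far from the 0.5 of a random classifier.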
232. nput neurons (one per input variable), N_H1 is the number of neurons in the first hidden layer (H1), N_H2 is the number of neurons in the second hidden layer (H2), N_on is the number of output neurons, and N_ov is the number of output variables. The automatic selection of variables (features) was activated for all models. Interestingly, three variables (4 s E s and Ex s), out of more than 30 parameters calculated, appear in many models and are also chosen by an additional LDA variable selection. These parameters have the general formula dξk(s) = |ξk(s)prot1 - ξk(s)prot2|, the absolute difference between the electrostatic potential values ξk(s) for amino acids on the surface of the two proteins forming the PPI pair. This indicates that the difference between the surface electrostatic potentials is very important, not only for PPIs in general but also for discriminating unique complexes present in Trypanosome (TPPIs) and not in other organisms. In particular, the model LNN 2:2-1:1 is the simplest model found with high levels of sensitivity (88.2%), specificity (89.7%) and accuracy (89.5%) in the training set. These values are excellent considering that this predictor uses only two molecular descriptors of the PPI pair. Fitting this large data set of 768 TPPIs and 5104 non-TPPIs is a very complex process from a chemical point of view. The profile 2:2-1:1 indicates that this model assigns the values of only two i
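The general formula above, an absolute difference between the surface potentials of the two proteins in a pair, is simple to compute once the per-protein values are available. The sketch below assumes each protein is described by a list of surface values indexed by k; the function name `pair_descriptors` and the numbers are hypothetical, and the calculation of the potentials themselves (done with MARCH-INSIDE in the source) is not reproduced here.

```python
from typing import List, Sequence


def pair_descriptors(xi_prot1: Sequence[float],
                     xi_prot2: Sequence[float]) -> List[float]:
    """Absolute difference |xi_k(s)prot1 - xi_k(s)prot2| for each order k."""
    if len(xi_prot1) != len(xi_prot2):
        raise ValueError("both proteins need the same number of k values")
    return [abs(a - b) for a, b in zip(xi_prot1, xi_prot2)]


if __name__ == "__main__":
    # Hypothetical surface potentials for k = 0, 1, 2 (illustrative numbers).
    protein_a = [2.0, 5.0, 1.5]
    protein_b = [5.0, 5.0, 0.5]
    print(pair_descriptors(protein_a, protein_b))  # [3.0, 0.0, 1.0]
```

Because the absolute value is symmetric, the descriptor does not depend on which protein of the pair is listed first.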
233. ns has become a goal of major importance. This work is dedicated to the amino-terminal Cu(II)- and Ni(II)-binding (ATCUN) motifs that participate in DNA cleavage and have antitumor activity. We have calculated herein, for the first time, the 3D electrostatic spectral moments for 415 different proteins, including 133 potential ATCUN antitumor proteins. Using these parameters as input for Linear Discriminant Analysis, we have found a model that discriminates between ATCUN DNA-cleavage proteins and nonactive proteins with 91.32% accuracy (379 out of 415 proteins, including both training and external validation series). Finally, the model has predicted for the first time the DNA cleavage function of proteins from pathogen parasites. We have predicted possible ATCUN-like proteins with a probability higher than 99% in nine parasite families such as Trypanosoma, Plasmodium, Leishmania, or Toxoplasma. The distribution by biological function of the predicted ATCUN proteins is the following: oxidoreductases 70.5%, signaling proteins 62.5%, lyases 58.2%, membrane proteins 45.5%, ligases 44.4%, hydrolases 41.3%, transferases 39.2%, cell adhesion proteins 34.5%, metal binders 33.5%, translation proteins 25.0%, transporters 16.7%, structural proteins 9.1%, and isomerases 8.2%. The model is implemented at http://miaja.tic.udc.es/Bio-AIMS/ATCUNPred.php. Keywords: Cu, Ni cluster; ATCUN-like motif; DNA cleavage; antitumor activity; Markov model
234. nterface of LIBP-Pred tool, freely available to the academic community at http://zhang.bioinformatics.ku.edu/LOMETS. After generating PDB files with LOMETS, we can upload them to LIBP-Pred. This is the same strategy used to develop mode 2 of the web server MIND-BEST to predict drug-target interactions between drugs and proteins with unknown 3D structure. We have to be aware, however, that with input mode 2 we predict S(LIBP) values using 3D structural models generated only by modelling. Consequently, predictions derived with input mode 2 have to be used with more caution than predictions obtained with input mode 1. LIBP-Pred mining of PDB. The existence in PDB of 3000 proteins with unknown function, and the interest in the discovery of new LIBPs or LBPs as drug targets in parasite infections or as cancer biomarkers, prompted us to carry out a data mining search for new LIBP candidates in PDB. For this study we have implemented the key function PDB mining in the new server LIBP-Pred. On clicking this key, the server performs an automatic search of all PDB files with unknown function at a reference date. After that, LIBP-Pred extracts all C coordinates from these files and calculates the necessary z R values for all these proteins. Last, the server uses these values as inputs of the best model found and predicts the S(LIBP) values for all these proteins. The proteins with the highest scores may be selected as candidates for experimental
235. o chain => LIBPpred used the entire protein (all the chains). Done.
Figure 35. Example of use of the LIBPpred server.
3 CONCLUSIONS
The conclusions are presented in line with the stated objectives, grouped by the type of study carried out or objective pursued: 1) development of programs, 2) search for QSAR models, 3) implementation of servers, 4) publication of results.
1. Three new software tools were developed as computer programs for calculating topological indices useful in the development of QSAR models at different structural levels.
2. New QSAR models were found, applicable to predicting the biological activity of compounds of interest in Pharmaceutical Chemistry, Microbiology and Parasitology, using the newly developed programs.
3. The new QSAR models were implemented in four online software tools (Web servers) for the online prediction of the biological activity of compounds and their corresponding molecular targets. This is of great interest, above all in Pharmaceutical Chemistry, Microbiology and Parasitology.
4. The results were published in articles in specialized journals and in book chapters describing the applications of the tools developed.
5. The intellectual property was protected by means of the correspon
236. or inactive against 41 different virus species analyzed. The model correctly classified 143 of the 169 active antiviral compounds (specificity 84.62%) and 119 of the 139 inactive compounds (sensitivity 85.61%). The accuracy on the training data was 85.1% (262 of the 308 cases). In addition, the model was validated using the external prediction series, obtaining a cross-validation accuracy of 90.7% (466 of the 514 compounds). To illustrate how the model works in practice, a virtual screening was carried out that recognized as active 102 of the 110 (92.7%) antiviral compounds not used in the training or prediction series. The model equation is the following: Actv 1 90 C C 21 64 C C 1 022 C Cine 1 102 C C 0 72 C X 4 1 08 C Het 1 07 C H Het 0 75 C H Het 0 08 4 047 Rc = 0.726, p < 0.001 Q uns, where λ is Wilks' statistic, Rc is the canonical correlation and p is the error level. In the equation, C is the molecular index of a given species after k steps. It was calculated for the total (T) of the atoms in the molecule or for the specific groups of atoms presented in the previous equation. 1.2.2 Classification models for antibacterial compounds. Prado-Prado et al. [74] developed a Markov model to describe the activity
237. odels that help to detect molecules with antioxidant properties would be very helpful. On this basis, the main objective of this paper is to develop models that, on the one hand, reduce the number of molecules to be tested in different trials and, on the other hand, increase the success rate when molecules are tested for these properties. To achieve this, the authors have used Quantitative Structure-Activity Relationships (QSARs) (Devillers and Balaban, 1999). QSARs are based on Graph Theory, one of the most common techniques used in protein analysis. Using this technique, macromolecular descriptors named topological indices (TIs) are calculated for later analysis. This branch of mathematical chemistry has become an intense area of research, generating new information regarding DNA/proteins by representing them as graphs and obtaining the corresponding TIs in order to analyse the resulting complex networks (Agüero-Chapin et al., 2006; Bielinska-Waz et al., 2007; Munteanu et al., 2010; Randić and Balaban, 2003). To perform these analyses, the TIs are then processed by a classification technique, such as Support Vector Machines (SVMs) (Vapnik, 1995), Artificial Neural Networks (ANNs) (Rivero et al., 2011), Random Space Classifiers (Skurichina and Duin, 2002), Linear Discriminant Analysis (LDA), etc., abstracting general properties for future molecules that have not yet been tested. Many examples involvi
238. omising drug targets. To the best of our knowledge, there are no general models to predict Unique PPIs in Trypanosome (TPPIs). On the other hand, the 3D structure of an increasing number of Trypanosome proteins is reported in databases. In this regard, the introduction of a new model to predict TPPIs from the 3D structure of the proteins involved in a PPI is very important. For this purpose, we introduced new protein-protein complex invariants based on the Markov average electrostatic potential ξk(Ri) for amino acids located in different regions (Ri) of the i-th protein and placed at a distance k from each other. We calculated more than 30 different types of parameters for 7866 pairs of proteins (1023 TPPIs and 6823 non-TPPIs) from more than 20 organisms, including parasites and human or cattle hosts. We found a very simple linear model that predicts above 90% of TPPIs and non-TPPIs, both in training and independent test subsets, using only two parameters. [* To whom correspondence should be addressed: H. González-Díaz, Faculty of Pharmacy, USC, Spain. Phone: +34-981-563100. Fax: +34-981-594912. E-mail: humberto.gonzalez@usc.es or gonzalezdiazh@yahoo.es. Affiliations: University of Santiago de Compostela; University of A Coruña. Journal of Proteome Research 2010, 9, 1182-1190. Published on Web 11/30/2009.] The parameters were dξk(s) = |ξk(s1) - ξk(s2)|, the absolute difference between the ξk(s) values on the surface of the two proteins of
239. with their destructive tendencies and their consequences. Faced with the avalanche of new protein sequences discovered in the post-genomic era, we are confronted with the challenge of developing automated methods that quickly and accurately predict the location sites of viral proteins in a host cell; the information obtained is particularly important for medical science and for the design of antiviral drugs. Shen et al. developed a fusion classifier called Virus-mPLoc, built by hybridizing Gene Ontology information, functional domain information and sequential evolution information. The new tool can not only predict more accurately the location sites of viral proteins in a host cell, but can also identify viral proteins with multiple locations, which is beyond the reach of any existing predictor specialized in viral proteins. The server is implemented at http://www.csbio.sjtu.edu.cn/bioinf/virus-multi/. The second server, Gneg-mPLoc [83], predicts the location of proteins in Gram-negative bacteria by incorporating gene ontology, functional domain and sequential evolution information (Figure 10). It can be used to identify proteins in Gram-negative bacteria in eight locations: 1) cytoplasm, 2) extracellular, 3)
240. ork. 3.4 PlasmodPPI: a server for PPC plasmodium targets. Last, we have to consider that, with the advent of the Internet, it is important not only to develop new predictive models for proteome research but also to implement these models in public web servers available to other research groups [36-39, 110-113]. In this regard, we implemented this predictor in a web server freely available to the public at http://miaja.tic.udc.es/Bio-AIMS/PlasmodPPI.php. This is the first model and web server that
[Figure: Plasmod-PPI (Plasmodium Protein-Protein Interactions, PPPI) input page. PDB chain lists: paste the names of the PDB chains as two lists (max. 50). Notes: there is no space between the PDB name and the chain label; no empty new lines; the results will print the pairs between the chain from list 1 and the chain from list 2, not the combination of the list items. Tool: MARCH-INSIDE, Python version. Data: RCSB PDB.]
[Figure: PlasmodPPI result page. PDB List 1: 3CSIA, 2F6IE, 1SYRC; PDB List 2: 3CSIE, 2GHUA, 1SYRF.]
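The input notes above (chain label attached to the PDB name, at most 50 entries, pairs formed position by position rather than as all combinations) can be illustrated with a short sketch. This is not the server's code: the function names `split_chain_id` and `pair_chains` are hypothetical, and the sketch assumes 4-character PDB IDs followed by a 1-character chain label, separated by whitespace or new lines.

```python
from typing import List, Tuple

ChainId = Tuple[str, str]  # (PDB ID, chain label)


def split_chain_id(token: str) -> ChainId:
    """Split e.g. '3CSIA' into ('3CSI', 'A')."""
    token = token.strip()
    if len(token) != 5:
        raise ValueError(f"expected 4-character PDB ID plus chain label: {token!r}")
    return token[:4].upper(), token[4]


def pair_chains(list1: str, list2: str, max_items: int = 50) -> List[Tuple[ChainId, ChainId]]:
    """Pair the i-th chain of list 1 with the i-th chain of list 2."""
    ids1 = [split_chain_id(t) for t in list1.split()]
    ids2 = [split_chain_id(t) for t in list2.split()]
    if len(ids1) != len(ids2):
        raise ValueError("both lists must have the same number of chains")
    if len(ids1) > max_items:
        raise ValueError(f"at most {max_items} chains per list")
    # zip pairs by position; it does not build the full cross product.
    return list(zip(ids1, ids2))


if __name__ == "__main__":
    for pair in pair_chains("3CSIA\n2F6IE\n1SYRC", "3CSIE\n2GHUA\n1SYRF"):
        print(pair)
```

With three chains in each list this yields three pairs, whereas the full combination of list items would give nine.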
241. [...] against the different parasite species analyzed. The model correctly classified 311 of the 358 active compounds (86.9%) and 2328 of the 2577 inactive compounds (90.3%) in the training series. The overall training performance was 89.9%. The model was validated with external prediction series, in which it correctly classified 157 of the 190 antiparasitic compounds (82.6%) and 1151 of the 1277 inactive compounds (90.1%). The overall predictability was 89.2%. In addition, four types of non-linear Artificial Neural Networks (ANNs) were developed and compared with the mt-QSAR model. The best ANN model had an overall training performance of 87%. The model equation is the following:
Actv = [linear combination of the indices $\mu_k$ for the atom sets C_sat, C_uns, X and H-Het; the coefficients are not legible in the extracted text]; $\lambda = 0.52$, $\chi^2 = 1904.6$, $p < 0.001$ (7)
The coefficient $\lambda$ is Wilks' statistic (the statistic of total discrimination), $\chi^2$ is the chi-square statistic and p is the error level. In this equation $\mu_k$ was calculated for the total (T) of atoms in the molecule or for specific collections of atoms. These collections are atoms with a common characteristic: H-Het (hydrogen bonded to heteroatoms), Cuns (unsaturated carbon atoms), Csat (saturated carbon atoms), X (halogen atoms). 1.2.4 Model
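The percentages quoted for this model follow directly from the reported counts. The short Python sketch below recomputes them; the function name `rates` and the labels "sensitivity", "specificity" and "accuracy" are illustrative choices, not terms used in the source.

```python
def rates(tp, n_pos, tn, n_neg):
    """Return (sensitivity, specificity, accuracy) as percentages.

    tp / n_pos: correctly classified active compounds / all active compounds
    tn / n_neg: correctly classified inactive compounds / all inactive compounds
    """
    sensitivity = 100.0 * tp / n_pos
    specificity = 100.0 * tn / n_neg
    accuracy = 100.0 * (tp + tn) / (n_pos + n_neg)
    return sensitivity, specificity, accuracy

# Counts reported in the text for the training and external prediction series.
train = rates(311, 358, 2328, 2577)
external = rates(157, 190, 1151, 1277)

print([round(v, 1) for v in train])     # actives, inactives, overall (training)
print([round(v, 1) for v in external])  # actives, inactives, overall (external)
```

Rounded to one decimal, the recomputed values agree with the figures given in the text (86.9, 90.3 and 89.9 for training; 82.6, 90.1 and 89.2 for external prediction).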
242. [...] the Web servers:
Yamilet Rodriguez-Soca, Cristian R. Munteanu, Julián Dorado, Alejandro Pazos, Francisco J. Prado-Prado and Humberto González-Díaz. Trypano-PPI: A Web Server for Prediction of Unique Targets in Trypanosome Proteome by using Electrostatic Parameters of Protein-Protein Interactions. Journal of Proteome Research 9(2), 1182-1190, 2010. http://goo.gl/nCgR9. Tool: http://bio-aims.udc.es/TrypanoPPI.php
Yamilet Rodriguez-Soca, Cristian R. Munteanu, Julián Dorado, Juan Rabuñal, Alejandro Pazos and Humberto González-Díaz. Plasmod-PPI: a web server predicting complex biopolymer targets in Plasmodium with entropy measures of protein-protein interactions. Polymer 51(1), 264-273, 2010. http://goo.gl/hRhm9. Tool: http://bio-aims.udc.es/PlasmodPPI.php
Cristian R. Munteanu, José M. Vázquez, Julián Dorado, Alejandro Pazos Sierra, Ángeles Sánchez-González, Francisco J. Prado-Prado and Humberto González-Díaz. Complex Network Spectral Moments for ATCUN Motif DNA Cleavage: First Predictive Study on Proteins of Human Pathogen Parasites. Journal of Proteome Research 8(11), 5219-5228, 2009. http://goo.gl/u7Thg. Tool: http://bio-aims.udc.es/ATCUNPred.php
Humberto González-Díaz, Cristian R. Munteanu, Lucian Postelnicu, Francisco Prado-Prado, Marcos Gestal and Alejandro Pazos. LIBP-Pred: web server for lipid binding proteins using structural network parameters; PDB mining of human cancer biomarkers and drug targ
243. [...] proteins that interact with lipids in Shigella flexneri, Plasmodium berghei and Cryptosporidium parvum. The tools can be used to discover new drugs and their protein targets, protein-protein interactions, or new proteins with a specific activity.
[Figure 1. General scheme of the work with QSAR techniques and complex network theory for the discovery of drugs and molecular targets. Flow: real network (protein, drug) and abstract graph; graph matrices (connectivities, distances between nodes, node degrees, transition probabilities); molecular descriptors or topological indices (TIs) computed with MInD-Prot, S2SNet and CULSPIN; QSAR classification models of proteins and/or drugs built by general discriminant analysis, artificial neural networks, machine learning, evolutionary computation, etc.; implementation as online tools in the Bio-AIMS server; finding new DRUGS and protein TARGETS against microbes and parasites.]
The INTRODUCTION section begins by describing the existing software for calculating molecular descriptors (topological indices), such as DRAGON, MoDesLab, TOMO-COMD, MARCH-INSIDE, E-Calc and CODESSA PRO. The same section continues by presenting the existing QSAR/QSPR models for compounds
244. ounds. Eur. J. Med. Chem. 42, 580-585.
González-Díaz, H., González-Díaz, Y., Santana, L., Ubeira, F.M., Uriarte, E., 2008. Proteomics, networks and connectivity indices. Proteomics 8, 750-778.
González-Díaz, H., Sánchez-González, A., González-Díaz, Y., 2006. 3D-QSAR study for DNA cleavage proteins with a potential anti-tumor ATCUN-like motif. J. Inorg. Biochem. 100, 1290-1297.
González-Díaz, H., Vilar, S., Rivero, D., Fernández-Blanco, E., Porto, A., Munteanu, C.R., 2010. QSPR Models for Cerebral Cortex Co-Activation Networks. Topological Indices for Medicinal Chemistry, Biology, Parasitology, and Social Networks. Research Signpost.
González-Díaz, H., Vilar, S., Santana, L., Uriarte, E., 2007b. Medicinal chemistry and bioinformatics: current trends in drugs discovery with networks topological indices. Curr. Top. Med. Chem. 7, 1025-1039.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H., 2009a. The WEKA data mining software: an update. SIGKDD Explor. 11.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H., 2009b. The WEKA data mining software: an update. SIGKDD Explor. 11.
Harary, F., 1969. Graph Theory. Reading, MA.
Harman, D., 1981. The aging process. Proc. Natl. Acad. Sci. U.S.A. 78, 7124-7128.
Hayflick, L., 2000. The future of ageing. Nature 408, 267-269.
Koutsofios, E., North, S.C., 1993. Drawing Graphs with Dot. AT&T Bell Laboratories, Murray Hill, NJ, USA.
Kuzmic, P., Ng
245. oxic than melarsoprol. However, eflornithine is not effective against the form of the disease caused by T. brucei rhodesiense in East Africa. In this context, research aimed at identifying and validating novel drug targets is a major goal for the scientific community. Recently, many researchers have devoted considerable effort to experimental and/or theoretical studies of protein-protein interactions (PPIs) in pathogenic Trypanosoma species. In addition, knowledge of the biology of these parasites gained from the investigation of PPIs may guide researchers in the search for new drug targets for HAT or Chagas disease. For instance, Choe and Moyersoen et al. analyzed the sequence motifs responsible for the interactions of peroxins 14 and 5, which are involved in glycosome biogenesis in Trypanosoma brucei. Glycosome biogenesis in trypanosomatids occurs via a process that is homologous to peroxisome biogenesis in other eukaryotes. Glycosomal matrix proteins are synthesized in the cytosol and imported post-translationally. The import process involves a series of PPIs, starting with the recognition of glycosomal matrix proteins by a receptor in the cytosol. Most proteins to be imported contain so-called PTS-1 or PTS-2 targeting sequences, recognized by the receptor proteins PEX5 and PEX7, respectively. These authors measured the strength of the interactions between Trypanosoma brucei PEX14 and PEX5 by a fluorescence
246. probabilities $^{k}p_{ij}$ with which the amino acids interact with the other amino acids that are located at a distance 1, 2, 3, ..., k; O represents the 3D orbits (regions) of the protein structure where the interacting amino acids are located. By expanding this equation we can obtain, for k = 0, the initial unperturbed spectral moments $\pi_0$; for k = 1, the short-range $\pi_1$; for k = 2, the middle-range $\pi_2$; and for k = 3, the long-range spectral moments $\pi_3$, respectively. The notation of the type i = 3, 4, 5 refers to the expansion of the descriptors in a series of k indices that encode structural features in the vicinity of the aa, and is principally used for chain-like data structures such as sequences. In the present work this enumeration refers to sterically close neighbors placed at 1, 2, 3 or k times the 3D cutoff distance. The expansion of eq 1 is illustrated for the tripeptide Ala-Val-Trp (AVW) in the following equations:

$$\pi_0 = \mathrm{Tr}\left[(\Pi)^0\right] = \mathrm{Tr}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (2a)$$

$$\pi_1 = \mathrm{Tr}\left[(\Pi)^1\right] = \mathrm{Tr}\begin{bmatrix} p_{11} & p_{12} & 0 \\ p_{21} & p_{22} & p_{23} \\ 0 & p_{32} & p_{33} \end{bmatrix} \qquad (2b)$$

$$\pi_2 = \mathrm{Tr}\left[(\Pi)^2\right] = \mathrm{Tr}\left(\begin{bmatrix} p_{11} & p_{12} & 0 \\ p_{21} & p_{22} & p_{23} \\ 0 & p_{32} & p_{33} \end{bmatrix}^{2}\right) \qquad (2c)$$

$$\pi_3 = \mathrm{Tr}\left[(\Pi)^3\right] = \mathrm{Tr}\left(\begin{bmatrix} p_{11} & p_{12} & 0 \\ p_{21} & p_{22} & p_{23} \\ 0 & p_{32} & p_{33} \end{bmatrix}^{3}\right) \qquad (2d)$$

To carry out the calculations referred to in eq 1 and detailed in eqs 2a, 2b, 2c and 2d, the elements $p_{ij}$ of $\Pi$ were calculated as
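The expansion for the tripeptide can be sketched numerically: the k-th spectral moment is the trace of the k-th power of the interaction matrix. The Python sketch below uses plain lists; the numeric entries of `PI_AVW` are invented for illustration only (the source does not give them), and the function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def spectral_moments(pi, k_max):
    """Return [Tr(pi^0), Tr(pi^1), ..., Tr(pi^k_max)]."""
    n = len(pi)
    # Start from the identity matrix, i.e. pi^0.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    moments = []
    for _ in range(k_max + 1):
        moments.append(sum(power[i][i] for i in range(n)))  # trace
        power = mat_mul(power, pi)                          # next power
    return moments

# Hypothetical tridiagonal matrix for Ala-Val-Trp: only neighbouring
# residues interact, so the (A, W) and (W, A) entries are zero.
PI_AVW = [
    [0.6, 0.4, 0.0],
    [0.3, 0.4, 0.3],
    [0.0, 0.5, 0.5],
]

moments = spectral_moments(PI_AVW, 3)
print(moments)
```

With these illustrative values the first moments are 3 (trace of the identity), 1.5 (sum of the diagonal) and 1.31 (trace of the squared matrix).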
247. [...] processed by LDA, classifying the drugs as active or inactive against the different parasite species analyzed. The model correctly classified 212 of the 244 cases (87.0%) in the training series and 207 of the 243 compounds (85.4%) in the external validation series. To illustrate how the QSAR works for selecting active drugs, an additional virtual screening was carried out on antiparasitic compounds that were not used in the training or prediction series. The model recognized 97 of 114 (85.1%) of them. The model equation is the following:
Actv = [linear combination of the indices C for the atom sets T, Cuns, Het and H-Het, among others; the signs and exponents of the coefficients are not legible in the extracted text]; $R_c = 0.75$, $F = 51.44$, $p < 0.001$ (5)
where Rc is the canonical correlation coefficient, $\lambda$ is Wilks' statistic, F is Fisher's ratio and p is the error level. In this equation C is the molecular index of a given species after k steps. It was calculated for the total (T) of atoms in the molecule or for specific collections of atoms. These collections are atoms with a common characteristic: Het (heteroatoms), H-Het (hydrogen bonded to heteroatoms), Cuns (unsaturated carbon atoms), Csat (saturated carbon atoms). Prado-Prado et al. [77] developed an mt-QSAR model for more than 700 drugs tested in the literature against different par
248. r issue), W105-110.
McDermott J., Guerquin M., Frazier Z., Chang A.N., Samudrala R. Nucleic Acids Res. 2005, 33 (Web Server issue), W324-5.
Vedadi M., Lew J., Artz J., Amani M., Zhao Y., Dong A., et al. Mol. Biochem. Parasitol. 2007, 151 (1), 100-10.
Hogg T., Nagarajan K., Herzberg S., Chen L., Shen X., Jiang H., et al. J. Biol. Chem. 2006, 281 (35), 25425-37.
Banerjee A.K., Arora N., Murty U.S. J. Vector Borne Dis. 2009, 46 (3), 171-83.
Journal of Proteome Research, research articles
Complex Network Spectral Moments for ATCUN Motif DNA Cleavage: First Predictive Study on Proteins of Human Pathogen Parasites
Cristian R. Munteanu, José M. Vázquez, Julián Dorado, Alejandro Pazos Sierra, Ángeles Sánchez-González, Francisco J. Prado-Prado and Humberto González-Díaz
Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain; Department of Inorganic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos s/n, Campus sur, 15782 Santiago de Compostela, Spain; and Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos s/n, Campus sur, 15782 Santiago de Compostela, Spain
Received June 25, 2009
The development of methods that can predict the metal-mediated biological activity based only on the 3D structure of metal-unbound protei
249. r Modeling Protein Conformational Stability: Gene V Protein Mutants. Proteins 2007, 67, 834-852.
Fernández M., Caballero F., Fernández L., Abreu J.I., Acosta G. Classification of conformational stability of protein mutants from 3D pseudo-folding graph representation of protein sequences using support vector machines. Proteins 2008, 70 (1), 167-175.
Zbilut J.P., Giuliani A., Colosimo A., Mitchell J.C., Colafranceschi M., Marwan N., Webber C.L. Jr., Uversky V.N. Charge and hydrophobicity patterning along the sequence predicts the folding mechanism and aggregation of proteins: a computational approach. J. Proteome Res. 2004, 3 (6), 1243-53.
Krishnan A., Giuliani A., Zbilut J.P., Tomita M. Network scaling invariants help to elucidate basic topological principles of proteins. J. Proteome Res. 2007, 6 (10), 3924-34.
Krishnan A., Zbilut J.P., Tomita M., Giuliani A. Proteins as networks: usefulness of graph theory in protein science. Curr. Protein Pept. Sci. 2008, 9 (1), 28-38.
Giuliani A., Benigni R., Zbilut J.P., Webber C.L. Jr., Sirabella P., Colosimo A. Nonlinear signal analysis methods in the elucidation of protein sequence-structure relationships. Chem. Rev. 2002, 102 (5), 1471-92.
Marrero-Ponce Y., Medina-Marrero R., Castillo-Garit J.A., Romero-Zaldivar V., Torrens F., Castro E.A. Protein linear indices of the macromolecular pseudograph alpha-carbon atom adjacency matrix in bioinformatics. P
250. r property by using our best model.
Materials and Methods
Markov Model. The information about the molecular structure of the proteins was codified by using the MM method with the $\Pi$ matrix. The short-term electrostatic interaction matrix $\Pi$ was constructed as a square matrix (n x n), where n is the number of amino acids (aa) in the protein. We considered the hypothetical situation in which every j-th aa has an electrostatic potential at an arbitrary initial time. All the aa can interact, with a certain electrostatic energy, with every other aa in the protein. To simplify the evaluation, a truncation function $\alpha_{ij}$ was applied in such a way that, in a first approximation, a short-term electrostatic interaction takes place only between neighboring aa ($\alpha_{ij} = 1$). Otherwise the electrostatic interaction is neglected ($\alpha_{ij} = 0$). Thus, the electrostatic interactions propagate indirectly between those aa within the protein backbone; the long-range interactions are possible (not forbidden) and are estimated indirectly using the natural powers of $\Pi$. The spectral moments $\pi_k$ of $\Pi$ encode information about the spatial, indirect electrostatic interactions between any aa and another aa located at a distance k within the 3D protein backbone:

$$\pi_k(O) = \sum_{i=j \in O} {}^{k}p_{ij} = \mathrm{Tr}\left[(\Pi)^k\right] \qquad (1)$$

[Page header: 5220, Journal of Proteome Research, Vol. 8, No. 11, 2009, Munteanu et al.]
Equation 1 shows that the present electrostatic spectral moments $\pi_k$ depend on
251. ractically more useful models, simulated methods, or predictors, we shall make efforts in our future work to provide a web server for the method presented in this paper.
Acknowledgements
Cristian R. Munteanu and González-Díaz H. acknowledge the funding support for a research position by the Isidro Parga Pondal program from Xunta de Galicia and the European Social Fund (ESF). The work of Vanessa Aguiar-Pulido is supported by the Plan I2C program from Xunta de Galicia and by the ESF. This work is supported by the following projects: RD07/0067/0005, funded by the Carlos III Health Institute, and 10SIN105004PR, funded by the Economy and Industry Department of Xunta de Galicia.
References
1 A. Jemal, R. Siegel, E. Ward, Y. Hao, J. Xu, T. Murray and M. J. Thun, Ca-Cancer J. Clin., 2008, 58, 71-96.
2 B. Boursi and N. Arber, Ca-Cancer J. Clin., 2007, 13, 2274-2282.
3 C. Schafmayer, S. Buch, J. H. Egberts, A. Franke, M. Brosch, A. El Sharawy, M. Conring, M. Koschnick, S. Schwiedernoch, A. Katalinic, B. Kremer, U. R. Folsch, M. Krawczak, F. Fandrich, S. Schreiber, J. Tepel and J. Hampe, Int. J. Cancer, 2007, 121, 555-558.
4 A. N. Freedman, M. L. Slattery, R. Ballard-Barbash, G. Willis, B. J. Cann, D. Pee, M. H. Gail and R. M. Pfeiffer, J. Clin. Oncol., 2009, 27, 686-693.
5 G. Ferino, H. Gonzalez-Diaz, G. Delogu, G. Podda and E. Uriarte, Biochem. Biophys. Res. Commun., 2008, 372, 320-325.
6 A. Tropsha, Mol. Inf., 2010, 29, 476-488.
7 K. Roy and I. Mitra
252. [...] [spectral moments] and Shannon entropies (proteins only). Average property for drugs and proteins. The application (Figure 12) can calculate average indices for the complex networks of protein and drug molecules by using the input classes for drugs and proteins, or the information from the PDBs (proteins only). In addition, MInD-Prot can generate mixed indices for protein pairs, drug pairs or protein-drug pairs. If necessary, the tool can randomly generate negative pairs for protein pairs and protein-drug pairs. Additional information, such as the PDB headers of the proteins, can also be obtained. These numbers, which characterize each protein, drug or protein-drug pair, are used to build QSAR/QPDR-type classification models.
[Figure 12. Interface of the MInD-Prot program (Markov Indices for Drugs and Proteins).]
How to use MInD-Prot. In the main window the user can choose the calculation parameters and the input/output parameters. The main user interface is divided into the following parts. Proteins (PROTEINS): parameter files: input file with the list of protein chains or the names of the proteins from the PDB database (Databank http www p
253. rming the screening of large databases. In addition, with the introduction of the Internet, the development of new predictive methods has become only the first step in the application of computational techniques to proteome research. Nowadays, it is not enough to develop a fast and accurate predictive model; we should also implement it in public servers, preferably with free access, for the use of the scientific community. The server packages developed by Chou and Shen to predict the function of proteins from structural parameters or to explore protein structures are good examples in this regard. These may be used by proteome research scientists through user-friendly interfaces. This means that the user does not need to be an expert on the theoretical details behind this kind of model, including the vast literature published by Chou et al. on the development of models with pseudo amino acid composition parameters or the use of machine learning classification techniques and other algorithms. In any case, to the best of our knowledge, the literature contains no theoretical method to predict unique TPPIs in the Trypanosome proteome (not present in humans or other organisms) based on the 3D structure of the two proteins involved in the interaction. Separately, González-Díaz et al. introduced the method called Markovian chemicals in silico design (MARCH-INSIDE 1.0) for the computational design of small-sized drugs. The approac
254. [...] file, or obtained through one of the encodings or transformations explained above. Once the letter sequences are shown in this list box, the user is prompted to select the sequences or cases for which a U-Graph is to be built.
[Figure 21. Selecting the cases in CULSPIN. The list box shows the letter-encoded sequences, for example GDDGGGGGGGGGGDGGGDGDDGGGDGGGDGDGGDGDDDDGGGGGDGGDDGGGGGGGGGGGGGGGGKKKKKAAAKK.]
A contiguous block of sequences or cases can be selected by holding down the Shift key while selecting the first and the last case of the block; alternate cases, regularly spaced or not, can be selected by holding down the Ctrl key while selecting the desired cases; and all cases can be selected by checking the Select All option. After the U-Graphs of the selected sequences have been built, the list box shows only the cases that were processed. At this point the user is prompted to select the cases for which the TIs are to be calculated, or to select a single case to view its graph in a separate window. The remaining, unstudied sequences can be recovered without reading the input file again by using the Reload sequences option in the File menu. In that case, it starts
255. roposes the development of new tools for discovering drugs and their molecular targets, using software engineering and artificial intelligence techniques. Consequently, the structural information of molecules was encoded in topological indices of molecular graphs with the help of new, specific computer programs. These indices are used to search for classification models that can predict the biological activity of new molecules or the interaction of molecules (drugs, proteins). The best models were implemented as Web tools freely accessible to the scientific community. All results were published in international journals with a JCR impact factor.
Resumo (abstract in Galician)
The search for new drugs and their molecular targets is of great interest to the pharmaceutical industry, with implications for clinical practice against complex diseases, in particular those caused by microbes and parasites. Since the experimental search for the biological action of all possible drugs and their targets is very costly and time-consuming, there is a need to use theoretical methods to predict the best candidates. The thesis proposed here emphasizes the development of new tools for the discovery of drugs and molecular targets using computer engineering and artificial intelligence techniques. Consequently, the structural information of the molecules was encoded in topological indices of molecular graphs with the help of new, specific computer programs implemen
256. roteins in nature, a dataset containing 1999 proteins, of which 324 are antioxidant proteins, was created. Using this data as input, Star Graph Topological Indices were calculated with the S2SNet tool. These indices were then used as input to several classification techniques. Among the techniques used, Random Forest showed the best performance, correctly classifying 94% of the instances. Although the target class (antioxidant proteins) represents only a small subset of the dataset (324 of 1999 proteins), the proposed model correctly classifies 81.8% of the instances of this class, with a precision of 81.3%. © 2012 Elsevier Ltd. All rights reserved.
1. Introduction
Life expectancy is increasing every year, especially in developed societies. Nowadays, in these countries it is not unusual to find people close to one hundred years old, whereas 20 years ago this was quite rare. For example, in Spain life expectancy at birth has increased from 73 years in 1975 to more than 81 in 2011 (OECD, 2011). In this context, it is obvious that people may want to spend the biggest part of their life in optimum health conditions. In order to achieve this objective, finding some mechanism that delays aging (Cevenini et al., 2010)
Corresponding author at: University of A Coruña, ICT Dept., Facultad de Informática, Campus de Elviña s/n, 15071 A Coruña, Spain. Tel.: +34 981 167 000; fax: +34 981 167 160. E-mail addresses: e
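257. [Page header: Journal of Proteome Research, Vol. 9, No. 2, 2010, Rodriguez-Soca et al.]

$$\xi_k(R) = \sum_{j \in R} p_k(j)\,\xi_0(j) = \sum_{j \in R} p_k(j)\,\frac{q_j}{d_{j0}} \qquad (4)$$

It is remarkable that the average general potentials $\xi_k$ depend on the absolute probabilities $p_k(j)$ with which the amino acids interact with other amino acids, and on their order k. The potential $\xi_k(R)$ also depends on the initial unperturbed potential of the amino acid, $\xi_0(j) = q_j / d_{j0}$, with $d_{j0}$ equal to the distance from the C$\alpha$ carbon of the amino acid to the center of the protein, (x, y, z) = (0, 0, 0). In the equations presented above, the $p_k(j)$ values are calculated with the vector of absolute initial probabilities $\pi_0$ and the matrix ${}^{1}\Pi$, based on the Chapman-Kolmogorov equations. In particular, the evaluation of such expansions for k = 0 gives the initial average unperturbed electrostatic potential $\xi_0$; for k = 1, the short-range potential $\xi_1$; for k = 2, the middle-range potential $\xi_2$; and for k = 3, the long-range one $\xi_3$. This expansion is illustrated for the tripeptide Ala-Val-Trp (AVW):

$$\xi_0 = \begin{bmatrix} p_0(A) & p_0(V) & p_0(W) \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \xi_0(A) \\ \xi_0(V) \\ \xi_0(W) \end{bmatrix} = p_0(A)\,\xi_0(A) + p_0(V)\,\xi_0(V) + p_0(W)\,\xi_0(W) \qquad (5)$$

$$\xi_1 = \begin{bmatrix} p_0(A) & p_0(V) & p_0(W) \end{bmatrix} \begin{bmatrix} p_{AA} & p_{AV} & 0 \\ p_{VA} & p_{VV} & p_{VW} \\ 0 & p_{WV} & p_{WW} \end{bmatrix} \begin{bmatrix} \xi_0(A) \\ \xi_0(V) \\ \xi_0(W) \end{bmatrix} = p_1(A)\,\xi_0(A) + p_1(V)\,\xi_0(V) + p_1(W)\,\xi_0(W) \qquad (6)$$

$$\xi_2 = \begin{bmatrix} p_0(A) & p_0(V) & p_0(W) \end{bmatrix} \begin{bmatrix} p_{AA} & p_{AV} & 0 \\ p_{VA} & p_{VV} & p_{VW} \\ 0 & p_{WV} & p_{WW} \end{bmatrix} \begin{bmatrix} p_{AA} & p_{AV} & 0 \\ p_{VA} & p_{VV} & p_{VW} \\ 0 & p_{WV} & p_{WW} \end{bmatrix} \begin{bmatrix} \xi_0(A) \\ \xi_0(V) \\ \xi_0(W) \end{bmatrix} \qquad (7)$$

$$\xi_3 = \begin{bmatrix} p_0(A) & p_0(V) & p_0(W) \end{bmatrix} \begin{bmatrix} p_{AA} & p_{AV} & 0 \\ p_{VA} & p_{VV} & p_{VW} \\ 0 & p_{WV} & p_{WW} \end{bmatrix} \cdots$$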
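The tripeptide expansion can be followed numerically with a short sketch: each average potential is the initial probability vector times the k-th power of the interaction matrix times the vector of unperturbed potentials. All numeric values below (probabilities, matrix entries and potentials) are invented for illustration, and the function names are hypothetical; none of them come from the source.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a column vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]

def average_potentials(p0, matrix, xi0, k_max):
    """Return [xi_0, ..., xi_k_max] with xi_k = p0 . (matrix^k . xi0)."""
    potentials = []
    vec = list(xi0)                      # matrix^0 . xi0
    for _ in range(k_max + 1):
        potentials.append(sum(p * x for p, x in zip(p0, vec)))
        vec = mat_vec(matrix, vec)       # advance to the next power
    return potentials

# Hypothetical values for Ala-Val-Trp (A, V, W).
p0 = [0.2, 0.5, 0.3]                     # absolute initial probabilities
xi0 = [1.0, 2.0, 4.0]                    # unperturbed potentials q_j / d_j0
matrix = [
    [0.6, 0.4, 0.0],                     # A interacts with A and V only
    [0.3, 0.4, 0.3],
    [0.0, 0.5, 0.5],                     # W interacts with V and W only
]

xi = average_potentials(p0, matrix, xi0, 3)
print(xi)
```

With these illustrative inputs, the k = 0 term is 0.2·1 + 0.5·2 + 0.3·4 = 2.4 and the k = 1 term is 2.33.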
258. rove the linear model and sometimes give even worse results. All the models use as input only the three electrostatic potential variables selected beforehand with an LDA variable selection model. The large number of hidden neurons in the PNNs is generated automatically by the default algorithm of STATISTICA. Last, we should consider that with the advent of the Internet it is important not only to develop new predictive models for proteome research but also to implement these models in public web servers available to other research groups. In this regard, we have implemented this predictor in a web server freely available to the public at http://miaja.tic.udc.es/Bio-AIMS/TrypanoPPI.php. This is the first model and web server that predicts how unique a protein-protein complex in the Trypanosome proteome is with respect to other parasites and hosts, breaking new ground for anti-trypanosome drug target discovery.
Conclusions
In this paper we introduce a new type of parameter to numerically characterize protein structure in PPI studies. We also demonstrate that protein-protein complexes that are unique to Trypanosome species and not present in other organisms (TPPI cases) can be distinguished with a linear classifier based on the absolute difference between the 3D protein surface electrostatic potentials of the paired proteins. The model was implemented in a public web server available to the scientific co
259. [...] developed by E. Estrada and Y. Gutiérrez, and was first released in 2002 (Figure 4). The current version is 1.5, released in 2004. It provides all the tools needed to carry out QSAR studies, starting from a large number of input molecules, for the calculation of molecular descriptors such as the Kier and Hall indices, Kappa indices, Balaban indices, Abraham descriptors and the substructural descriptors specific to TOPS-MODE. It also provides a very useful way to define properties of atoms, bonds and fragments, and allows molecular structures to be entered in the SMILES language so that these properties can be used in the calculation of the molecular descriptors [62 64].
[Figure 4. MoDesLab interface.]
1.1.3 TOMO-COMD
In 2002, Y. Marrero-Ponce and V. Romero released version 1.0 of TOMOCOMD (Figure 5). It consists of four subprograms, each of which allows both the editing of structures (drawing mode) and the calculation of 2D/3D molecular descriptors (calculation mode). The software calculates different types of TIs from algebraic forms such as the quadratic $q_k(w)$, the linear $f_k(w)$ and the bilinear $b_k(w, v)$ [65]. A recent review discusses many applications of TOMOCOMD in QSPR/QSAR studies of antiparasitic drugs [35].
260. ry. Wang, G., Dunbrack Jr., R.L., 2003. PISCES: a protein sequence culling server. Bioinformatics 19, 1589-1591.
Zhang, C.T., Chou, K.C., 1994. Analysis of codon usage in 1562 E. coli protein coding sequences. J. Mol. Biol. 238, 1-8.
Molecular BioSystems. Cite this: Mol. BioSyst., 2012, 8, 1716-1722. www.rsc.org/molecularbiosystems. PAPER
Naive Bayes QSDR classification based on spiral-graph Shannon entropies for protein biomarkers in human colon cancer
Vanessa Aguiar-Pulido, Cristian R. Munteanu, José A. Seoane, Enrique Fernández-Blanco, Lázaro G. Pérez-Montoto, Humberto González-Díaz and Julián Dorado
Received 2nd February 2012, Accepted 9th March 2012. DOI: 10.1039/c2mb25039j
Fast cancer diagnosis is a real necessity in applied medicine due to the importance of this disease. Theoretical models can therefore help as prediction tools. Graph theory representation is one option because it permits us to numerically describe any real system, such as protein macromolecules, by transforming real properties into molecular graph topological indices. This study proposes a new classification model for proteins linked with human colon cancer, using spiral graph topological indices of protein amino acid sequences. The best quantitative structure-disease relationship model is based on eleven Shannon entropy indices. It was obtained with the Naive Bayes method and shows excellent
261. s
• Get n random samples from the original dataset to use them as tree seeds.
• For each seed, grow a non-pruned tree; at each node, randomly choose m predictors and take the best split among those.
• Run the different prediction trees and select the most voted prediction.
It may be highlighted that this technique is quite efficient because, when constructing the trees, the pruning phase is omitted and the search for the best split is performed over a small set of predictors. This simplification may suggest that a single tree would perform better, but it was shown empirically that Random Forest outperforms CART single-tree predictors (Chipman et al., 1998).
3. Results
The dataset used in this paper is composed of 1999 protein sequences, of which 324 have proved to have antioxidant activity (positive group). The remaining 1675 proteins (negative group) are sequences from the CulledPDB server with identity less than 20%, without antioxidant biological activity. These protein sequences have been processed with the S2SNet application (Munteanu et al., 2009) in order to obtain the different topological indices used in this study. Specifically, 42 attributes are extracted for each sequence from the embedded/non-embedded Star Graph. The series of topological indices for each protein have been used to find the best antioxidant classification model with the Machine Learning methods included in Weka (Hall et al., 2009b). In order to
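The three steps listed above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the Weka implementation used in the paper: the Gini impurity criterion, the function names and the six-row example dataset are all choices made here for illustration, and a node becomes a leaf as soon as none of its m randomly chosen predictors improves the impurity.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Best (feature, threshold) among `features`, or None if nothing improves."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, m, rng):
    """Grow an unpruned tree, trying m randomly chosen predictors per node."""
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), m))
    if split is None:
        return Counter(labels).most_common(1)[0][0]      # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], m, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], m, rng))

def predict_tree(tree, row):
    """Follow the splits until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def random_forest(rows, labels, n_trees, m, seed=0):
    """Step 1 and 2: one bootstrap sample and one unpruned tree per seed."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, row):
    """Step 3: majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Toy data with two numeric attributes per "protein" (values are invented).
rows = [[0.10, 1.00], [0.20, 0.90], [0.15, 1.10],
        [0.90, 0.10], [1.00, 0.20], [0.95, 0.15]]
labels = ["non-antiox"] * 3 + ["antiox"] * 3
forest = random_forest(rows, labels, n_trees=25, m=1)
print(predict_forest(forest, [0.05, 1.20]), predict_forest(forest, [1.10, 0.05]))
```

Choosing m smaller than the number of attributes is what decorrelates the trees; with m equal to the number of attributes the procedure reduces to plain bagging of unpruned trees.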
262. [...]s from PDB as positive pairs, and generating negative cases up to X times the number of positive pairs. Alternatively, a predefined activity file (ProtPairActivity.txt) can be used, with the format PDB1 [tab] PDB2 [tab] Class.
Drugs (DRUGS). File parameters (Files): a file with the SMILES codes as Drug Name [tab] SMILE formula; a file for the simple output with the calculated topological indices of the drugs. Averaged results by input classes: DRUG ClassAvg.txt, with Drug Name [tab] SMILE formula [tab] Class. Drug pairs (Drug PAIRS) are always built from a file with the biological activity of the drugs, as DrugName1 [tab] DrugName2 [tab] Class.
Markov-type indices (Markov Indices). There are three types of indices: spectral moments (Spectral Moments), Shannon-type entropies (Shannon Entropies) and mean properties (Mean Properties). Proteins and drugs can be calculated separately; if both are calculated, only Mean Properties based on the Amber-type electronegativity of the atoms (AmberCh) are used automatically. For protein calculations, other atom/amino acid properties can be used, such as Polar KJ AtContrib2P AtRefr vdW Area hardness I A Electrophilicity ElectroMulliken, together with the other types of indices (Spectral Moments and Shannon
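As a rough illustration of the tab-separated pair file and of drawing negative cases up to X times the positive pairs, the Python sketch below parses lines of the form PDB1, tab, PDB2, tab, Class and samples unlisted chain pairs at random. The class label "1" for positive pairs, the function names and the sampling strategy are assumptions made here; the source does not describe how MInD-Prot actually samples negatives.

```python
import random

def parse_pair_activity(text):
    """Parse lines of the form PDB1<tab>PDB2<tab>Class into tuples."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            pdb1, pdb2, cls = line.split("\t")
            pairs.append((pdb1, pdb2, cls.strip()))
    return pairs

def random_negative_pairs(positives, times, seed=0):
    """Draw up to times * len(positives) chain pairs not listed as positive."""
    rng = random.Random(seed)
    chains = sorted({chain for pair in positives for chain in pair})
    known = {frozenset(pair) for pair in positives}
    candidates = [(a, b) for i, a in enumerate(chains) for b in chains[i + 1:]
                  if frozenset((a, b)) not in known]
    rng.shuffle(candidates)
    return candidates[:times * len(positives)]

# Example content in the ProtPairActivity.txt layout (class "1" is assumed
# to mark positive pairs).
example = "3CSIA\t3CSIE\t1\n2F6IE\t2GHUA\t1\n1SYRC\t1SYRF\t1\n"
positives = [(a, b) for a, b, cls in parse_pair_activity(example) if cls == "1"]
negatives = random_negative_pairs(positives, times=2)
print(len(positives), len(negatives))
```

Fixing the seed keeps the generated negative set reproducible between runs, which matters when the same pairs are later used to compare classification models.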
263. [...] classification models for antifungal compounds. González-Díaz et al. [79] developed a unified Markov model to describe, with a single linear equation, the biological activity of 74 drugs tested in the literature against some of the fungal species selected from a list of 87 species (491 cases in total). The data were processed by LDA, classifying the drugs as active or inactive against the different fungal species analyzed. The model correctly classified 338 of the 368 active compounds (91.85%) and 89 of the 123 inactive compounds (72.36%). The overall training predictability was 86.97% (427 of the 491 compounds). The model was validated with the leave-species-out (LSO) method. After removing, step by step, all the drugs tested against one species, the authors recorded the percentage of correctly classified compounds (LSO predictability). The robustness of the model to the removal of compounds (LSO robustness) was also taken into account; it was measured as the variation Δ in the percentage of good classification of the modified model (with LSO) with respect to the original one. The average LSO predictability was 86.41 ± 0.95% (mean ± SD) and Δ = 0.55%, with 6 being the average number of drugs tested against each fungal species. The results for some of the 87 spec
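The leave-species-out idea can be sketched generically: every species is held out in turn, a model is fitted on the remaining cases, and the percentage of correct classifications on the held-out species is recorded. The sketch below scores the held-out cases, which is one common reading of the procedure; the toy threshold classifier, the data and all names are invented for illustration and are unrelated to the LDA model of the source.

```python
from statistics import mean, stdev

def leave_species_out(cases, fit, predict):
    """cases: list of (x, species, label). Returns accuracy (%) per held-out species."""
    scores = []
    for held in sorted({species for _, species, _ in cases}):
        train = [c for c in cases if c[1] != held]
        test = [c for c in cases if c[1] == held]
        model = fit(train)
        hits = sum(predict(model, x) == y for x, _, y in test)
        scores.append(100.0 * hits / len(test))
    return scores

# Toy one-descriptor classifier: threshold halfway between the class means.
def fit_threshold(train):
    actives = [x for x, _, y in train if y == 1]
    inactives = [x for x, _, y in train if y == 0]
    return (mean(actives) + mean(inactives)) / 2.0

def predict_threshold(threshold, x):
    return 1 if x > threshold else 0

# Invented cases: (descriptor value, species, activity class).
cases = [(0.9, "sp1", 1), (0.1, "sp1", 0),
         (0.8, "sp2", 1), (0.2, "sp2", 0),
         (0.7, "sp3", 1), (0.3, "sp3", 0)]

scores = leave_species_out(cases, fit_threshold, predict_threshold)
print(scores, mean(scores), stdev(scores))
```

The mean and standard deviation of the per-species scores correspond to the "mean ± SD" form in which the LSO predictability is reported in the text.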
264. [...] of numeric type. It offers two different heuristics for transforming a numeric sequence or series into a sequence of letters:
n Regular Interval Classes: with this option, the numeric data read from the input file are divided into n intervals or classes (2 ≤ n ≤ 10), and each class is assigned a different letter. Each element of the numeric sequence or series is then encoded with the letter of the class to which it belongs.
n σ Interval Classes: with this option, the numeric data read from the input file are divided into 2n + 2 intervals (2 ≤ n ≤ 4) whose sizes depend on the standard deviation of the data. Each interval or class is assigned a letter, and each element of the numeric sequence or series is encoded with the letter of the class to which it belongs.
Note: for MS data, the present version of CULSPIN first transforms them into a numeric series in which each element is the product of the m/z and the intensity of each signal in the spectrum. This numeric series is then transformed into a sequence of letters using the heuristic selected by the user.
III. A list box for view/select sequences: this list box shows the sequences or cases and allows them to be selected (Figure 21). Initially the list is empty; after the data have been read from the input file, the list shows the sequences read directly from the file
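The first heuristic, n Regular Interval Classes, can be sketched as follows, together with the m/z times intensity pre-processing described for MS data. This is a minimal sketch under assumptions: equal-width intervals between the minimum and maximum of the series, inclusive bounds 2 to 10 for n, and the letters A, B, C, ... are choices made here; the source does not state which letters or interval boundaries CULSPIN actually uses.

```python
import string

def ms_to_series(peaks):
    """Turn (m/z, intensity) signals into one numeric series of products."""
    return [mz * intensity for mz, intensity in peaks]

def regular_interval_classes(series, n):
    """Encode a numeric series as letters using n equal-width classes."""
    if not 2 <= n <= 10:
        raise ValueError("n must be between 2 and 10")
    lo, hi = min(series), max(series)
    width = (hi - lo) / n
    letters = string.ascii_uppercase[:n]
    encoded = []
    for value in series:
        if width == 0:                       # constant series: single class
            index = 0
        else:                                # the maximum falls in the last class
            index = min(int((value - lo) / width), n - 1)
        encoded.append(letters[index])
    return "".join(encoded)

print(regular_interval_classes([0, 1, 2, 3, 4, 10], 2))   # AAAAAB
print(regular_interval_classes([0, 1, 2, 3, 4, 10], 5))   # AABBCE
print(ms_to_series([(100.0, 2.0), (250.0, 0.5)]))         # [200.0, 125.0]
```

Equal-width classes are sensitive to outliers (a single large value pushes most elements into the first letters, as the first example shows), which is one reason a standard-deviation-based alternative such as the second heuristic can be useful.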
265. s from 3D Structure: A General Model and Examples of Experimental-Theoretic Scoring of Peptide Mass Fingerprints of Leishmania Proteins. J. Proteome Res. 2009, 8 (9), 4372-82. Santana, L.; González-Díaz, H.; Quezada, E.; Uriarte, E.; Yanez, M.; Vina, D.; Orallo, F. Quantitative structure-activity relationship and complex network approach to monoamine oxidase A and B inhibitors. J. Med. Chem. 2008, 51 (21), 6740-51. Vina, D.; Uriarte, E.; Orallo, F.; González-Díaz, H. Alignment-Free Prediction of a Drug-Target Complex Network Based on Parameters of Drug Connectivity and Protein Sequence of Receptors. Mol. Pharm. 2009, 6 (3), 825-35. González-Díaz, H.; Prado-Prado, F.; Ubeira, F. M. Predicting antimicrobial drugs and targets with the MARCH-INSIDE approach. Curr. Top. Med. Chem. 2008, 8 (18), 1676-90. González-Díaz, H.; Vilar, S.; Santana, L.; Uriarte, E. Medicinal Chemistry and Bioinformatics: Current Trends in Drugs Discovery with Networks Topological Indices. Curr. Top. Med. Chem. 2007, 7 (10), 1025-39. (Journal of Proteome Research, Vol. 9, No. 2, 2010, p. 1189, technical notes.) Concu, R.; Podda, G.; Uriarte, E.; González-Díaz, H. Computational chemistry study of 3D-structure-function relationships for enzymes based on Markov models for protein electrostatic, HINT, and van der Waals po
266. interdisciplinary studies are needed, with knowledge and methods from the following fields: Pharmaceutical Chemistry, to understand the activity of drugs; Microbiology and Parasitology, to find the best way to fight various pathologies; Bioinformatics, to handle biological information; Applied Mathematics, with graph theory and complex network theory, to characterize numerically the drugs and their molecular targets in microbes and parasites; Artificial Intelligence and Statistics, to find the theoretical models that can predict new drugs and their targets; and Computer Science, with programming techniques, to create the applications that generate molecular descriptors and to implement the prediction models in online tools that are unique in the scientific world. QSAR/QSPR (Quantitative Structure-Activity/Property Relationships) have been widely used for different types of problems in Medicinal Chemistry and other Biological Sciences. In the past, however, the applications of QSAR models were limited to the study of small molecules. In this context, many authors use molecular graphs of atoms (nodes) connected by chemical bonds (edges) to represent and numerically characterize molecular structure. More recently, however, many QSAR/QSPR models have appeared with applications to situations
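To make the idea of a molecular graph and a numerical descriptor concrete, the sketch below stores two small hydrogen-suppressed molecules as adjacency lists and computes the Wiener index (the sum of shortest-path distances over all atom pairs). The Wiener index is chosen here only as a familiar topological index; the text does not single it out, and the function name is an assumption.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered pairs of nodes."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives distances in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        total += sum(distance.values())
    return total // 2  # every pair was counted twice


# Carbon skeletons only: atoms are nodes, bonds are edges.
n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
isobutane = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}

w_butane = wiener_index(n_butane)
w_isobutane = wiener_index(isobutane)
```

The two isomers have the same atoms but different indices, which is what lets a graph descriptor distinguish structures in a QSAR model.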
267. sed methods for predicting protein properties linked with diseases and uses macromolecular graph descriptors named topological indices (TIs). Molecular graph theory is a branch of mathematical chemistry dedicated to encoding protein/DNA/RNA/drug information in graph representations using TIs [18]. Graphical approaches for studying biological systems can provide useful insights into protein folding kinetics, enzyme-catalyzed reactions, inhibition kinetics of processive nucleic acid polymerases and nucleases, DNA sequence analysis, anti-sense strands, base frequencies, analysis of codon usage, protein networks in parasites [6], and complicated network system research. Graphic representation was also used to study the evolution of protein sequences and drug metabolism systems. In particular, the wenxiang diagrams (graphs) were recently used to analyze the mechanism of protein-protein interactions and gain some very interesting insights. (This journal is © The Royal Society of Chemistry 2012.) Interesting implementations of graph-based models for drug-protein and protein-protein interactions are presented in the Bio-AIMS tools at http://bio-aims.udc.es/TargetPred.php. Other interesting fields in which to apply graph theory are oncology and clinical proteomics. A classification model for discriminating prostate cancer patients from the control group with connectivity indices was constructed by González-Díaz et al. Vilar's group des
268. sign, drug target discovery, and disease biomarker selection. Noiri, Doi et al. have reported that urinary FABP-1, an early predictive biomarker of kidney injury, and a liver-type LIBP are included in a panel of biomarkers in acute and chronic kidney disease. Evennett, Petrov et al. discussed that the performance of the currently available serological markers is suboptimal for routine clinical use, but novel markers of intestinal ischemia such as i-FABP may offer improved diagnostic accuracy. Krusinova and Pelikanova reviewed adipocyte/macrophage FABP (A-FABP), which has been shown to be closely associated with metabolic syndrome, obesity, and development of atherosclerosis, and has recently been suggested as a potential therapeutic target for these abnormalities in animal models. New agents in development for the treatment of bacterial infections include LIBP inhibitors. (Mol. BioSyst., 2012, 8, 851-862, p. 851. Published on 10 January 2012, doi:10.1039/C2MB05432A.) LIBPs are also very relevant for different types of cancer. Liver FABP (L-FABP) is a new prognostic factor for hepatic resection of colorectal cancer metastases. FABP-6 is also overexpressed in colorectal cancer, and the overexpression of FABP-7 correlates with the basal-like subtype of breast cancer. There have been studies on the fatty acid metabolism in human breast cancer cells (MCF7) transfected with he
269. stical Learning Theory, John Wiley and Sons, New York, USA, 1998. C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, 1995. I. Guyon and A. Elisseeff, J. Mach. Learn. Res., 2003, 3, 1157-1182. (Mol. BioSyst., 2012, 8, 1716-1722, p. 1722.) M. A. Hall and L. A. Smith, Correlation-based Feature Subset Selection for Machine Learning, Hamilton, New Zealand, 1998. H. Liu and R. Setiono, presented in part at the 13th International Conference on Machine Learning, 1996. M. Guetlein, E. Frank, M. Hall and A. Karwath, presented in part at the Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, 2009. Y. Saeys, I. Inza and P. Larranaga, Bioinformatics (Oxford, England), 2007, 23, 2507-2517. K. C. Chou and C. T. Zhang, Crit. Rev. Biochem. Mol. Biol., 1995, 30, 275-349. K. C. Chou, J. Theor. Biol., 2011, 273, 236-247. C. Chen, L. Chen, X. Zou and P. Cai, Protein Pept. Lett., 2009, 16, 27-31. M. Esmaeili, H. Mohabatkar and S. Mohsenzadeh, J. Theor. Biol., 2010, 263, 203-209. D. N. Georgiou, T. E. Karakasidis, J. J. Nieto and A. Torres, J. Theor. Biol., 2009, 257, 17-26. Z. C. Wu, X. Xiao and K. C. Chou, Mol. BioSyst., 2011, 7, 3287-3297. H. Mohabatkar, M. Mohammad Beigi and A. Esmaeili, J. Theor. Biol., 2011, 281, 18-23. H. Mohabatkar, Protein Pept. Lett., 2010, 17, 1207-1214. K. C. Chou, Z. C. Wu, a
270. stness to LOO data variation. The present model (eq 5) is simpler than the previous one (eq 6) reported with the same series of ATCUN proteins. The older model was fit using the electrostatic potentials of different orbits (k, O), as described in the following equation: DNA cleavage 1 15 1 2 18 m 27 57 0 27 57 Ey f 0 09N 199R 0 744 0 44 p < 0.001 (6). Equation 6 shows higher percentages of good classification for ATCUN proteins, but it uses four parameters, which means two times more variables than the model reported in the present work. In addition, the best model with only two electrostatic potential values classifies worse than the present model with two π_k(O) values. Moreover, the present model (eq 5) is based on a data set of 313 proteins, which is 1.5 times larger than the one used in the previous model (eq 6), containing 115 proteins. Another disadvantage of the previous model is that the calculation of the electrostatic potential values is more complicated, whereas the spectral moments π_k(O) are calculated straightforwardly from the traces of matrices. To check the quality of our model based on complex network spectral moments, we carried out some statistical analysis. The (Journal of Proteome Research, Vol. 8, No. 11, 2009, p. 5223; research articles; Munteanu et al.) [Figure: applicability domain plot with the high-leverage region marked; axis values not recoverable.]
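The remark that spectral moments follow directly from matrix traces can be illustrated with a short sketch. It computes the k-th spectral moment as the trace of the k-th power of a square matrix; the 2x2 stochastic matrix is a made-up example, not data from the paper, and the helper names are assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
        for i in range(n)
    ]


def spectral_moment(matrix, k):
    """Return Tr(matrix**k), the k-th spectral moment of the matrix."""
    n = len(matrix)
    # Start from the identity so that k = 0 gives Tr(I) = n.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        power = mat_mul(power, matrix)
    return sum(power[i][i] for i in range(n))


# Hypothetical Markov (row-stochastic) matrix for a two-node system.
markov = [[0.5, 0.5], [0.25, 0.75]]
moments = [spectral_moment(markov, k) for k in range(3)]
```

Because only matrix products and a diagonal sum are needed, this kind of descriptor is cheap to compute even for large protein networks.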
271. sult in a person with the target condition compared to a person without the condition. A DOR of 1 suggests that the test provides no diagnostic evidence. Moreover, we also calculated the likelihood ratios (LRs), which describe how many times more likely a person with the target condition is to have a particular test result than a person without that condition. LRs change the probability that a target condition is present after the test has been made. Binary tests have two LRs, positive and negative (LR+ and LR−). An LR of 1 indicates no diagnostic value. Since Naive Bayes needs all the variables to be independent, the chi-squared test was used to check this condition. This analysis was performed using the PASW Statistics 18 statistical package (version 18.0.0). Results. More than 18 classification models were tested with the aim of finding the equation that is able to discriminate between proteins related to HCC. The initial attributes include 40 spiral graph TIs obtained with CULSPIN: 20 frequencies (Fr) and 20 Shannon entropies (Sh). Feature selection was used in order to consider the minimum number of attributes, and after that the different classification methods were applied. Table 1 presents the classification results for the test group and the AUROC values. The classifications used only the frequencies, only the Shannon entropies, and both types of TIs. These results were obtained using the Weka package. The best QSDR classif
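The diagnostic odds ratio and the two likelihood ratios can be computed from a 2x2 confusion table with the standard definitions LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity and DOR = LR+ / LR−. The sketch below uses made-up counts; the function name and the example numbers are assumptions, not values from the study.

```python
import math


def diagnostic_ratios(tp, fn, fp, tn):
    """Return sensitivity, specificity, LR+, LR- and DOR for a binary test.

    Raises ZeroDivisionError when fp == 0 or fn == 0, where the ratios
    are undefined.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "dor": lr_positive / lr_negative,
    }


# Hypothetical counts: 90 true positives, 10 false negatives,
# 20 false positives, 80 true negatives.
informative = diagnostic_ratios(90, 10, 20, 80)

# A test no better than chance: both LRs and the DOR equal 1.
uninformative = diagnostic_ratios(50, 50, 50, 50)
```

The second call shows the boundary case described in the text: a DOR or LR of 1 carries no diagnostic information.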
272. …their targets is a very costly and time-consuming activity, so theoretical methods are needed to predict the best candidates. This thesis proposes the development of new software tools for the discovery of drugs and molecular targets using computer engineering and artificial intelligence techniques. Accordingly, the structural information of the molecules was encoded in the topological indices of molecular graphs with the help of new dedicated programs implemented by the author of the thesis. With these indices, classification models were sought that can predict the biological activity of new molecules or the interaction between molecules. The best models developed were implemented as free-access Web tools for scientists. All the results were published in international journals with a significant JCR impact factor. Abstract. The search for new drugs and their molecular targets is of increasing interest to the pharmaceutical industry, with implications for clinical practice against complex diseases, especially those caused by microbes and parasites. The experimental search for the biological activity of all possible drugs and their targets is very expensive and time-consuming. Theoretical methods are therefore needed to predict the best candidates. The current thesis p
273. t al. / Journal of Theoretical Biology 317 (2013) 331-337, p. 333. Fig. 2. The non-embedded (A) and embedded (B) Star Graphs for 1BZ4 (chain A). …of the star contains the same amino acid type, and the star centre is a non-amino-acid vertex. In this way, the following information about the protein primary structure is encoded into the Star Graph connectivity: amino acid type, sequence, and frequency. A protein can be represented by diverse forms of graphs, which can be associated with distinct distance matrices. The best method to construct a standard Star Graph is described next: each amino acid vertex holds its position in the original sequence, and the branches are labelled in alphabetical order of the three-letter amino acid code (Randić et al., 2007). The graph is embedded if the initial sequence connectivity in the protein chain is included. Fig. 2 presents the embedded and non-embedded Star Graphs of PRPS1, using the alphabetical order of the one-letter amino acid code. Graphs are compared using the corresponding connectivity matrix, distance matrix, and degree matrix. In the case of the embedded graph, the connectivity matrices in the sequence and in the Star Graph are combined. These matrices, and the normalized ones, are the basis of the TIs calculation. The conversion of the amino acid sequences into Star Graph TIs was performed by using the Sequence to Star Networks (S2SNet) application developed by our group. S2SNet is based on
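The construction described above can be sketched as an edge list. This is an illustrative reading, not S2SNet's code: vertex 0 stands for the non-amino-acid centre, vertices 1..n are sequence positions, each branch is assumed to be a chain running outward from the centre in sequence order, and the function name is hypothetical.

```python
def star_graph_edges(sequence, embedded=False):
    """Return the set of edges of a Star Graph for a one-letter sequence."""
    # Group sequence positions (1-based) by amino acid type.
    branches = {}
    for position, residue in enumerate(sequence, start=1):
        branches.setdefault(residue, []).append(position)

    edges = set()
    # Branches are visited in alphabetical order of the one-letter code.
    for residue in sorted(branches):
        previous = 0  # start each branch at the star centre
        for position in branches[residue]:
            edges.add((previous, position))
            previous = position

    if embedded:
        # Embedded graph: also keep the original sequence connectivity.
        for position in range(1, len(sequence)):
            edges.add((position, position + 1))
    return edges


plain = star_graph_edges("ACAC")
with_sequence = star_graph_edges("ACAC", embedded=True)
```

From such an edge set one can build the connectivity, distance and degree matrices on which the TIs are computed.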
274. t or output of the analysis. Sequence-only methods are often faster than 3D ones and need less structural information. By contrast, 3D methods give a clearer idea of the structure of the protein and may be used to predict proteins with known spatial structure but unknown function. 9 Alignment-free methods involve topological indices, signal analysis, or 3D structural parameters; see, for instance, the works of Giuliani, Zbilut, Kirshnan, Torrens, Marrero-Ponce, Caballero and Fernandez, Estrada, Ivanciuc, and others. 9 6 These last methods are important because such functionally nonannotated structures are becoming common in the Protein Data Bank (PDB) with the development of powerful characterization techniques. Specifically, in this work we are interested in computational methods predicting TPPIs that determine the formation of a noncovalent complex between two proteins that can be isolated, with the 3D structure chemically characterized as a potential drug target. Protein complexes are essential for understanding the principles of cellular organization. As the sizes of PPI networks increase, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. An alternative is the direct prediction of complexes by protein-protein docking, but it may become computationally expensive if we aim at perfo
275. t-parasite interactions that may become important targets to halt infections caused by pathogenic parasites in human beings. For instance, the malaria parasite liver stage produces tens of thousands of red-cell-infectious forms within its host hepatocyte. It is thought that the vacuole-enclosed parasite completely depends on the host cell for successful development, but the molecular parasite-host cell interactions underlying this remarkable growth have remained elusive. Using a yeast two-hybrid screen and a yeast over-expression system, some authors have shown that UIS3, a parasite protein essential for liver-stage development, interacts directly with liver fatty acid binding protein (L-FABP). Down-regulation of L-FABP expression in hepatocytes severely impairs parasite growth, and over-expression of L-FABP promotes growth. This is the first identified direct liver-stage host cell protein interaction, providing a possible explanation for the importance of UIS3 in liver infection. With these facts in mind, we decided to explore the proteins present in the Plasmodium sp. proteome reported in PDB with known 3D structure but unknown function, in order to possibly discover new LIBPs relevant to malaria. Considering that LIBPs, as well as other LBPs, are not exclusive to Plasmodium but are also present in other parasites, we have used LIBP-Pred to study proteins of other parasites also present in PDB but without function annotation. The h
276. t-test significance level of 0.1. Moreover, side-chain and C-alpha contacts of 42% and 61% accuracy, respectively, as well as long- and short-range distance maps, are automatically constructed from the threading alignments. These data can easily be used as constraints to guide ab initio procedures such as TASSER for further protein tertiary structure modeling. The LOMETS server is [Fig. 3 residue, LIBPpred (LIpid Binding Proteins Prediction Tool) page of Bio-AIMS, MARCH-INSIDE Python version. Mode 1, standard PDBs: paste the IDs of the PDBs or PDB chains as a list (maximum 10 items); data are taken from RCSB PDB. LDA classification model, accuracy 89.11%; the model is based on 9 spectral moments of the proteins, and the final form will be available after the publication. Note: the LIBP prediction is calculated as (LIBPscore − Min_score) × 100 / (Max_score − Min_score), where LIBPscore is the result of the LDA equation for the current protein and Min_score and Max_score are the minimum and maximum values of the LIBPscore for our dataset. Mode 2, LOMETS PDB: upload and evaluate one PDB from LOMETS (max 2 MB).] Fig. 3. Web user i
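The note on the LIBPpred page describes a min-max rescaling of the LDA score to a 0-100 scale. A minimal sketch of that formula follows; the function and argument names are assumptions, and the numeric values are invented for illustration rather than taken from the server's dataset.

```python
def libp_prediction(libp_score, min_score, max_score):
    """Rescale an LDA score to 0-100 using the dataset minimum and maximum."""
    if max_score == min_score:
        raise ValueError("max_score and min_score must differ")
    return (libp_score - min_score) * 100.0 / (max_score - min_score)


# Hypothetical dataset range of LDA scores: -2.0 to 2.0.
lowest = libp_prediction(-2.0, -2.0, 2.0)
middle = libp_prediction(0.0, -2.0, 2.0)
highest = libp_prediction(2.0, -2.0, 2.0)
```

A protein scoring outside the dataset range would map below 0 or above 100; whether the server clips such values is not stated in the text.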
277. ta upload data. The input format for DRAGON should be 3D sdf. You can use CORINA to convert your molecules to it. Figure 3. Interface of the online version, E-Dragon 1.0. 1.1.1 DRAGON. The DRAGON program (http://www.talete.mi.it/products/dragon_description.htm) was designed to provide the user with a variety of molecular descriptors, including most of the known TIs derived from the different molecular representations (Figure 2). The first release of DRAGON was developed in 1994 by the Milano Chemometrics Group under the name WHIM-3D/QSAR. A large number of descriptors were added in successive versions, leading to new software that in 1997 provided about 600 descriptors and was published under the name DRAGON [57]. At present, DRAGON v. 6.0 calculates 4855 molecular descriptors divided into 29 types and is managed by Talete SRL, a commercial brand. E-DRAGON v. 1.0 (http://www.vcclab.org/lab/edragon) is the online version of DRAGON v. 5.4 (Figure 3). It is free and calculates more than 4885 molecular descriptors, which are divided into 20 logical blocks [58]. E-Dragon was developed as a result of the collaboration between Dr. Tetko, Professor Todeschini, and the teams of Prof. Gasteiger. Some examples in the literature of the use of this software are [59-61]. 1.1.2 MoDesLab. MoDesLab (http://www.modeslab.com) was deve
278. …ted by the author of the thesis. With these indices, new classification models were sought that can predict the biological activity of new molecules or the interaction between molecules. The best models obtained were implemented in free-access Web tools for scientists. All the results were published in international journals with a significant JCR impact factor. Contents: 1. INTRODUCTION; 1.1 Programs for molecular graph parameters; 1.1.1 DRAGON; 1.1.2 MoDesLab; 1.1.3 TOMOCOMD; 1.1.4 MARCH-INSIDE; 1.1.5 E-Calc; 1.1.6 CODESSA PRO; 1.2 Artificial intelligence models for drugs and molecular targets; 1.2.1 Classification models for antiviral compounds; 1.2.2 Classification models for antibacterial compounds; 1.2.3 Classification models for antiparasitic compounds; 1.2.4 Classification models for antifungal compounds; 1.3 Online tools for molecular classification; 1.4 [title illegible]; 2. RESULTS AND DISCUSSION; 2.1 New computer programs for molecular parameters; 2.1.1 MInD-Prot: Markov descriptors for drugs and prote
279. teins in these studies. However, many of these QSAR models are based on simpler numerical parameters derived from a graph or network representation of the molecular systems. There are many types of graph representations, but essentially they contain two elements: (1) the nodes, which are the parts of the system represented by a dot (atoms, amino acids, nucleotides, codons, genes, proteins), and (2) the links between these parts, represented as edges or arcs (chemical bonds, hydrogen bonds, reactions, coexpression, regulation, and other ties or relationships). Many authors refer to the numerical parameters used to characterize a graph, which are graph invariants in most cases, as Topological Indices. This graph-based approach to the study of biological systems can provide useful insights into QSAR studies, protein functions, attributes or localization, protein folding kinetics, enzyme-catalyzed reactions, 42 inhibition kinetics of processive nucleic acid polymerases and nucleases, 5 DNA sequence analysis, antisense strand base frequencies, and analysis of codon usage. 5957 Our research group used the following stochastic molecular descriptors in biochemistry and medicinal chemistry: the entropies, the spectral moments, the free energies, 9 and the electrostatic potentials. 2 All these QSAR studies are based on the Markov model (MM) to derive the molecular descriptors that encode the macromolecular structure. The electrostatic sp
280. tential ATCUN antitumor proteins. The independent data test was used by splitting the data at random into a training series (75%), used for model construction, and a prediction series (25%), used for model validation (Figure 1). The initial ATCUN activity information (ATCUNactiv variable) has been presented in the literature as the result of the experiments. A previous work has reported the applicability of LDA in QSAR studies. 9 9 The best QSAR LDA model in this study was described by eq 5, and it was obtained with the forward stepwise method from STATISTICA: 9 DNA cleavage 0 36 75 f 0 05 i 7 504 N 313R 0 774 0 40h 0 058 p < 0.001 (5), where the equation elements are π_k(O) values, with π as the spectral moment, k as the topological distance between the amino acids considered, and O between brackets as the orbit of amino acids (i = inner, t = total or whole protein). N represents the number of proteins selected at random from the total amount of 415 and used to train the classification function. [Figure 1. Method flowchart for evaluation of the ATCUN DNA-cleavage activity for new parasite proteins: PDB files of ATCUN and non-ATCUN proteins; BIOMARKS tool (3D electrostatic spectral moments); Linear Discriminant Analysis (LDA, STATISTICA tool); validation set.] The statistical parameters of the same equation
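A random 75/25 partition of the kind described above can be sketched in a few lines. This is a generic illustration with a fixed seed and does not reproduce the authors' actual partition; the function name, the seed and the placeholder protein identifiers are assumptions.

```python
import random


def split_train_prediction(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of items and split it into training and prediction sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local generator, reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for the 415 proteins in the text.
proteins = ["protein_%03d" % i for i in range(415)]
training, prediction = split_train_prediction(proteins)
```

Keeping the seed fixed makes the split repeatable, which matters when the same prediction series is reused to compare several models.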
281. tentials. J. Comput. Chem. 2009, 30, 1510-20. Concu, R.; Dea-Ayuela, M. A.; Perez-Montoto, L. G.; Prado-Prado, F. J.; Uriarte, E.; Bolas-Fernandez, F.; Podda, G.; Pazos, A.; Munteanu, C. R.; Ubeira, F. M.; González-Díaz, H. 3D Entropy and Moments Prediction of Enzyme Classes and Experimental-Theoretic Study of Peptide Fingerprints in Leishmania Parasites. Biochim. Biophys. Acta 2009, 1794 (12), 1784-94. González-Díaz, H.; Saíz-Urra, L.; Molina, R.; Uriarte, E. Stochastic molecular descriptors for polymers. 2. Spherical truncation of electrostatic interactions on entropy based polymers 3D-QSAR. Polymer 2005, 46, 2791-8. González-Díaz, H.; Molina, R.; Uriarte, E. Recognition of stable protein mutants with 3D stochastic average electrostatic potentials. FEBS Lett. 2005, 579 (20), 4297-301. Liu, Y.; Beveridge, D. L. Exploratory studies of ab initio protein structure prediction: multiple copy simulated annealing, AMBER energy functions, and a generalized Born/solvent accessibility solvation model. Proteins 2002, 46 (1), 128-46. González-Díaz, H.; Sanchez-Gonzalez, A.; Gonzalez-Diaz, Y. 3D-QSAR study for DNA cleavage proteins with a potential anti-tumor ATCUN-like motif. J. Inorg. Biochem. 2006, 100 (7), 1290-7. Specht, D. F. Probabilistic Neural Networks. Neural Networks 1990, 3 (1), 109-18. Caudill, M. GRNN and Bear It. AI Expert 1993, 8 (5), 28-33. Buhmann, M. D. Radial Basis Functions: Theory and Implementations; Cambridge University Press
282. teomics 2008; 8: 750-78. Munteanu CR, Vázquez JM, Dorado J, Pazos Sierra A, Sánchez-González A, Prado-Prado FJ, et al. J Proteome Res 2009; doi:10.1021/pr900556g. Concu R, Dea-Ayuela MA, Perez-Montoto LG, Prado-Prado FJ, Uriarte E, Bolas-Fernandez F, et al. Biochim Biophys Acta 2009; doi:10.1016/j.bbapap.2009.1008.1020. González-Díaz H, Molina R, Uriarte E. Bioorg Med Chem Lett 2004; 14(18): 4691-5. González-Díaz H, Prado-Prado F, Ubeira FM. Curr Top Med Chem 2008; 8(18): 1676-90. González-Díaz H, Saíz-Urra L, Molina R, Uriarte E. Polymer 2005; 46(8): 2791-8. González-Díaz H, Molina-Ruiz R, Hernandez I. MARCH-INSIDE v3.0 (MARkov CHains INvariants for SImulation and DEsign); Windows-supported version available on request from the main author, contact email: gonzalezdiazh@yahoo.es; 2007. Cruz-Monteagudo M, González-Díaz H. Eur J Med Chem 2005; 40(10): 1030-41. González-Díaz H, Aguero-Chapin G, Varona J, Molina R, Delogu G, Santana L, et al. J Comput Chem 2007; 28(6): 1049-56. StatSoft Inc. STATISTICA (data analysis software system), version 6.0, www.statsoft.com; StatSoft Inc.; 2002. Marrero-Ponce Y, Medina-Marrero R, Castro AE, Ramos de Armas R, González-Díaz H, Romero-Zaldivar V, et al. Molecules 2004; 9: 1124-47. Ramos de Armas R, González-Díaz H, Molina R, Uriarte E. Proteins 2004; 56(4): 715-23.
283. …software registrations. General conclusion. It can be concluded that software tools based on computer engineering and artificial intelligence techniques and procedures can be very useful for the discovery of drugs and molecular targets. 4. REFERENCES. 1. González-Díaz H, Gonzalez-Diaz Y, Santana L, Ubeira FM, Uriarte E. Proteomics, networks and connectivity indices. Proteomics 2008; 8(4): 750-78. 2. González-Díaz H. Quantitative studies on Structure-Activity and Structure-Property Relationships (QSAR/QSPR). Curr Top Med Chem 2008; 8(18): 1554. 3. Vilar S, Cozza G, Moro S. Medicinal chemistry and the molecular operating environment (MOE): application of QSAR and molecular docking to drug discovery. Curr Top Med Chem 2008; 8(18): 1555-72. 4. Wang JF, Wei DQ, Chou KC. Pharmacogenomics and personalized use of drugs. Curr Top Med Chem 2008; 8(18): 1573-9. 5. Caballero J, Fernandez M. Artificial neural networks from MATLAB in medicinal chemistry. Bayesian-regularized genetic neural networks (BRGNN): application to the prediction of the antagonistic activity against human platelet thrombin receptor (PAR-1). Curr Top Med Chem 2008; 8(18): 1580-605. 6. Gonzalez MP, Teran C, Saiz-Urra L, Teijeira M. Variable selection methods in QSAR: an overview. Curr Top Med Chem 2008; 8(18): 1606-27. 7. Helguera AM
284. the Tr embedded dataset and the dataset containing all the attributes. The ROC curve for the Tr embedded dataset is shown in Fig. 3. 4. Discussion. This study proposes a model designed to identify proteins that have antioxidant activity by using Star Graph TIs obtained from protein amino acid sequences. The proposed model, based on only five attributes extracted from the embedded graph, shows good predictive capacity, achieving 94% of correctly classified instances. It is also important to highlight that, even though the non-antioxidant class was not the target class of this study, the model achieves a score of 81.8% correctly classified instances with good precision (81.3%). Antioxidant proteins are very important molecules in pharmacology today. It can be concluded from this study that this model may help to reduce the number of proteins to be tested in antioxidant research, since the selected proteins are very likely to have antioxidant properties. Acknowledgements. Vanessa Aguiar-Pulido and Cristian R. Munteanu acknowledge the funding support for a research position by the Plan I2C and an Isidro Parga Pondal Program, both from Xunta de Galicia (Spain), supported by the European Social Fund. The authors also want to thank the different projects that have funded part of this research: CN 2011/034, CN2012/127, 10SIN105004PR, 09SIN010105PR, and TIN 2009-07707. References. Agüero-Chapin, G., Gonzalez-Diaz, H., Molina
285. the pairs. We also tested nonlinear ANN models for comparison purposes, but the linear model gives the best results. We implemented this predictor in the web server named TrypanoPPI, freely available to the public at http://miaja.tic.udc.es/Bio-AIMS/TrypanoPPI.php. This is the first model that predicts how unique a protein-protein complex in the Trypanosome proteome is with respect to other parasites and hosts, opening new opportunities for antitrypanosome drug target discovery. Keywords: Trypanosoma proteome; African trypanosomiasis; Chagas disease; Markov chains; protein-protein interactions; 3D electrostatic potential; protein surface; machine learning; artificial neural networks. Introduction. African trypanosomiasis is a vector-borne parasitic disease caused by protozoan parasites of the Trypanosoma genus. Trypanosoma brucei species can infect both humans and animals, causing Human African Trypanosomiasis (HAT), also known as African sleeping sickness, in man and Nagana in cattle. The disease threatens over 60 million people and uncounted numbers of cattle in 36 countries of sub-Saharan Africa and has a devastating impact on human health and the economy in affected areas. Unless treated, HAT is always fatal. Political instability and economic problems are leading factors in the reduced efficacy of vector and disease control, resulting in a resurgence of the disease that continues to this day (http://www.who.int/tdr). On the other ha
286. tioxidant Proteins. Journal of Theoretical Biology 317, 331-337, 2013. Enrique Fernandez-Blanco, Vanessa Aguiar-Pulido, Cristian R. Munteanu, Julian Dorado. Link: http://goo.gl/R5vV8. Aging and quality of life are an important research topic nowadays in areas such as the life sciences, chemistry, and pharmacology. People live longer and want to spend that time with a better quality of life. In this regard, there is a small subset of molecules in nature, called antioxidant proteins, that can influence the aging process. However, testing each individual protein in order to identify its properties is quite expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented by complex-network graphs, which can be used to reduce the number of proteins that must be assayed to establish their antioxidant biological activity. The graph obtained as a theoretical representation of a protein helps to describe the complex system through the use of topological indices. More specifically, this work used star-type networks and the corresponding indices, calculated with the S2SNet tool. In order to simulate the existing proportion of antioxidant proteins in nature, a dataset was created containing 1999 proteins, of which…
287. to different regions or intervals called gnomons (or angular dispositions), as one can observe in Fig. 2B. To define a gnomon, it is necessary to recall the oblong numbers, which are those that can be written as the product n(n + 1) with n a natural number, that is to say 2, 6, 12, 20, 30, 42, 56, 72, 90, ... These numbers divide the natural numbers into intervals of growing size 2n. It is easy to see that a pair of consecutive oblong numbers defines a gnomon, and that these angular dispositions are inserted one after another, giving rise to rectangles of growing size. Each element of the spiral belongs to only one gnomon. Thus, we can define the coordinate Un of an element of the Ulam spiral as the order number of the gnomon to which it belongs. When a sequence of letters is represented in its U-graph, each node is an element of the sequence, each letter represents the class to which this element belongs, and in each gnomon one or more different classes will exist (Fig. 2C). CULSPIN software for spiral graph TIs. CULSPIN is a new wxPython-based software. 9 It transforms any sequence of letters into a graphic representation that uses as a template the Ulam spiral (the arrangement of the natural numbers in a spiral form) and connects the nodes that belong to the same class (they have the same letter). For example, the amino acid sequence GDDGGDGGGGGGGGDGGGDG DDGGGDGGGDGDGGDGDDDDGGGGGDGGDDGG GGGGGGGGGGGGGGKKKKKAAAKKAKKKKKKAA AKKKKAKKKKKAAKKKKKKK
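The oblong numbers and the gnomon coordinate can be computed directly. In this sketch the n-th gnomon is assumed to hold the natural numbers from (n − 1)n + 1 up to and including n(n + 1), which gives intervals of size 2n as stated in the text; the boundary convention and the function names are assumptions, not CULSPIN's implementation.

```python
def oblong_numbers(count):
    """Return the first `count` oblong numbers n * (n + 1)."""
    return [n * (n + 1) for n in range(1, count + 1)]


def gnomon_index(position):
    """Return the order number of the gnomon containing a natural number.

    Assumption: gnomon n ends at the oblong number n * (n + 1), inclusive.
    """
    if position < 1:
        raise ValueError("position must be a natural number")
    n = 1
    while n * (n + 1) < position:
        n += 1
    return n


first_nine = oblong_numbers(9)
coordinates = [gnomon_index(m) for m in range(1, 13)]
size_of_third = sum(1 for m in range(1, 91) if gnomon_index(m) == 3)
```

For a sequence of letters, the element at spiral position m would then carry the gnomon coordinate `gnomon_index(m)` alongside its class letter.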
288. tp://miaja.tic.udc.es/Bio-AIMS/, with different web server packages that may be used to predict different functions of proteins from PDB files. These servers follow the same philosophy of free online access and use by the whole international research community, as mentioned in the previous paragraph. In particular, the package called TargetPred offers two new Protein-QSAR servers. The first, ATCUNPred (http://miaja.tic.udc.es/Bio-AIMS/ATCUNPred.php), is available for prediction of ATCUN-mediated DNA-cleavage anticancer proteins [68]. The second server, EnzClassPred, is available at http://miaja.tic.udc.es/Bio-AIMS/EnzClassPred.php and can be used to predict enzyme classes from PDB files without function annotation [69]. For all these reasons, in this work we use the MARCH-INSIDE approach for the first time to solve the problem of predicting specific pPPCs from the 3D structure of two proteins that may or may not undergo pPPIs. Last but not least, we implemented the predictor in a new web server named PlasmodPPI, freely available to the public at http://miaja.tic.udc.es/Bio-AIMS/PlasmodPPI.php. In Fig. 2 we depict a flowchart of all the steps taken in this work to generate the new classifiers and server. 2. Materials and methods. 2.1. Electrostatic entropy measures for PPIs. In previous works we have used different entropy invariants derived from an MCM to describe the 3D structure of one protein backbone in structure-property relationship stu
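As a generic illustration of an entropy invariant derived from a Markov chain model, the sketch below propagates an initial probability vector through k steps of a row-stochastic matrix and takes the Shannon entropy of the result. It is only an assumed, simplified stand-in: the paper's own electrostatic entropy definition is not visible in this fragment, and the matrix, vector and function names are invented for the example.

```python
import math


def step(distribution, matrix):
    """Advance a probability vector one step: p' = p * matrix."""
    n = len(matrix)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(n))
        for j in range(n)
    ]


def shannon_entropy(distribution):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in distribution if p > 0)


def entropy_after_k_steps(initial, matrix, k):
    """Entropy of the distribution reached after k Markov steps."""
    current = list(initial)
    for _ in range(k):
        current = step(current, matrix)
    return shannon_entropy(current)


# Hypothetical two-state chain that mixes completely in one step.
matrix = [[0.5, 0.5], [0.5, 0.5]]
entropies = [entropy_after_k_steps([1.0, 0.0], matrix, k) for k in range(3)]
```

One value per k gives a small family of descriptors for a structure, in the same spirit as the spectral moments of different orders.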
289. ... molecular parameters. Predicting the biological activities of drugs, or searching for molecular targets, with QSAR/QSPR models requires numbers that quantitatively characterize the relationship between the structure of the molecules and their biological activities. For this reason, new computer programs have been developed that can calculate molecular descriptors (topological indices) for drugs, proteins, nucleic acids or other real systems: MInD-Prot (similar to the functions of MARCH-INSIDE), S2SNet (for star-type graphs) and CULSPIN (for spiral-type graphs). The MInD-Prot functions have been used in the online implementation of the tools presented in 2.2, "New Bio-AIMS online servers based on computer engineering and artificial intelligence techniques"; consequently, the publications in which this program has been used are presented with each server. The other two programs, S2SNet and CULSPIN, are presented in three parts: the publications with the application, the program manual, and the certificate of registration in the general intellectual property registry. 2.1.1 MInD-Prot: Markov descriptors for drugs and proteins. MInD-Prot (Markov Inside for Drugs and Proteins) is an application written in Python/wxPython that calculates the following Markov-type indices for drugs and proteins: spectral moments ...
290. ts the results for the best CT models found. The automatic selection of variables (features) was activated for all models, if available.
[Fig. 6. Structure of the CT model found; the two splits shown are at < 0.0118 and < 0.0454.]
Table 2. Structure of the CT-LC model
Parameter         Node 1     Node 2   Node 3     Node 4   Node 5
Left branch       2                   4
Right branch      3                   5
npPPI             3395       3018     377        325      52
pPPI              581        39       542        12       530
Predicted class   npPPI      npPPI    pPPI       npPPI    pPPI
Split condition   LC < split constant
Split constant    0.011758   0        0.045360   0        0
Four further rows (labels illegible in the source) list, for nodes 1/2/3: 0.000827/0/0.004075; 0.000193/0/0.001044; 0.005454/0/0.018150; 0.004447/0/0.014544; all are 0 for nodes 4 and 5.
(Y. Rodriguez-Soca et al. / Polymer 51 (2010) 264-273; PlasmodPPI, Bio-AIMS, p. 271)
In Fig. 6 we illustrate the graph representation of the CT-LC trained in this work, and in Table 2 we give details about the structure of this CT and the split rules derived. In particular, the CT-LC model is the simplest CT model found, with the highest levels of Sensitivity (91.2%), Specificity (98.5%) and Accuracy (97.4%) in the training set. These values are excellent considering that this predictor uses only two molecular descriptors of the PPI pair, which is a very complex structure in chemical terms, to fit a large data set of 582 TPPIs and 3394 non-TPPIs (see Table 1). In fact, the CT analysis yielded the best model found in this w
291. twopi, neato and fdp; MS Windows XP/Vista Notepad.
[Figure 13. Logic diagram of S2SNet (flowchart "S2SNet: Sequence to Star Network"): main panel (S2SNet GUI1.0.py); parameters, calculations and graph visualization; transformation of sequences into topological indices, TIs (S2SGf.py); Graphviz layouts (twopi, neato, dot, circo, fdp); outputs: PNG images of the graphs, a text file with the TIs, and a text file with details of the calculations.]
What is S2SNet? S2SNet transforms character sequences into topological indices (TIs) of star-type complex networks (Star Networks, SN) and displays the resulting graphs (Figure 13). These indices can be used for various statistical analyses or to build QSAR (structure-property relationship) models. Examples of sequences are the amino acid chains of proteins, nucleic acids and protein mass spectra. S2SNet can be used to study a wide range of systems, from simple systems of atoms in small anticancer molecules to complex metabolic, social or computational networks, or biological systems.
What can S2SNet do?
✓ Transform sequences into topological indices of star-type networks ...
292. ... can be calculated at several levels: for each class in each Ulam gnomon, for each class in the whole graph, and for each gnomon independently of the classes. In addition, the 2D graphs (U-graphs) generated by the application can be not only displayed but also exported, so that they can be used in other programs to calculate other families of TIs. All the numerical indices can be saved and/or exported, and used for various statistical analyses or to build QSAR (structure-property relationship) models. Examples of sequences are the amino acid chains of proteins, nucleic acids and protein mass spectra. CULSPIN can be used to study a wide range of systems, from simple systems of atoms in small anticancer molecules to complex metabolic, social or computational networks, or biological systems.
What can CULSPIN do?
✓ Read letter sequences organized in rows or columns from TXT files
✓ Read sequences in FASTA format stored in TXT files
✓ Read numerical sequences or series organized in rows or columns from TXT files
✓ Read numerical data corresponding to Mass Spectra (MS) signals from multiple TXT or CSV files
✓ Convert numerical sequences or series and MS data into letter sequences
✓ Transform ...
293. ult file Results 109305174346d783d5 TrypanoPPI calc txt
TrypanoPPI (Bio-AIMS): Biopython server to predict whether a pair of proteins forms a physically stable complex unique to Trypanosoma (not present in humans or other parasites), based on electrostatic potential indices of Protein-Protein Interactions (PPIs), using MARCH-INSIDE (Python version) and LNN 2:2-1:1 (90.9% accuracy). These complexes may be interesting candidates for specific anti-Trypanosoma drug targets.
Results: http bio aims udc es Results 109305174346d783d5 TrypanoPPI calc txt
Calculated at: 2013-04-21 20:48:14
Chain1  Chain2  Complex
1HOZA   1HOZB   YES
1HOZA   1F2CA   NO
1K3TB   1HOZB   NO
1K3TB   1F2CA   NO
Done
Figure 30. Example of a calculation with the TrypanoPPI server.
2.2.2 Plasmod-PPI: protein-protein interactions in Plasmodium
"Plasmod-PPI: a web server predicting complex biopolymer targets in Plasmodium with entropy measures of protein-protein interactions", Polymer 51 (1), 264-273, 2010. Yamilet Rodriguez-Soca, Cristian R. Munteanu, Julian Dorado, Juan Rabuñal, Alejandro Pazos and Humberto González-Díaz.
Link: http://goo.gl/hRhm9
Tool: http://bio-aims.udc.es/PlasmodPPI.php
[Screenshot of the PlasmodPPI web form (Bio-AIMS, "Modelling the reality"; menu: Home, Links, About).] PDB chain lists: please paste the names of the PDB chains as two lists (max. 50). Notes: There is no space between the PDB n
294. ult file Results 20939517435537c0de PlasmodPPI calc txt
Plasmod-PPI (Bio-AIMS): Biopython server to predict whether a pair of proteins forms a physically stable complex unique to Plasmodium (not present in humans or other parasites), based on electrostatic entropy indices of Protein-Protein Interactions (PPIs), using MARCH-INSIDE (Python version) and CT (96.8% accuracy). These complexes may be interesting candidates for specific anti-Plasmodium anti-cancer drug targets.
Results: http bio aims udc es Results 20939517435537c0de PlasmodPPI calc txt
Calculated at: 2013-04-21 20:52:04
Chain1  Chain2  Complex
3C5IA   3C5IE   YES
2F6IE   2GHUA   NO
1SYRC   1SYRF   YES
Done
Figure 32. Example of a calculation with the PlasmodPPI server.
2.2.3 ATCUNpred: prediction of protein targets with ATCUN activity in parasites
"Complex Network Spectral Moments for ATCUN Motif DNA Cleavage: First Predictive Study on Proteins of Human Pathogen Parasites", Journal of Proteome Research 8 (11), 5219-5228, 2009. Cristian R. Munteanu, José M. Vázquez, Julián Dorado, Alejandro Pazos Sierra, Ángeles Sánchez-González, Francisco J. Prado-Prado and Humberto González-Díaz.
Link: http://goo.gl/u7Thg
Tool: http://bio-aims.udc.es/ATCUNPred.php
[Screenshot of the ATCUNPred web form (Bio-AIMS, "Modelling the reality"); LDA classification model, accuracy 91.32%.] PDB list: please paste the names of the PDB as a list (maximum
295. unds. Bioorg. Med. Chem. 2005, 13 (4), 1293-304.
STATISTICA (data analysis software system), version 6.0; www.statsoft.com; StatSoft, Inc., 2002.
Van Waterbeemd, H. Chemometric methods in molecular design. In Method and Principles in Medicinal Chemistry; Mannhold, R., Krogsgaard-Larsen, P., Timmerman, H., Van Waterbeemd, H., Eds.; Wiley-VCH: New York, 1995; Vol. 2, pp 283-93.
González-Díaz, H.; Vina, D.; Santana, L.; de Clercq, E.; Uriarte, E. Stochastic entropy QSAR for the in silico discovery of anticancer compounds: prediction, synthesis, and in vitro assay of new purine carbanucleosides. Bioorg. Med. Chem. 2006, 14 (4), 1095-107.
Atkinson, A. C. Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis; Clarendon Press: Oxford, 1985.
Eriksson, L.; Jaworska, J.; Worth, A. P.; Cronin, M. T.; McDowell, R. M.; Gramatica, P. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environ. Health Perspect. 2003, 111 (10), 1361-75.
Monari, G.; Dreyfus, G. Local overfitting control via leverages. Neural Comput. 2002, 14 (6), 1481-506.
Meloun, M.; Syrovy, T.; Bordovska, S.; Vrana, A. Reliability and uncertainty in the
(Munteanu et al., Journal of Proteome Research, Vol. 8, No. 11, 2009, p. 5228)
297. ur-layer MLP. The linear models (LNN) are MLPs without hidden neurons. The bias neurons have not been considered. The minimum classification loss threshold was 1 and the classification output encoding was entropy-based. The training algorithms were back-propagation [9] in phase one, with 100 epochs and a learning rate of 0.01, and conjugate gradient descent in phase two, with 500 epochs. All the ANNs have been tested for one step (one training period); see, for instance, the work of Vilar et al. with ANNs. In Figure 3 we illustrate the graph representation of some of the ANNs trained in this work.
[Figure 2. General scheme of work with all steps necessary to develop or use the present model: Trypanosome protein 1 and protein 2 (PDB); MARCH-INSIDE 3D electrostatic potentials for the orbits core (c), inner (i), middle (m) and surface region (s); protein-protein interaction electrostatic absolute-difference invariants; Artificial Neural Network (ANN) classification model and validation set for Protein-Protein Interactions in Trypanosome (TPPI); TrypanoPPI Web Server (LNN 2:2-1:1); prediction of the protein-protein interactions for new proteins in Trypanosome.]
Data Set
298. uracy Total 83.9 82.9
Model (topology)     Parameter             Training %   pPPI   npPPI   Validation %   pPPI   npPPI
MLP (4:4-6-6-1:1)    Sensitivity (pPPI)    83.1         483    98      81.9           158    35
                     Specificity (npPPI)   83.0         577    2818    81.6           200    888
                     Accuracy (Total)      83.0                        81.7
RBF (1:1-1-1:1)      Sensitivity (pPPI)    18.9         110    471     20.2           39     154
                     Specificity (npPPI)   17.3         2807   588     15.5           919    169
                     Accuracy (Total)      17.6                        16.2
LNN (4:4-1:1)        Sensitivity (pPPI)    92.6         538    43      90.2           174    19
                     Specificity (npPPI)   92.2         264    3131    90.4           104    984
                     Accuracy (Total)      92.3                        90.4
acids on the surface of the two proteins forming the PPI pairs. This fact indicates that the difference between the surface electrostatic entropies is very important, not only for PPIs in general but also to discriminate the unique complexes present in Plasmodium sp. (pPPIs) and not in other organisms. The model presents a good overall classification of pPPI and npPPI. This level of accuracy is generally accepted by other researchers who have applied LDA to find QSAR models useful in molecular parasitology and related areas, e.g., the works of García-Domenech, Marrero-Ponce, Bruno-Blanch, Gálvez, Gozalbes and others predicting active compounds against Trypanosoma cruzi, Mycobacterium avium, Toxoplasma gondii, P. falciparum, Trichomonas vaginalis, Fasciola hepatica and other parasites [96-100]; see also the works of Marrero-Ponce on protein and DNA/RNA QSAR studies [101-103]. 3.2. Artificial neural network (ANN) models. The comparison of linear and non-linear models is essential to test how directly our parameters ar
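The percentages in the ANN table of fragment 298 follow directly from its case counts, which the short Python sketch below re-derives for the first data series. The assignment of the four counts to true/false positives and negatives is an inference from the flattened table (in each row, the count that reproduces the reported percentage is taken as the correctly classified one), and the class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for a two-class problem in which pPPI is the positive class."""
    tp: int  # pPPI pairs classified as pPPI
    fn: int  # pPPI pairs classified as npPPI
    fp: int  # npPPI pairs classified as pPPI
    tn: int  # npPPI pairs classified as npPPI

    def sensitivity(self) -> float:
        return 100.0 * self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return 100.0 * self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return 100.0 * (self.tp + self.tn) / total


# Counts as read from the first data series of the table (assumed layout).
MODELS = {
    "MLP": Confusion(tp=483, fn=98, fp=577, tn=2818),
    "RBF": Confusion(tp=110, fn=471, fp=2807, tn=588),
    "LNN": Confusion(tp=538, fn=43, fp=264, tn=3131),
}

if __name__ == "__main__":
    for name, c in MODELS.items():
        print(f"{name}: Sn={c.sensitivity():.1f}% "
              f"Sp={c.specificity():.1f}% Ac={c.accuracy():.1f}%")
```

With these counts the linear network (LNN) comes out clearly ahead of the MLP and RBF networks, in line with the percentages reported in the table.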
299. ure parameters became a goal of major importance for drug-target discovery, vaccine design and biomarker selection. In addition, the Protein Data Bank (PDB) contains 3000 protein 3D structures with unknown function. This list, as well as new experimental outcomes in proteomics research, is a very interesting source to discover relevant proteins, including LIBPs. However, to the best of our knowledge, there are no general models to predict new LIBPs based on 3D structures. We developed new Quantitative Structure-Activity Relationship (QSAR) models based on 3D electrostatic parameters of 1801 different proteins, including 801 LIBPs. We calculated these electrostatic parameters with the MARCH-INSIDE software; they correspond to the entire protein or to specific protein regions named core, inner, middle and surface. We used these parameters as inputs to develop a simple Linear Discriminant Analysis (LDA) classifier to discriminate the 3D structures of LIBPs from those of other proteins. We implemented this predictor in the web server named LIBP-Pred, freely available at http://miaja.tic.udc.es/Bio-AIMS/LIBPpred.php, along with other important web servers of the Bio-AIMS portal. Users can carry out an automatic retrieval of protein structures from the PDB or upload, from their disk, custom protein structural models created with the LOMETS server. We demonstrated the PDB mining option by performing a predictive study of 2000 proteins with unknown function. Interesting results
300. v Chain model (MCM) of the intra-molecular movement of electrons to calculate structural parameters of drugs. In successive studies, we have extended this method to perform fast calculation of 2D and 3D alignment-free structural parameters based on molecular vibrations in RNA secondary structures, or on electrostatic potential and van der Waals interactions in proteins. Recently, the method has been renamed MARkov CHains Invariants for Networks SImulation and DEsign (MARCH-INSIDE 2.0). This name reflects more adequately the broad uses of the method, which describes the structure of drugs, RNA and proteins, as well as drug-drug networks and drug-protein interactions. MARCH-INSIDE may also be used to study PPIs, bacteria-bacteria co-aggregation, parasite-host interactions and other systems with an MCM associated to a network. In very recent reviews we have discussed the latest applications of this method. We should also mention the recent implementation, by Munteanu and González-Díaz, of the Internet portal called Bio-AIMS, freely available to the international research community. This portal includes the web server package TargetPred (http://bio-aims.udc.es/TargetPred.php), with new Protein-QSAR servers based on MARCH-INSIDE. One of the servers is ATCUNPred, useful for predicting ATCUN-mediated DNA-cleavage anti-cancer proteins. The second server is EnzClassPred, which implements one of the MARCH-INSIDE bas
301. were also shown by Wilks' statistic (λ), the canonical regression coefficient (Rc), the leverage threshold value that defines the model domain (h), and the model significance level (p-level) [84]. The model showed excellent accuracy in the training series and predictability in the validation series, with an overall good classification of 91.32% (373 out of 415 proteins). The classification matrices for the training, validation and both series are presented in Table 1. The model can be freely used at our Bio-AIMS portal, http://miaja.tic.udc.es/Bio-AIMS/ATCUNPred.php. The proteins can act by diverse mechanisms with different levels of effectiveness. For this reason, an ideal QSAR model should be based on quantitative biological activities (e.g., IC50). Even if we do not have these values, we know which proteins present a certain biological activity and which of them do not show any activity. The advantage of LDA over regression techniques is that it acts as a pattern-recognition method that identifies potentially active proteins and gives a score for the probability of the presence of such activity, without predicting how high this probability
(Journal of Proteome Research, Vol. 8, No. 11, 2009, p. 5222)
Table 1. QSAR Classification Results for Training, Validation, and Both Series (spectral moments, n = 415)
            train %   counts    CV %   counts    both %   counts
ATCUN       74.2      72 / 25   71.1   27 / 11   73.3     99 / 36
Nonactive   100.0     0 / 21
302. wxPython (Rappin and Dunn, 2006) for the GUI application and has Graphviz (Koutsofios and North, 1993) as a graphics back-end. The present calculations are characterized by embedded and non-embedded TIs, no weights, Markov normalization and power of matrices/indices (n) up to 5. The results file contains the following TIs (Todeschini and Consonni, 2002):
Trace of the n connectivity matrices ($Tr_n$): $Tr_n = \sum_i (M^n)_{ii}$ (1), where n = 0 to the power limit, M = graph connectivity matrix (i × i dimension), and ii = ith diagonal element.
Harary number (H): $H = \sum_{i<j} m_{ij} / d_{ij}$ (2), where $d_{ij}$ are the elements of the distance matrix and $m_{ij}$ are the elements of the M connectivity matrix.
Wiener index (W): $W = \sum_{i<j} d_{ij}$ (3)
Gutman topological index ($S_6$): $S_6 = \sum_{ij} \deg_i \times \deg_j / d_{ij}$ (4), where $\deg_i$ are the elements of the degree matrix.
Schultz topological index, non-trivial part (S): $S = \sum_{i<j} (\deg_i + \deg_j) \times d_{ij}$ (5)
Balaban distance connectivity index (J): $J = (edges - nodes + 2) \times \sum_{i<j} m_{ij} \times \sqrt{\sum_k d_{ik} \times \sum_k d_{kj}}$ (6), where nodes + 1 = AA numbers/node number in the Star Graph + origin, and $\sum_k d_{ik}$ is the node distance degree.
Kier-Hall connectivity indices ($^nX$):
$^0X = \sum_i 1 / \sqrt{\deg_i}$ (7)
$^2X = \sum_{i<j<k} m_{ij} \times m_{jk} / \sqrt{\deg_i \times \deg_j \times \deg_k}$ (8)
$^3X = \sum_{i<j<k<m} m_{ij} \times m_{jk} \times m_{km} / \sqrt{\deg_i \times \deg_j \times \deg_k \times \deg_m}$ (9)
$^4X = \sum_{i<j<k<m<o} m_{ij} \times m_{jk} \times m_{km} \times m_{mo} / \sqrt{\deg_i \times \deg_j \times \deg_k \times \deg_m \times \deg_o}$ (10)
$^5X = \sum_{i<j<k<m<o<q} m_{ij} \times m_{jk} \times m_{km} \times m_{mo} \times m_{oq} / \sqrt{\deg_i \times \deg_j \times \deg_k \times \deg_m \times \deg_o \times \deg_q}$ (1
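Two of the indices listed in fragment 302, the trace of the n-th power of the connectivity matrix and the Wiener index, can be checked by hand on a very small star graph. The Python sketch below is only an illustration and is not S2SNet code: it uses a plain 0/1 connectivity matrix (no weights, no Markov normalization), assumes a connected graph, and all function names are hypothetical.

```python
from collections import deque


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_of_power(m, n):
    """Tr_n: sum of the diagonal elements of M**n (M**0 is the identity)."""
    size = len(m)
    power = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        power = mat_mul(power, m)
    return sum(power[i][i] for i in range(size))


def distance_matrix(m):
    """Shortest-path lengths via breadth-first search from every node.

    Assumes an unweighted, connected graph.
    """
    size = len(m)
    dist = []
    for source in range(size):
        row = [None] * size
        row[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(size):
                if m[u][v] and row[v] is None:
                    row[v] = row[u] + 1
                    queue.append(v)
        dist.append(row)
    return dist


def wiener_index(m):
    """W: sum of the distances d_ij over all node pairs with i < j."""
    d = distance_matrix(m)
    size = len(m)
    return sum(d[i][j] for i in range(size) for j in range(i + 1, size))


# Star graph: node 0 is the centre, nodes 1 to 3 are the ends of three branches.
STAR = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

if __name__ == "__main__":
    print([trace_of_power(STAR, n) for n in range(5)])
    print(wiener_index(STAR))
```

The odd-power traces vanish for this graph because a star has no odd closed walks, which shows why the even powers carry most of the information for tree-like star networks.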
304. z To my son Tudor. Acknowledgements. This Doctoral Thesis was carried out at the Tecnologías de la Información y las Comunicaciones, Facultad de Informática, Universidade da Coruña, under the supervision of Dr. Alejandro Pazos Sierra and Dr. Humberto González-Díaz, whom I would like to thank for their invaluable help. I am also grateful for all the help given by the collaborators at the Universidad de Santiago de Compostela, especially Dr. Francisco Prado-Prado, and for the computing support provided by the Redes de Neuronas Artificiales y Sistemas Adaptativos group (Universidade da Coruña), especially Julián Dorado, Vanessa Aguiar-Pulido and Dr. Marcos Gestal Pose. I would also like to extend these thanks to the teachers who trained me as a scientist: Matei and Florentina Ion, Hillebrand Mihaela, Domnina Razus and Berta Fernández Rodríguez. I want to make special mention of my family and my friends, without whose effort and support I could not have come this far. Many thanks to all of you! Cristian. Abstract. The search for new drugs and their molecular targets is of great interest to the pharmaceutical industry, with implications for clinical practice against complex diseases, especially those caused by microbes and parasites. Since the experimental search for the biological action of all possible drugs and of