
MOLGEN-QSPR User Guide


Contents

3.7.3 Definitions of Electrotopological and AI Indices
3.7.4 Definitions of Geometrical Indices
3.7.5 Definitions of Miscellaneous Indices
3.7.6 Definition of Overall Indices
3.8 References
4 Literature on MOLGEN-QSPR

Introduction

The software package MOLGEN-QSPR provides methods for the study of quantitative structure–property relationships (QSPRs) and for the prediction of property values for compounds in virtual combinatorial libraries. Figure 1 shows a simplified flowchart of QSPR search and application.

[Figure 1: Flowchart of QSPR search and application. Real library (structural formulae and property values) → descriptor computation → descriptor values → supervised statistical learning (regression, classification) → prediction function. Virtual library (structures only) → descriptor computation → descriptor values → application of the prediction function → predicted property values for the virtual library, i.e. promising candidates for synthesis.]

The input of MOLGEN-QSPR is a set of chemical compounds, given as molecular graphs, together with values of a continuous target variable representing the physicochemical property under consideration. In the following tutorial we treat the boiling points of decanes as an example. The QSPR search con…
[Figure 2.19: Regression dialogue]
[Figure 2.20: Regression Variables page]
[Figure 2.21: Regression Preprocessing page (property preprocessing: none; descriptor preprocessing: auto scaling)]

For both kinds of variables, the property and the descriptors, five types of preprocessing are available:
• none
• centering: the variable values are shifted by their arithmetic mean
• range scaling: the variable values are transformed in such a way that they range from 0 to 1
• auto scaling: the variable values are transformed in such a way that they have mean 0 and variance 1
• normalization: the variable values are divided by their Euclidean norm, i.e. after transformation they have Euclidean norm 1

All these preprocessings are linear (more precisely, affine) transformations. As such, they do not influence least squares regression and regression trees. However, for neural networks, support vector machines and nearest neighbor regression, variable preprocessing may have an important impact on model quality. If such a transformation is applied, it is automatically reversed in a final step.

2.5.3 Regression Method

Clicking on the Method tab, you obtain a page for setting up the regression method (Figure 2.22).
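The five preprocessing options listed in Section 2.5.2 amount to simple per-column transformations. The following Python sketch shows one way to write them; the function names are our own, and the use of the sample variance (divisor n − 1) for auto scaling is an assumption, since the guide only says "variance 1". The last function also returns the inverse map, as needed for the automatic back-transformation mentioned above.

```python
import math

# Sketch of the five preprocessing options of the Regression Setup.
# Function names are illustrative, not MOLGEN-QSPR internals.

def no_preprocessing(x):
    """'none': return the values unchanged."""
    return list(x)


def centering(x):
    """Shift the values by their arithmetic mean."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def range_scaling(x):
    """Map the values linearly onto the interval [0, 1]."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]


def _mean_and_sd(x):
    # Assumption: sample standard deviation (divisor n - 1); the guide does not
    # say which estimator MOLGEN-QSPR uses to reach "variance 1".
    n = len(x)
    mean = sum(x) / n
    return mean, math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))


def auto_scaling(x):
    """Transform the values to mean 0 and variance 1."""
    mean, sd = _mean_and_sd(x)
    return [(v - mean) / sd for v in x]


def normalization(x):
    """Divide the values by their Euclidean norm (the result has norm 1)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def auto_scale_with_inverse(y):
    """Auto-scale a property and also return the function that undoes the scaling."""
    mean, sd = _mean_and_sd(y)
    return [(v - mean) / sd for v in y], (lambda z: z * sd + mean)


scaled, back = auto_scale_with_inverse([174.1, 160.7, 163.4])  # illustrative values
print([round(back(z), 6) for z in scaled])  # back-transformed predictions
```

Because all five maps are affine, a least squares model with an intercept fitted on transformed descriptors yields the same predictions as one fitted on the raw values, which is why the choice matters mainly for the distance- and kernel-based methods.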
[Figure 2.22: Regression Method page (algorithm: least squares regression, i.e. multiple linear least squares regression using QR decomposition)]

Use the Algorithm combo box to select the regression algorithm to be applied. Various algorithms are available, among them:
• least squares regression
• regression trees
• neural networks
• support vector machines
• nearest neighbor regression

Note: In order to use regression trees, neural networks or support vector machines, the statistics software R must be installed (cf. Section 1.1.2).

For ordinary least squares regression no further arguments are required. Often you will use best subset regression (Figure 2.23). Parameters for the regression algorithm can be defined using the Argument and Value combo boxes; a short description of the algorithm and of the argument is displayed.

[Figure 2.23: Regression Method page for best subset regression (least squares regression, all k subsets)]

2.5.4 Starting the QSPR Calculation

After the regression setup is completed, close the Regression Setup sheet with OK and start the regression algorithm by clicking the Start button. After a while, the regression analysis will be finished and the results will be displayed in the Output field (Figure 2.24).
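Best subset regression ("all k subsets") can be pictured as fitting an ordinary least squares model, with intercept, for every combination of k descriptors and ranking the fits. The sketch below does this in plain Python, solving each fit through a QR decomposition computed by modified Gram–Schmidt. That MOLGEN-QSPR ranks the models by R² and uses this particular QR variant is an assumption, and collinear descriptor subsets (which make R singular) are not handled.

```python
import itertools
import math

# Sketch of exhaustive "all k subsets" least squares regression.
# Not MOLGEN-QSPR's code; names and ranking criterion are illustrative.

def lstsq_qr(X, y):
    """Solve min ||X b - y|| for b via a QR decomposition of X (a list of rows)."""
    n, p = len(X), len(X[0])
    Q = []                                   # orthonormal columns
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        v = [X[i][j] for i in range(n)]
        for k in range(j):                   # remove components along earlier columns
            R[k][j] = sum(Q[k][i] * v[i] for i in range(n))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(n)]
        R[j][j] = math.sqrt(sum(t * t for t in v))   # zero for collinear columns
        Q.append([t / R[j][j] for t in v])
    qty = [sum(Q[j][i] * y[i] for i in range(n)) for j in range(p)]
    b = [0.0] * p
    for j in reversed(range(p)):             # back substitution in R b = Q^T y
        b[j] = (qty[j] - sum(R[j][k] * b[k] for k in range(j + 1, p))) / R[j][j]
    return b


def best_subsets(D, y, k, top=3):
    """Fit y = b0 + sum_j b_j d_j for every k-subset of the columns of D and
    return the `top` fits ranked by R^2 as (r2, subset, coefficients)."""
    n, m = len(D), len(D[0])
    mean = sum(y) / n
    tss = sum((v - mean) ** 2 for v in y)
    fits = []
    for subset in itertools.combinations(range(m), k):
        X = [[1.0] + [row[j] for j in subset] for row in D]   # intercept column
        b = lstsq_qr(X, y)
        rss = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
                  for row, yi in zip(X, y))
        fits.append((1.0 - rss / tss, subset, b))
    fits.sort(key=lambda f: f[0], reverse=True)
    return fits[:top]
```

The number of subsets grows as the binomial coefficient C(m, k), so exhaustive search is only practical for a moderate number m of descriptors or small k.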
Crippen atom types H01–H04, Crippen atom types O01–O12, Crippen atom types N01–N14, Crippen atom types Hal, Cl, Br, Crippen atom types I, F, P, Crippen atom types S01, S02, S03, Crippen atom types Me01, Me02

3.6 Overall Indices

• sum of numbers of subgraphs of order 0 through 8
• number of subgraphs of order 0–8
• overall connectivity, order 0–6
• overall connectivity
• overall connectivity subgraph, order 1–6
• overall connectivity subgraph
• overall valence connectivity, order 0–6
• overall valence connectivity
• overall first Zagreb, order 0–6
• overall first Zagreb
• overall first Zagreb subgraph, order 1–6
• overall first Zagreb subgraph
• overall second Zagreb, order 1–6
• overall second Zagreb
• overall second Zagreb subgraph, order 1–6
• overall second Zagreb subgraph
• overall Wiener, order 1–6
• overall Wiener
• overall connectivity, order 3–6, path
• overall connectivity, path
• overall connectivity subgraph, order 3–6, path
• overall connectivity subgraph, path
• overall valence connectivity or…
$I_A \le I_B \le I_C$, i.e. they are the moments of inertia for rotation about three mutually perpendicular axes, oriented such that one of the moments is a maximum and another one a minimum.

4. Shadows: SHDW1, SHDW2 and SHDW3 are the areas of the projection of the molecular surface onto the planes XY, XZ and YZ, respectively. They are called the XY shadow, the XZ shadow and the YZ shadow. The X, Y and Z axes are the molecule's principal axes of inertia [55–57]. From these indices we obtain the descriptors

$\mathrm{SHDW1}' = \frac{\mathrm{SHDW1}}{L_x L_y}, \qquad \mathrm{SHDW2}' = \frac{\mathrm{SHDW2}}{L_x L_z}, \qquad \mathrm{SHDW3}' = \frac{\mathrm{SHDW3}}{L_y L_z},$

where $L_x$, $L_y$ and $L_z$ are the maximal dimensions of the molecular surface in the X, Y and Z directions, using van der Waals (vdW) radii. They are called the standardized XY, XZ and YZ shadows. We also introduce the quotients

$\frac{\mathrm{SHDW}i}{\mathrm{SHDW}j}, \qquad 1 \le i < j \le 3.$

These quotients are the XY/XZ shadow, etc. Moreover, we introduce the size-sorted shadows ssSHDW1, ssSHDW2 and ssSHDW3, of which ssSHDW1 is the largest, ssSHDW2 the second largest and ssSHDW3 the smallest; the prefix ss stands for size-sorted. In addition, we have the size-sorted standardized shadows ssSHDW1', ssSHDW2', ssSHDW3' and the quotients

$\frac{\mathrm{ssSHDW}i}{\mathrm{ssSHDW}j}, \qquad 1 \le i < j \le 3.$

5. Van der Waals volume and density: $V_{vdw}$, $\rho_{vdw}$, … are calculated for vdW molecules including H atoms. $V_{vdw}$ is the volume of the molecule evaluated by using vdW radii for eac…
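The standardized shadows, the shadow quotients and the size-sorted shadows follow directly once the three projected areas and the maximal dimensions are known. A small sketch, assuming these five numbers are already available (computing the projections of the vdW surface is outside its scope); the size-sorted standardized shadows are not included:

```python
from itertools import combinations

# Sketch: shadow-based descriptors from the three projected areas and the
# maximal dimensions of the molecular surface. Names are illustrative.

def shadow_descriptors(shdw, dims):
    """shdw = (SHDW1, SHDW2, SHDW3) = XY, XZ, YZ areas; dims = (Lx, Ly, Lz)."""
    xy, xz, yz = shdw
    lx, ly, lz = dims
    standardized = (xy / (lx * ly), xz / (lx * lz), yz / (ly * lz))
    pairs = list(combinations(range(3), 2))              # i < j: (1,2), (1,3), (2,3)
    quotients = tuple(shdw[i] / shdw[j] for i, j in pairs)
    ss = tuple(sorted(shdw, reverse=True))               # ssSHDW1 >= ssSHDW2 >= ssSHDW3
    ss_quotients = tuple(ss[i] / ss[j] for i, j in pairs)
    return {"standardized": standardized, "quotients": quotients,
            "size_sorted": ss, "size_sorted_quotients": ss_quotients}


# Illustrative areas (Å^2) and dimensions (Å), not taken from the guide.
print(shadow_descriptors((30.0, 45.0, 20.0), (10.0, 6.0, 5.0)))
```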
…and finally property prediction.

2.1 Data Input

2.1.1 Importing Structural Formulas

There are several possibilities to import electronically stored chemical structures. For our first example we import a library of 50 decanes stored as an MDL SDfile on the MOLGEN-QSPR CD:
1. Click on File > Import to get to the Import File dialogue.
2. Select SDfiles (*.sdf) in the Filetype combo box.
3. Click on DecanesReal.sdf in order to select the desired SDfile.
4. Use the Open button to open the selected file.

The 50 decanes (the real library) will now be displayed as a Molecule document on the screen (Figure 2.1). There are various functions and controls available to modify the layout of structures, for instance:
• View > Hydrogens to display hydrogen atoms
• View > Symbols to display element symbols
• the Start Molecule combo box and the scrollbar to navigate through the library
• the Rows and Columns combo boxes to change the grid, etc.

[Figure 2.1: Molecule document containing 50 decanes]
[Figure 2.2: Molecular Descriptors document containing 50 boiling points]

2.1.2 Importing Property Values

The next step in a QSPR study is to supply property values for the structures. In this example, property values are stored in a tab-separated ASCII table. Such a file is structured in the following way: the first line contai…
2.4.1 Calculating the Correlation Matrix
2.4.2 Displaying Correlations
2.5 Regression Analysis
2.5.1 Variable Selection
2.5.2 Regression Preprocessing
2.5.3 Regression Method
2.5.4 Starting the QSPR Calculation
2.6 Displaying and Saving QSPRs
2.6.1 QSPR Common Properties
2.6.2 QSPR Details
2.6.3 QSPR Descriptors
2.6.4 QSPR Property
2.6.5 QSPR Model
2.6.6 QSPR Predictions
2.6.7 QSPR Plot
2.7 Validation
2.7.1 LOO Crossvalidation
2.7.2 Further Validation
2.8 Property Prediction
2.8.1 Generating a Virtual Library
2.8.2 Comparing Real and Virtual Library
2.8.3 Applying QSPRs for Prediction
3 The Molecular Descriptors
3.1 Arithmetic Indices
3.2 Topological Indices
3.3 Electrotopological and AI Indices
3.4 Geometrical Indices
3.5 Miscellaneous Indices
3.6 Overall Indices
3.7 Definitions of Descriptors
3.7.1 Definitions of Arithmetic Descriptors
3.7.2 Definitions of Topological Indices
AI ssssB, AI sssPbH, AI tN, AI ssS, AI sSiH3, AI ssssPb

where s means a single bond, ss two single bonds, d a double bond, t a triple bond, a an aromatic bond, etc., to the specified atom, not counting bonds to H atoms [63–66].

Xu indices are defined as follows [67]. The Xu index is

$\mathrm{Xu} = \sqrt{A}\,\log \frac{\sum_i \delta_i \sigma_i^2}{\sum_i \delta_i \sigma_i},$

while the modified Xu index is

$\mathrm{mXu} = \sqrt{A}\,\log(\ldots).$

3.7.4 Definitions of Geometrical Indices

1. The steric energy (st energy) is calculated by molecular mechanics; in MOLGEN it is the quantity minimized in this procedure. All other descriptors appearing in this subsection depend on geometry, that is, on the particular conformer obtained in such an optimization.

2. Gravitational indices (3D dist) [33]: Using the geometrical distance $r_{ij}$ of atoms i and j, expressed in ångström (Å), and the atomic masses $m_i$, we find the indices

$G_1 = \sum_{i<j} \frac{m_i m_j}{r_{ij}^2}$ and $G_1(\text{incl. H})$, defined by the same sum.

Again, the summation runs in the first case over all pairs of atoms in an H-suppressed molecular graph, while in the second case H atoms are included. If only bonded pairs are considered, the following indices are obtained without and with consideration of bonds to H atoms:

$G_2 = \sum_{\text{edge}(i,j)} \frac{m_i m_j}{r_{ij}^2}$ and $G_2(\text{incl. H})$, defined by the same sum.

3. Principal moments of inertia: $I_A$, $I_B$, $I_C$ are the three principal moments of inertia of the molecule, with $I_A \le$ …
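Both the gravitational indices and the principal moments of inertia need only atomic masses and Cartesian coordinates of an optimized conformer. The sketch below computes G as a sum over all atom pairs (or over a supplied list of bonded pairs) and obtains $I_A \le I_B \le I_C$ as the eigenvalues of the inertia tensor about the center of mass, using cyclic Jacobi rotations to stay within the standard library. Masses in atomic mass units and coordinates in Å are assumed; the function names are illustrative.

```python
import math

# Sketch: gravitational index and principal moments of inertia from 3D data.

def gravitational_index(masses, coords, pairs=None):
    """Sum of m_i m_j / r_ij^2 over all pairs i < j, or over `pairs` (e.g. bonds)."""
    if pairs is None:
        n = len(masses)
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(masses[i] * masses[j] / math.dist(coords[i], coords[j]) ** 2
               for i, j in pairs)


def inertia_tensor(masses, coords):
    """I_ab = sum m (r^2 delta_ab - r_a r_b), with r measured from the center of mass."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    I = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        r = [c[k] - com[k] for k in range(3)]
        r2 = sum(t * t for t in r)
        for a in range(3):
            for b in range(3):
                I[a][b] += m * ((r2 if a == b else 0.0) - r[a] * r[b])
    return I


def symmetric_eigenvalues(A, sweeps=50):
    """Cyclic Jacobi rotations for a small symmetric matrix; sorted eigenvalues."""
    a = [row[:] for row in A]
    n = len(a)
    for _ in range(sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < 1e-24:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):           # a <- a J  (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):           # a <- J^T a  (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def principal_moments(masses, coords):
    """Return [I_A, I_B, I_C] with I_A <= I_B <= I_C."""
    return symmetric_eigenvalues(inertia_tensor(masses, coords))
```

Passing the list of bonds as `pairs` gives the bonded variant $G_2$; including or omitting the H atoms in the input gives the "incl. H" and H-suppressed versions.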
H-suppressed molecular graph. The maximal entry in its i-th row is called the eccentricity of atom i:

$\eta_i = \max_{1 \le j \le A} D_{ij}.$

The vertex distance degree $\sigma_i$ is defined as the i-th row sum of the distance matrix D of an H-suppressed molecular graph:

$\sigma_i = \sum_j D_{ij}.$

• The unsaturated distance matrix $D^u$, the rows and columns of which correspond to the non-H atoms. The entry $D^u_{ij}$ is the length of the shortest path from atom i to atom j, where single bonds represent a distance of 1, double bonds a distance of 1/2, triple bonds a distance of 1/3 and aromatic bonds a distance of 2/3. Here is an example: [structure diagram] In this example, the distance $D^u_{ac}$ from a to c is 1 + 1/2 = 3/2, and the distance $D^u_{ad}$ is 1 + 1/2 + 1/2 = 2.

The unsaturated vertex distance degree $\sigma^u_i$ is defined as the i-th row sum of the unsaturated distance matrix $D^u$ of an H-suppressed molecular graph:

$\sigma^u_i = \sum_j D^u_{ij}.$

• The charge term matrix CT, a square matrix the rows and columns of which correspond to the non-H atoms:

$CT_{ij} = \begin{cases} \delta_i & \text{if } i = j, \\ M_{ij} - M_{ji} & \text{otherwise,} \end{cases}$

where M is defined as $M = \mathbf{A} D^{*}$, with the adjacency matrix $\mathbf{A}$ and

$D^{*}_{ij} = \begin{cases} 1/D_{ij}^2 & \text{if } i \ne j, \\ 0 & \text{otherwise.} \end{cases}$

• The detour matrix Δ, the rows and columns of which correspond to the non-H atoms. The entries are the lengths of longest paths between atoms:

$\Delta_{ij} = \begin{cases} 0 & \text{if } i = j, \\ l_{ij} & \text{otherwise,} \end{cases}$

where $l_{ij}$ is the length of the longest path b…
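For a small H-suppressed graph, the distance-based quantities above can be computed with the Floyd–Warshall shortest-path algorithm, using bond weights 1, 1/2, 1/3 and 2/3 for the unsaturated variant. In this sketch, bonds are given as (i, j, order) tuples with order 1, 2, 3 or "ar" (an input format chosen for illustration), exact fractions are used so that values such as 3/2 appear exactly, and the charge term matrix follows the definition above with $\delta_i$ taken as the vertex degree. A connected graph is assumed.

```python
import math
from fractions import Fraction

# Sketch: distance matrix, eccentricity, vertex distance degree, unsaturated
# distance matrix and charge term matrix. Function names are illustrative.

BOND_LENGTH = {1: Fraction(1), 2: Fraction(1, 2), 3: Fraction(1, 3), "ar": Fraction(2, 3)}


def floyd_warshall(n, bonds, weight):
    D = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, order in bonds:
        D[i][j] = D[j][i] = weight(order)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


def distance_matrix(n, bonds):
    return floyd_warshall(n, bonds, lambda order: 1)           # every bond counts 1


def unsaturated_distance_matrix(n, bonds):
    return floyd_warshall(n, bonds, lambda order: BOND_LENGTH[order])


def eccentricities(D):
    return [max(row) for row in D]


def distance_degrees(D):
    return [sum(row) for row in D]


def charge_term_matrix(n, bonds):
    """CT_ii = vertex degree; CT_ij = M_ij - M_ji with M = A D*, D*_ij = 1/D_ij^2."""
    D = distance_matrix(n, bonds)
    A = [[0] * n for _ in range(n)]
    for i, j, _ in bonds:
        A[i][j] = A[j][i] = 1
    Dstar = [[Fraction(0) if i == j else Fraction(1, D[i][j] ** 2) for j in range(n)]
             for i in range(n)]
    M = [[sum(A[i][k] * Dstar[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(A[i]) if i == j else M[i][j] - M[j][i] for j in range(n)]
            for i in range(n)]


# but-1-ene, C1=C2-C3-C4, as atoms 0..3
but1ene = [(0, 1, 2), (1, 2, 1), (2, 3, 1)]
print(unsaturated_distance_matrix(4, but1ene)[0])
```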
Of course, there are various alternative ways to supply data for QSPR studies, and MOLGEN-QSPR offers several other ways for data import. Among these are:
• Edit structures with the built-in structure editor MOLED: use File > New > Moled to draw a molecular structure as a molfile.
• Import structures from several MDL Molfiles: use File > New > Molecules and then File > Append.
• Import structures and property values from CODESSA input files: use File > Import and select an *.inp file.
• Add and edit property values within an existing Molecular Descriptors document (see Subsections 2.2.2 and 2.2.3).

2.2 Displaying and Editing Data

Before starting the molecular descriptor calculation, we will have a closer look at some functionality of the Molecular Descriptors document.

[Figure 2.6: Selection of rows with bps between 150 and 160 °C (status bar: Items 50, Selected 21, Columns 1)]

2.2.1 Displaying Structural Formulas

As already mentioned, rows can be sorted by property values. If we want to have a look at the decanes of our real library with boiling points (bps) above 150 °C and below 160 °C, we carry out the following steps:
1. Click on the bp column head to sort rows by ascending bps.
2. Use the left mouse button to select all rows with bps between 150 and 160 °C (Figure 2.6).
3. File > Pass Values will cause the values of the current column to appear as names in a new Molecule document containing the selected structures.
…Start button to start the descriptor calculation.
4. When the calculation is finished, click OK to return to the Molecular Descriptors document.

After descriptor calculation, descriptor values will appear in additional columns (Figure 2.10).

2.3.2 Calculating Substructure Counts

A second type of molecular descriptors are substructure counts. A substructure is a part of the hydrogen-suppressed molecular graph. The substructure procedure implemented in MOLGEN-QSPR systematically finds all substructures up to a certain size that occur in a molecular library and counts their occurrences in all molecules of the library. For example, in 2-fluorobutane, H3C–CHF–CH2–CH3, the substructures F, C–F, C–C–F, C–C–C–F and C–C(F)–C will automatically be retrieved and counted, along with the fluorine-free substructures. Starting from the Molecular Descriptors document, call File > Substructure Counts to obtain the Substructure Counts dialogue (Figure 2.11).

[Figure 2.11: Substructure Counts dialogue (input molecules: All / Selected; substructures: Minimum Edges, Maximum Edges, Only Peripheric; output: Delete Unique Substructures, Delete Nonvariant Substructures, Show Substructures)]

In the Minimum/Maximum Edges combo boxes, specify the lower and upper number of edges for the substructures to be retri…
… and $a_1 = B$, while $\lfloor A/2 \rfloor$ denotes the largest integer smaller than or equal to A/2. The Hosoya index is

$Z = \sum_{k=0}^{\lfloor A/2 \rfloor} a_k.$

Basak information contents: In order to obtain information content indices, Basak partitions the atoms of a molecule, including H atoms, into equivalence classes. Two atoms are considered equivalent if the numbers and atom types (chemical elements) of all their neighbors, and the bond types to them, coincide up to the neighborhood depth r. If G equivalence classes are found for depth r, the number of atoms in the g-th class is written as $A_g$, and the information content of order r is defined as

$IC_r = -\sum_{g=1}^{G} \frac{A_g}{A(\text{incl. H})} \log_2 \frac{A_g}{A(\text{incl. H})}.$

The descriptors TIC, CIC, SIC, BIC and their multiples N·CIC, N·SIC, N·BIC for r = 0, 1, 2 are defined as

$TIC_r = A(\text{incl. H}) \cdot IC_r,$
$CIC_r = \log_2 A(\text{incl. H}) - IC_r, \qquad N{\cdot}CIC_r = A(\text{incl. H}) \cdot CIC_r,$
$SIC_r = \frac{IC_r}{\log_2 A(\text{incl. H})}, \qquad N{\cdot}SIC_r = A(\text{incl. H}) \cdot SIC_r,$
$BIC_r = \frac{IC_r}{\log_2 B(\text{incl. H})}, \qquad N{\cdot}BIC_r = A(\text{incl. H}) \cdot BIC_r.$

Note: This definition of BIC is the original one.

The indices carry the following names [35–37]:
$IC_r$: Basak information content of order r
$TIC_r$: Basak total information content of order r
$CIC_r$: Basak complementary information content of order r
$N{\cdot}CIC_r$: total complementary information content of order r
$SIC_r$: Basak structural informati…
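The Hosoya index counts all sets of pairwise non-adjacent bonds, including the empty set, so it can be obtained by the usual recursion: either the first bond is left out, or it is chosen and all bonds sharing an atom with it are removed. The sketch below implements that recursion together with the Basak indices as defined above, taking the equivalence-class sizes $A_g$ as input; deriving the classes for a given depth r from the molecular graph is not shown, and the function names are illustrative.

```python
import math

# Sketch: Hosoya index by recursion over bonds, and Basak information
# contents from given equivalence-class sizes.

def hosoya(bonds):
    """Z = number of ways to choose pairwise non-adjacent bonds (any number k)."""
    if not bonds:
        return 1                                  # the empty selection
    (u, v), rest = bonds[0], bonds[1:]
    without_first = hosoya(rest)
    # choosing the first bond forbids every other bond touching atom u or v
    with_first = hosoya([b for b in rest if u not in b and v not in b])
    return without_first + with_first


def basak_indices(class_sizes, n_bonds_incl_h):
    """IC_r and the derived indices; class_sizes are the A_g (H atoms included)."""
    A = sum(class_sizes)
    ic = -sum((a / A) * math.log2(a / A) for a in class_sizes)
    cic = math.log2(A) - ic
    sic = ic / math.log2(A)
    bic = ic / math.log2(n_bonds_incl_h)
    return {"IC": ic, "TIC": A * ic, "CIC": cic, "N*CIC": A * cic,
            "SIC": sic, "N*SIC": A * sic, "BIC": bic, "N*BIC": A * bic}


print(hosoya([(0, 1), (1, 2), (2, 3)]))   # n-butane skeleton: 5
```

For methane at r = 0, for instance, the classes are {C} and {H, H, H, H}, so `basak_indices([1, 4], 4)` gives $IC_0 \approx 0.722$.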
atoms in that path. For example, the Randić indices of order 0 and 1 are

${}^0\chi = \sum_i \frac{1}{\sqrt{\delta_i}}, \qquad {}^1\chi = \sum_{\text{edge}(i,j)} \frac{1}{\sqrt{\delta_i \delta_j}},$

where the sums are taken over the vertices and over the edges of an H-suppressed molecular graph, respectively.

5. Solvation connectivity indices: They form the series of indices ${}^m\chi^s$, m = 0, 1, 2, 3, defined by

${}^m\chi^s = \frac{1}{2^{m+1}} \sum_{\text{path } p \text{ of length } m} \; \prod_{i \in p} \frac{L_i}{\sqrt{\delta_i}},$

where the product is taken over the atoms in the path and $L_i$ is the principal quantum number of atom i (2 for C, N, O, F; 3 for Si, P, S, Cl, etc.) [11].

6. Solvation connectivity index for clusters: This index arises by taking the sum over all clusters of size 3, which means subgraphs of the following form: [diagram: a central atom bonded to three neighbors]. The index is defined by

${}^3\chi^s_c = \frac{1}{2^4} \sum_{\text{cluster of size 3}} \; \prod_{i=1}^{4} \frac{L_i}{\sqrt{\delta_i}}.$

7. Kier and Hall (or valence) connectivity indices: These form the series ${}^m\chi^v$, m = 0, 1, 2, 3, and are defined as follows [11]:

${}^m\chi^v = \sum_{\text{path } p \text{ of length } m} \; \prod_{i \in p} \frac{1}{\sqrt{\delta^v_i}}.$

The valence vertex degree (or vertex valence) $\delta^v_i$ of atom i in an H-suppressed molecular graph is defined as

$\delta^v_i = \frac{Z^v_i - h_i}{Z_i - Z^v_i - 1},$

where $Z_i$ is the total number of electrons (the atomic number) of atom i, $Z^v_i$ the number of valence electrons and $h_i$ the number of H atoms attached to atom i. In MOLGEN-QSPR these indices are implemented for m = 0, 1, 2, 3.

Kier shape indices 1, …
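All of the path-based connectivity indices above share one pattern: enumerate the simple paths with m bonds and sum, over the paths, a product of atom weights. A sketch with the H-suppressed graph given as adjacency lists; the function names are illustrative, and every atom is assumed to have at least one neighbor (so that $\delta_i > 0$).

```python
import math

# Sketch: Randić, Kier-Hall valence and solvation connectivity indices.

def paths_with_m_bonds(adj, m):
    """Every simple path with m bonds, each listed once as a tuple of atoms."""
    found = set()

    def extend(path):
        if len(path) == m + 1:
            # a path and its reverse are the same path; keep one orientation
            found.add(tuple(path) if path[0] <= path[-1] else tuple(reversed(path)))
            return
        for w in adj[path[-1]]:
            if w not in path:
                extend(path + [w])

    for v in range(len(adj)):
        extend([v])
    return found


def chi(adj, m, weights):
    """Sum over paths with m bonds of the product of 1/sqrt(weight_i)."""
    return sum(math.prod(1.0 / math.sqrt(weights[i]) for i in p)
               for p in paths_with_m_bonds(adj, m))


def valence_degree(Z, Zv, h):
    """delta^v = (Z^v - h) / (Z - Z^v - 1)."""
    return (Zv - h) / (Z - Zv - 1)


def solvation_chi(adj, m, L):
    """(1 / 2^(m+1)) * sum over paths of the product of L_i / sqrt(delta_i)."""
    delta = [len(a) for a in adj]
    return sum(math.prod(L[i] / math.sqrt(delta[i]) for i in p)
               for p in paths_with_m_bonds(adj, m)) / 2 ** (m + 1)


ethanol = [[1], [0, 2], [1]]             # C-C-O skeleton
dv = [valence_degree(6, 4, 3), valence_degree(6, 4, 2), valence_degree(8, 6, 1)]
print(dv, chi(ethanol, 1, dv))           # valence degrees and 1chi^v
```

With $L_i = 2$ for every atom, as for a pure carbon skeleton, the factor $2^{-(m+1)}$ cancels and the solvation index coincides with the ordinary connectivity index of the same order.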
bonds, e.g. the methyl group, and defined as

$S_i = I_i + \sum_{j \ne i} \frac{I_i - I_j}{r_{ij}^2}.$

The second term stands for the sum of influences of all other atoms j in the molecule on atom i, where $r_{ij}$ is the number of atoms on the shortest path between atoms i and j. Thus $S_i$ characterizes a particular non-H atom, e.g. a particular methyl group in the ethyl acetate molecule. In MOLGEN-QSPR the sum of the E-state values of all such atoms is available, e.g. the sum of the E-states of all methyl groups in a molecule, called S(sCH3), which in the case of ethyl acetate is the sum of the E-states of the two methyl groups. Here is a table of the 80 available sums of E-states of atomic subgraphs:

S(sCH3)     S(sssNH)    S(aaS)       S(ssSiH2)
S(dCH2)     S(dsN)      S(dssS)      S(sssSiH)
S(ssCH2)    S(aaN)      S(ddssS)     S(ssssSi)
S(tCH)      S(sssN)     S(ssssssS)   S(sGeH3)
S(dsCH)     S(ddsN)     S(sCl)       S(ssGeH2)
S(aaCH)     S(aasN)     S(sSeH)      S(sssGeH)
S(sssCH)    S(ssssN)    S(dSe)       S(ssssGe)
S(ddC)      S(sOH)      S(ssSe)      S(sAsH2)
S(tsC)      S(dO)       S(aaSe)      S(ssAsH)
S(dssC)     S(ssO)      S(dssSe)     S(sssAs)
S(aasC)     S(aaO)      S(ddssSe)    S(sssdAs)
S(aaaC)     S(sF)       S(sBr)       S(sssssAs)
S(ssssC)    S(sPH2)     S(sI)        S(sSnH3)
S(sNH3)     S(ssPH)     S(sLi)       S(ssSnH2)
S(sNH2)     S(sssP)     S(ssBe)      S(sssSnH)
S(ssNH2)    S(dsssP)    S(ssssBe)    S(ssssSn)
S(dNH)      S(sssssP)   S(ssBH)      S(sPbH3)
S(ssNH)     S(sSH)      S(sssB)      S(ssPbH2)
S(aaNH)     S(dS)       S(ssssB)     S(sssPbH)
S(tN)       S(ssS)      S(sSiH3)     S(ssssPb)

where s means a single bond, ss two single bonds, d a double bond, t a triple bond, a an aromatic bond, etc., to the specified atom, disregarding bonds to H atoms spe…
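The E-state of an atom needs only the intrinsic states and the topological distances. The sketch below assumes the standard Kier–Hall intrinsic state $I_i = ((2/N_i)^2 \delta^v_i + 1)/\delta_i$, with $N_i$ the principal quantum number, since that formula is not part of this excerpt, and uses $r_{ij}$ = topological distance + 1. It then forms a group sum such as S(sCH3) for n-butane; the function names are illustrative.

```python
import math
from collections import deque

# Sketch: Kier-Hall E-states S_i = I_i + sum_j (I_i - I_j) / r_ij^2.
# Assumption: r_ij = topological distance + 1 and the standard intrinsic state.

def topological_distances(adj, source):
    """Breadth-first search distances from `source` to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def intrinsic_state(N, delta_v, delta):
    return ((2.0 / N) ** 2 * delta_v + 1.0) / delta


def e_states(adj, intrinsic):
    S = []
    for i in range(len(adj)):
        d = topological_distances(adj, i)
        S.append(intrinsic[i] + sum((intrinsic[i] - intrinsic[j]) / (d[j] + 1) ** 2
                                    for j in d if j != i))
    return S


# n-butane: CH3 (delta = delta^v = 1) and CH2 (delta = delta^v = 2), all with N = 2
butane = [[1], [0, 2], [1, 3], [2]]
I = [intrinsic_state(2, dv, d) for dv, d in [(1, 1), (2, 2), (2, 2), (1, 1)]]
S = e_states(butane, I)
S_sCH3 = S[0] + S[3]          # sum of the E-states of all methyl groups
print(S, S_sCH3)
```

Because each pairwise influence enters once with a plus and once with a minus sign, the E-states of a molecule always sum to the sum of its intrinsic states.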
…from the CD-ROM. However, it is useful to copy the program and the sample files to your hard disk. Proceed as follows:
1. Insert the MOLGEN-QSPR installation CD-ROM into your CD-ROM drive.
2. Copy the complete folder MOLGEN-QSPR into the programs directory of your hard disk drive, located for instance at C:\Program Files.
3. Optionally, create shortcuts on your desktop or in your Start menu.

[Figure 1.1: License dialogue (shows your Windows product ID and a field for entering the license key)]

1.3 Activation

When you start MOLGEN-QSPR for the first time, the License dialogue (Figure 1.1) will be displayed. Please send your Windows product ID to molgen@molgen.de. You will then receive a license key for activation.

1.4 Demo

For evaluation purposes, a free demo license can be ordered. If you received such a demo version, no license key is required. The demo license offers full functionality for calculating QSPRs. However, import functions are limited: only the input files DecanesReal.sdf and DecanesReal.txt delivered with the demo version can be imported. Structure generators are not accessible in the demo version.

Chapter 2 Tutorial

This part of the MOLGEN-QSPR User Guide gives a brief description of all you need to know for your first QSPR calculations. It is described step by step, beginning with data input, followed by descriptor calculation, regression analysis…
…the screen could look as shown in Figure 2.25. In a QSPR document, different types of QSPRs for different properties, using different descriptors and algorithms, can be stored. Use File > Save As to save the QSPR document (extension .qspr). With the View submenu you can add or hide columns showing certain characteristics of the QSPRs, such as:
• model type
• property name
• number of descriptors
• degrees of freedom
• number of observations
• R squared
• standard error
• Fisher's F value
• residual sum of squares
• mean squared residual
• mean absolute residual
• maximum absolute residual, etc.

Double-click on a certain QSPR to get the QSPR's property sheet (Figures 2.26–2.32).

[Figure 2.26: QSPR Common page (model: linear model for prediction of BP; author: Administrator; date: 03.06.2004 22:01:57)]

2.6.1 QSPR Common Properties

On the Common page you are given the information shown in Figure 2.26. This information can be edited and stored using the OK button.

2.6.2 QSPR Details

Statistical details are supplied on the Details page (Figure 2.27). For the decane example it lists:
Number of observations: 50
Number of descriptors: 5
Degrees of freedom: 6.00000
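2 and 3: These are arithmetic expressions in terms of the number A of atoms and the numbers $^lP$ of paths of length l in the molecular graph of the H-suppressed molecule:

${}^1\kappa = \frac{A(A-1)^2}{({}^1P)^2}, \qquad {}^2\kappa = \frac{(A-1)(A-2)^2}{({}^2P)^2},$

${}^3\kappa = \begin{cases} \frac{(A-3)(A-2)^2}{({}^3P)^2} & \text{for even } A,\ A > 3, \\ \frac{(A-1)(A-3)^2}{({}^3P)^2} & \text{for odd } A,\ A > 3. \end{cases}$

Note that ${}^1P = B$, the number of bonds.

Alpha-modified Kier shape indices 1, 2 and 3: These are

${}^1\kappa_\alpha = \frac{(A+\alpha)(A+\alpha-1)^2}{({}^1P+\alpha)^2}, \qquad {}^2\kappa_\alpha = \frac{(A+\alpha-1)(A+\alpha-2)^2}{({}^2P+\alpha)^2},$

${}^3\kappa_\alpha = \begin{cases} \frac{(A+\alpha-3)(A+\alpha-2)^2}{({}^3P+\alpha)^2} & \text{for even } A,\ A > 3, \\ \frac{(A+\alpha-1)(A+\alpha-3)^2}{({}^3P+\alpha)^2} & \text{for odd } A,\ A > 3. \end{cases}$

The modifying α is defined as

$\alpha = \sum_i \left( \frac{R_i}{R_{C_{sp^3}}} - 1 \right),$

where $R_i$ is the covalent radius of the i-th atom in an H-suppressed molecule and $R_{C_{sp^3}}$ is the covalent radius of an sp³ carbon atom. Here is a table with such values (radii in Å):

Atom  Hybrid.  R_i    α_i    |  Atom  Hybrid.  R_i    α_i
C     sp3      0.77    0.00  |  P     sp3      1.10    0.43
C     sp2      0.67   -0.13  |  P     sp2      1.00    0.30
C     sp       0.60   -0.22  |  S     sp3      1.04    0.35
N     sp3      0.74   -0.04  |  S     sp2      0.94    0.22
N     sp2      0.62   -0.20  |  F     –        0.72   -0.07
N     sp       0.55   -0.29  |  Cl    –        0.99    0.29
O     sp3      0.74   -0.04  |  Br    –        1.14    0.48
O     sp2      0.62   -0.20  |  I     –        1.33    0.73

Kier molecular flexibility index, alpha-modified and non-modified:

$\Phi_\alpha = \frac{{}^1\kappa_\alpha \, {}^2\kappa_\alpha}{A}, \qquad \Phi = \frac{{}^1\kappa \, {}^2\kappa}{A}.$

Platt number: It is expressed in terms of the numbers $N_i$ of neighbors of atoms:

$F = \sum_{\text{edge}(i,j)} (N_i + N_j - 2).$

The sum runs over all edges in the H-suppressed molecular graph.

Gordon–Scantlebury index: … is the…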
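The shape and flexibility indices depend only on A, the path counts ${}^1P$, ${}^2P$, ${}^3P$ and, for the alpha-modified variants, on α. A sketch for a graph in adjacency-list form, counting paths by depth-first search; the function names are illustrative, and the third shape index requires A > 3 and at least one path of length 3.

```python
import math

# Sketch: Kier shape indices (plain and alpha-modified), Kier flexibility
# index and Platt number for an H-suppressed graph given as adjacency lists.

def count_paths(adj, l):
    """Number of simple paths with l >= 1 bonds, each path counted once."""
    def walk(path):
        if len(path) == l + 1:
            return 1
        return sum(walk(path + [w]) for w in adj[path[-1]] if w not in path)
    # every path is found once from each of its two ends
    return sum(walk([v]) for v in range(len(adj))) // 2


def alpha(radii, r_csp3=0.77):
    """alpha = sum_i (R_i / R_Csp3 - 1), covalent radii in Å."""
    return sum(r / r_csp3 - 1.0 for r in radii)


def kappa(adj, a=0.0):
    """(1kappa, 2kappa, 3kappa); a = 0 gives the non-modified indices."""
    A = len(adj)
    P1, P2, P3 = (count_paths(adj, l) for l in (1, 2, 3))
    x = A + a
    k1 = x * (x - 1) ** 2 / (P1 + a) ** 2
    k2 = (x - 1) * (x - 2) ** 2 / (P2 + a) ** 2
    if A % 2 == 0:                         # parity of A selects the 3kappa formula
        k3 = (x - 3) * (x - 2) ** 2 / (P3 + a) ** 2
    else:
        k3 = (x - 1) * (x - 3) ** 2 / (P3 + a) ** 2
    return k1, k2, k3


def flexibility(adj, a=0.0):
    k1, k2, _ = kappa(adj, a)
    return k1 * k2 / len(adj)


def platt(adj):
    """F = sum over bonds of (N_i + N_j - 2), N = number of neighbors."""
    return sum(len(adj[i]) + len(adj[j]) - 2
               for i in range(len(adj)) for j in adj[i] if i < j)


pentane = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(kappa(pentane), flexibility(pentane), platt(pentane))
```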
4. Use File > Molecules to create the new Molecule document of the selected structures (Figure 2.7).

2.2.2 Editing Property Values

Often it is necessary to edit some property values after data import. To do so, proceed as follows:
1. Select the property column you want to edit by clicking the column head or by using the Current Column combo box.
2. Select the row of the property value you want to edit. The Current Entry combo box becomes activated and the selected property value appears (Figure 2.8).
3. Edit the property value in the Current Entry combo box. The value is immediately transferred to its place in the Molecular Descriptors document.

[Figure 2.7: Structures with bps between 150 and 160 °C]
[Figure 2.8: Editing property values using the Current Entry combo box]

2.2.3 Further Edit Operations

There are some further operations available to modify a Molecular Descriptors document. Selected rows can be deleted using Edit > Delete. To delete a column, make it t…
[Figure 2.24: Regression dialogue with results in the Output field]
[Figure 2.25: QSPR document]

In the Output field you see the best QSPRs calculated, one in each row. Double-click on a certain QSPR to obtain further details on the selected QSPR. Use the Add Predictions/Residuals check boxes to add values calculated by the QSPR and/or residuals as new column(s) to the Molecular Descriptors document. If the Add Models check box is activated, QSPRs are added to a new or an existing QSPR document, specified by the lower combo box.

2.6 Displaying and Saving QSPRs

If you decided to add models to a new QSPR document,…
695.
[30] RÜCKER, G.; RÜCKER, C.: Walk Counts, Labyrinthicity, and Complexity of Acyclic and Cyclic Graphs and Molecules. J. Chem. Inf. Comput. Sci. 2000, 40, 99–106.
[31] GUTMAN, I.; RÜCKER, C.; RÜCKER, G.: On Walks in Molecular Graphs. J. Chem. Inf. Comput. Sci. 2001, 41, 739–745.
[32] NIKOLIĆ, S.; TRINAJSTIĆ, N.; TOLIĆ, I. M.; RÜCKER, G.; RÜCKER, C.: On Molecular Complexity Indices. Chapter 2, pages 29–89 in: Complexity in Chemistry, Bonchev, D.; Rouvray, D. H. (Eds.), Taylor and Francis, London 2003.
[33] KATRITZKY, A. R.; MU, L.; LOBANOV, V. S.; KARELSON, M.: Correlation of Boiling Points with Molecular Structure. 1. A Training Set of 298 Diverse Organics and a Test Set of 9 Simple Inorganics. J. Phys. Chem. 1996, 100, 10400–10407.
[34] HOSOYA, H.: Topological Index. A Newly Proposed Quantity Characterizing the Topological Nature of Structural Isomers of Saturated Hydrocarbons. Bull. Chem. Soc. Jpn. 1971, 44, 2332–2339.
[35] BASAK, S. C.: Information Theoretic Indices of Neighborhood Complexity and Their Applications. Chapter 12 in: Topological Indices and Related Descriptors in QSAR and QSPR, Devillers, J.; Balaban, A. T. (Eds.), Gordon and Breach, Amsterdam 1999.
[36] BASAK, S. C.: Use of Molecular Complexity Indices in Predictive Pharmacology and Toxicology: A QSAR Approach. Med. Sci. Res. 1987, 15, 605–609.
[37] BASAK, S. C.; GUTE, B. D.: Characterization of Molecular Structures Using Topological Indices. SAR QSAR Environ. Res. 1997,
7, 1–21.
[38] IVANCIUC, O.; BALABAN, A. T.: Design of Topological Indices. Part 8. Path Matrices and Derived Molecular Graph Invariants. MATCH Commun. Math. Comp. Chem. 1994, 30, 141–152.
[39] AMIĆ, D.; TRINAJSTIĆ, N.: On the Detour Matrix. Croat. Chem. Acta 1995, 68, 53–62.
[40] LUKOVITS, I.: The Detour Index. Croat. Chem. Acta 1996, 69, 873–882.
[41] LUKOVITS, I.; RAZINGER, M.: On Calculation of the Detour Index. J. Chem. Inf. Comput. Sci. 1997, 37, 283–286.
[42] RÜCKER, G.; RÜCKER, C.: Symmetry-Aided Computation of the Detour Matrix and the Detour Index. J. Chem. Inf. Comput. Sci. 1998, 38, 710–714.
[43] RANDIĆ, M.; BRISSEY, G. M.; SPENCER, R. B.; WILKINS, C. L.: Search for All Self-Avoiding Paths for Molecular Graphs. Comput. Chem. 1979, 3, 5–13.
[44] RANDIĆ, M.: Characterization of Atoms, Molecules, and Classes of Molecules Based on Paths Enumeration. MATCH Commun. Math. Comp. Chem. 1979, 7, 5–64.
[45] GALVEZ, J.; GARCIA, R.; SALABERT, M. T.; SOLER, R.: Charge Indexes. New Topological Descriptors. J. Chem. Inf. Comput. Sci. 1994, 34, 520–525.
[46] GALVEZ, J.; GARCIA-DOMENECH, R.; DE JULIAN-ORTIZ, V.; SOLER, R.: Topological Approach to Drug Design. J. Chem. Inf. Comput. Sci. 1995, 35, 272–284.
[47] WILDMAN, S. A.; CRIPPEN, G. M.: Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci. 1999, 39, 868–873.
[48] BONCHEV, D.: Novel Indices for the Topolog…
…CKER, G. RÜCKER, M. MERINGER: y-Randomization and Its Variants in QSPR/QSAR. J. Chem. Inf. Model. 47 (2007) 2345–2357.
A. KERBER, R. LAUE, M. MERINGER, C. RÜCKER: Molecules in Silico: A Graph Description of Chemical Reactions. J. Chem. Inf. Model. 47 (2007) 805–817.
C. RÜCKER, M. SCARSI, M. MERINGER: 2D QSAR of PPARγ Agonist Binding and Transactivation. Bioorg. Med. Chem. 14 (2006) 5178–5195.
C. RÜCKER, M. MERINGER, A. KERBER: QSPR Using MOLGEN-QSPR: The Challenge of Fluoroalkane Boiling Points. J. Chem. Inf. Model. 45 (2005) 74–80.
J. BRAUN, A. KERBER, M. MERINGER, C. RÜCKER: Similarity of Molecular Descriptors: The Equivalence of Zagreb Indices and Walk Counts. MATCH Commun. Math. Comput. Chem. 54 (2005) 163–176.
C. RÜCKER, M. MERINGER, A. KERBER: QSPR Using MOLGEN-QSPR: The Example of Haloalkane Boiling Points. J. Chem. Inf. Comput. Sci. 44 (2004) 2070–2076.
A. KERBER, R. LAUE, M. MERINGER, C. RÜCKER: MOLGEN-QSPR, a Software Package for the Study of Quantitative Structure Property Relationships. MATCH Commun. Math. Comput. Chem. 51 (2004) 187–204.
M. MERINGER: Mathematische Modelle für die kombinatorische Chemie und die molekulare Strukturaufklärung. PhD thesis, University of Bayreuth, 2004. Logos Verlag, 354 pp., 2004, ISBN 3-8325-0673-X.
J. BRAUN: Topologische Indizes und ihre computerunterstützte Anwendung in der Chemie. Diploma thesis, University of Bayreuth, 1999.

Most of these papers may be downloade…
• Invert selection
• Learning set/Test set partition

2.8 Property Prediction

Let us now apply our best QSPR to predict the boiling points of all those decanes not included in our real library.

2.8.1 Generating a Virtual Library

To this end, we generate all decanes, i.e. all structural formulas with the molecular formula C10H22:
1. Create a new Molgen document using File > New > Molgen.
2. Use Edit > Add Formula to call the Add Molecular Formula sheet.
3. Enter C10H22 in the Formula field.
4. Click OK to add the molecular formula to the Molgen document.
5. Use File > Save As to save the Molgen document with the name Decanes.mgp.
6. Start structure generation using Start in the Generator field.
7. After a moment the computation will be completed, resulting in 75 constitutional isomers.
8. Select File > Open Output to display the generated structures.

Note: Often virtual libraries cannot be described as isomers of a molecular formula. Rather, particularly in combinatorial chemistry, virtual libraries are specified by reactants and reactions. Such libraries can be generated using the reaction-based structure generator MOLGEN-COMB.

2.8.2 Comparing Real and Virtual Library

Having generated all decanes, we now want to identify those not included in our real library of 50 decanes with known boiling points. Starting from the Molecule document Decanes.mb4, click File > Compare to get to the Compare Molecule Files dialogue (Figu…
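That C10H22 has 75 constitutional isomers can be checked independently: acyclic alkane skeletons are exactly the trees with maximum vertex degree 4, so growing trees one leaf at a time and removing duplicates via a canonical string reproduces the count. The sketch below is a plain illustration of this idea and not the algorithm of the MOLGEN structure generator; the function names are our own.

```python
# Sketch: count alkane skeletons (trees with maximum degree 4) by growing
# trees leaf by leaf and de-duplicating with an AHU-style canonical string.

def encode(adj, v, parent):
    """Canonical string of the subtree rooted at v (children sorted)."""
    return "(" + "".join(sorted(encode(adj, w, v) for w in adj[v] if w != parent)) + ")"


def canonical_form(adj):
    """Canonical string of an unrooted tree: root it at its center(s)."""
    n = len(adj)
    degree = [len(a) for a in adj]
    removed = [False] * n
    leaves = [v for v in range(n) if degree[v] <= 1]
    remaining = n
    while remaining > 2:                      # strip leaves layer by layer
        next_leaves = []
        for v in leaves:
            removed[v] = True
            remaining -= 1
            for w in adj[v]:
                if not removed[w]:
                    degree[w] -= 1
                    if degree[w] == 1:
                        next_leaves.append(w)
        leaves = next_leaves
    centers = [v for v in range(n) if not removed[v]]
    return min(encode(adj, c, -1) for c in centers)


def alkane_skeletons(n):
    """All non-isomorphic trees on n vertices with maximum degree 4."""
    trees = {"()": [[]]}                      # single carbon (methane)
    for size in range(2, n + 1):
        grown = {}
        for adj in trees.values():
            for v in range(len(adj)):
                if len(adj[v]) < 4:           # carbon valence limit
                    new = [list(a) for a in adj] + [[v]]
                    new[v].append(size - 1)
                    grown.setdefault(canonical_form(new), new)
        trees = grown
    return list(trees.values())


print(len(alkane_skeletons(10)))   # 75
```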
…Graphs. Pure Appl. Chem. 1983, 55, 199–206.
[21] BALABAN, A. T.; FILIP, P.: Computer Program for Topological Index J. MATCH Commun. Math. Comp. Chem. 1984, 16, 163–190.
[22] SCHULTZ, H. P.: Topological Organic Chemistry. 1. Graph Theory and Topological Indices of Alkanes. J. Chem. Inf. Comput. Sci. 1989, 29, 227–228.
[23] SCHULTZ, H. P.; SCHULTZ, T. P.: Topological Organic Chemistry. 6. Graph Theory and Molecular Topological Indices of Cycloalkanes. J. Chem. Inf. Comput. Sci. 1993, 33, 240–244.
[24] MÜLLER, W. R.; SZYMANSKI, K.; KNOP, J. V.; TRINAJSTIĆ, N.: Molecular Topological Indices. J. Chem. Inf. Comput. Sci. 1990, 30, 160–163.
[25] MIHALIĆ, Z.; NIKOLIĆ, S.; TRINAJSTIĆ, N.: Comparative Study of Molecular Descriptors Derived from the Distance Matrix. J. Chem. Inf. Comput. Sci. 1992, 32, 28–37.
[26] IVANCIUC, O.; BALABAN, T. S.; BALABAN, A. T.: Design of Topological Indices. Part 4. Reciprocal Distance Matrix, Related Local Vertex Invariants and Topological Indices. J. Math. Chem. 1993, 12, 309–318.
[27] PLAVŠIĆ, D.; NIKOLIĆ, S.; TRINAJSTIĆ, N.; MIHALIĆ, Z.: On the Harary Index for the Characterization of Chemical Graphs. J. Math. Chem. 1993, 12, 235–250.
[28] LUČIĆ, B.; MILIČEVIĆ, A.; NIKOLIĆ, S.; TRINAJSTIĆ, N.: Harary Index – Twelve Years Later. Croat. Chem. Acta 2002, 75, 847–867.
[29] RÜCKER, G.; RÜCKER, C.: Counts of All Walks as Atomic and Molecular Descriptors. J. Chem. Inf. Comput. Sci. 1993, 33, 683–
MOLGEN-QSPR User Guide
Software for Computation and Application of Quantitative Structure–Property Relationships

J. Braun, A. Kerber, R. Laue, M. Meringer, C. Rücker
Bayreuth, München, Freiburg
June 10, 2009

Contents
Introduction
1 First Steps
1.1 System Requirements
1.1.1 Hardware
1.1.2 Software
1.2 Installation
1.3 Activation
1.4 Demo
2 Tutorial
2.1 Data Input
2.1.1 Importing Structural Formulas
2.1.2 Importing Property Values
2.1.3 Linking Structures and Property Values
2.1.4 Alternatives for Data Input
2.2 Displaying and Editing Data
2.2.1 Displaying Structural Formulas
2.2.2 Editing Property Values
2.2.3 Further Edit Operations
2.3 Descriptor Calculation
2.3.1 Calculating Indices
2.3.2 Calculating Substructure Counts
2.3.3 Calculating Fragment Counts
2.3.4 Descriptor Transformation
2.4 Correlation Analysis
2.4…
26. R squared: 0.95814
Standard error: 1.60149
Fisher's F: 201.41619
Residual sum of squares: 112.85005
Mean squared residual: 2.25700
Mean absolute residual: 1.14024
Maximal absolute residual: 4.11374
Figure 2.27: QSPR Details page

2.6.3 QSPR Descriptors
Names and types of descriptors, as well as preprocessing transformations, can be seen on the Descriptors page (Figure 2.28).

2.6.4 QSPR Property
The property investigated by the QSPR is noted on the Property page (Figure 2.29).

2.6.5 QSPR Model
The specification of the prediction function is provided on the Model page (Figure 2.30).

2.6.6 QSPR Predictions
The Prediction page offers a table of residuals, experimental and calculated values (Figure 2.31).
Note: Use the left mouse button and Copy in order to copy the complete table to the clipboard.

2.6.7 QSPR Plot
The Plot page shows a plot of experimental vs. calculated values (Figure 2.32).

Figure 2.28: QSPR Descriptors page
Figure 2.29: QSPR Property page
Figure 2.30: QSPR Model page
Figure 2.31: QSPR Prediction page
Figure 2.32: QSPR P
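The fit statistics above are linked by the usual least-squares formulas. The sketch below recomputes three of them from the residual sum of squares and R²; it assumes n = 50 compounds and p = 5 regressors plus an intercept (values inferred because they reproduce the reported numbers, not stated on this page), and it uses textbook definitions rather than MOLGEN-QSPR's own code.

```python
import math


def regression_summary(rss: float, r_squared: float, n: int, p: int) -> dict:
    """Least-squares fit statistics for n observations, p regressors and an intercept."""
    dof = n - p - 1  # residual degrees of freedom
    return {
        # standard error of the estimate: s = sqrt(RSS / (n - p - 1))
        "standard_error": math.sqrt(rss / dof),
        # mean squared residual: RSS / n
        "mean_squared_residual": rss / n,
        # Fisher's F: (R^2 / p) / ((1 - R^2) / (n - p - 1))
        "fisher_f": (r_squared / p) / ((1.0 - r_squared) / dof),
    }


# RSS and R^2 as reported in Figure 2.27; n = 50 and p = 5 are assumptions.
summary = regression_summary(rss=112.85005, r_squared=0.95814, n=50, p=5)
for name, value in summary.items():
    print(f"{name}: {value:.5f}")
```

With these assumptions the output matches the reported standard error and mean squared residual to five decimals, and Fisher's F to within the rounding of R².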
27. ...Se: AI of aaSe
AI dssSe: AI of dssSe
AI ddssSe: AI of ddssSe
AI sBr: AI of sBr
AI sI: AI of sI
AI sLi: AI of sLi
AI ssBe, AI ssssBe: AI of ssBe, AI of ssssBe
AI ssBH: AI of ssBH
AI sPbH3, AI ssPbH2: AI of sPbH3, AI of ssPbH2
AI sssPbH, AI ssssPb: AI of sssPbH, AI of ssssPb
Xu, modified Xu: Xu index, modified Xu index

3.4 Geometrical Indices
G1, G1 incl H: gravitational index pairs (3D distances)
G2, G2 incl H: gravitational index bonds (3D distances)
I_A, I_B, I_C: principal moments of inertia A, B, C
st energy: steric energy
SHDW1-3: XY, XZ, YZ shadow
SHDW4-6: standardized XY, XZ, YZ shadow
SHDW1/SHDW2, ...: XY/XZ, XY/YZ, XZ/YZ shadow
ssSHDW1-3: size-sorted shadows 1, 2, 3
ssSHDW4-6: size-sorted standardized shadows 1, 2, 3
ssSHDW1/ssSHDW2, ...: size-sorted shadows 1/2, 1/3, 2/3
V_vdw: Van der Waals volume
ρ_vdw: density by Van der Waals volume
[symbol illegible]: standardized Van der Waals volume
V_cub: enclosing cuboid
S_vdw: Van der Waals surface
SASA_H2O: solvent accessible surface area (H2O probe)
SASA_H: solvent accessible surface area (H probe)
D_3D: geometrical diameter
V_sphere: enclosing sphere

3.5 Miscellaneous Indices
slogP: Crippen slogP
sMR: Crippen sMR
atC01 ... atC27, atH01 ... atH04, atO01 ... atO12, atN01 ... atN14, atHal, atCl, atBr, atI, atF, atP, atS01, atS02, atS03, atMe01, atMe02: Crippen atom types C01 ... C27
28. ...Zagreb subgraph order 3-6 chain, overall second Zagreb subgraph chain, overall Wiener order 3 chain, overall Wiener chain

3.7 Definitions of Descriptors
Leading references for the descriptors available in MOLGEN-QSPR:
TODESCHINI R., CONSONNI V.: Handbook of Molecular Descriptors. Wiley-VCH, Weinheim and New York 2000; 2nd ed. 2009 under the new title Molecular Descriptors for Chemoinformatics.
TRINAJSTIĆ N.: Chemical Graph Theory, 2nd edition. CRC Press, Boca Raton, FL 1992.

3.7.1 Definitions of Arithmetic Descriptors
1. Numbers of atoms: A denotes the number of atoms excluding H atoms; A incl H is the number of atoms including H atoms. N_H is the number of H atoms. Correspondingly, we use the notations N_C, N_O, N_N, N_S, N_F, N_Cl, N_Br, N_I and N_P.
2. Relative numbers of atoms: The descriptors rel N_H, rel N_C, rel N_O, rel N_N, rel N_S, rel N_F, rel N_Cl, rel N_Br, rel N_I and rel N_P are the numbers of the respective atoms divided by the total number of atoms including H atoms. For example, $\mathrm{rel}\,N_H = N_H / (A \text{ incl } H)$.
3. Numbers of bonds: B denotes the number of bonds in the H-suppressed molecule, while B incl H is the number of bonds in the molecule including H atoms.
4. Numbers of localized bonding electron pairs: loc B is the number of localized bonding electron pairs in an H-suppressed molecule. Aromatic π e
29. ...ates of dO, sum of E-states of ssO, sum of E-states of aaO, sum of E-states of sF, sum of E-states of sPH2, sum of E-states of ssPH, sum of E-states of sssP, sum of E-states of dsssP, sum of E-states of sssssP, sum of E-states of sSH, sum of E-states of dS, sum of E-states of ssS, sum of E-states of aaS, sum of E-states of dssS, sum of E-states of ddssS, sum of E-states of ssssssS, sum of E-states of sCl, sum of E-states of sSeH, sum of E-states of dSe, sum of E-states of ssSe, sum of E-states of aaSe, sum of E-states of dssSe, sum of E-states of ddssSe, sum of E-states of sBr, sum of E-states of sI, sum of E-states of sLi, sum of E-states of ssBe, sum of E-states of ssssBe, sum of E-states of ssBH, sum of E-states of ssssB, sum of E-states of sSiH3, sum of E-states of ssSiH2, sum of E-states of sssSiH, sum of E-states of ssssSi, sum of E-states of sGeH3, sum of E-states of ssGeH2, sum of E-states of sssGeH, sum of E-states of ssssGe, sum of E-states of sAsH2, sum of E-states of ssAsH, sum of E-states of sssAs, sum of E-states of sssdAs, sum of E-states of sssssAs, sum of E-states of sSnH3, sum of E-states of ssSnH2, sum of E-s
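The E-state sums listed above can be illustrated on n-butane. This is a minimal sketch assuming the standard Kier-Hall definitions: intrinsic state $I_i = ((2/N_i)^2\delta_i^v + 1)/\delta_i$, with $N_i$ the principal quantum number, and $S_i = I_i + \sum_{j \ne i} (I_i - I_j)/(D_{ij}+1)^2$. Whether MOLGEN-QSPR uses exactly these conventions is an assumption, and the helper names are hypothetical.

```python
from collections import deque


def distance_matrix(n_atoms, edges):
    """Topological distances D_ij in an H-suppressed graph, via breadth-first search."""
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = []
    for start in range(n_atoms):
        d = [-1] * n_atoms
        d[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def intrinsic_state(delta, delta_v, principal_n):
    """Kier-Hall intrinsic state I = ((2/N)^2 * delta_v + 1) / delta."""
    return ((2.0 / principal_n) ** 2 * delta_v + 1.0) / delta


def e_states(intrinsic, dist):
    """E-state S_i = I_i + sum over j != i of (I_i - I_j) / (D_ij + 1)^2."""
    n = len(intrinsic)
    return [
        intrinsic[i]
        + sum((intrinsic[i] - intrinsic[j]) / (dist[i][j] + 1) ** 2
              for j in range(n) if j != i)
        for i in range(n)
    ]


# n-butane, H-suppressed: CH3-CH2-CH2-CH3 (atoms 0..3)
edges = [(0, 1), (1, 2), (2, 3)]
types = ["sCH3", "ssCH2", "ssCH2", "sCH3"]
delta = [1, 2, 2, 1]    # vertex degrees
delta_v = [1, 2, 2, 1]  # valence degrees; equal to delta for sp3 carbon
I = [intrinsic_state(d, dv, 2) for d, dv in zip(delta, delta_v)]  # N = 2 for carbon
S = e_states(I, distance_matrix(4, edges))

# "sum of E-states of <atom type>", as in the list above
sums = {}
for t, s in zip(types, S):
    sums[t] = sums.get(t, 0.0) + s
print(sums)
```

Because the perturbation terms cancel pairwise, the E-states of all atoms add up to the sum of the intrinsic states, which is a convenient sanity check.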
30. ...cified.

2. AI of atomic subgraphs: These are quantities similar to the electrotopological indices. For example, AI(sCH3) is obtained by summing, over the m CH3 subgraphs of the molecule, a term built from the squared modified degree $(\delta_i^{mod})^2$ and the distance degree $g_i$ of the atom i concerned [exact formula illegible in the source]. Here m is the number of CH3 subgraphs, $g_i$ the distance degree of atom i, and $\delta_i^{mod}$ the modified degree of atom i, which is computed from the quantities $k_i = \frac{Z_i^v - h_i}{Z_i - Z_i^v - 1}$ [formula for $\delta_i^{mod}$ illegible in the source], where $h_i$ is the number of H atoms attached to atom i, $Z_i^v$ the number of valence electrons of atom i and $Z_i$ its atomic number. Remember that the term $k_i$, called valence degree of atom i, was introduced above in connection with the Kier and Hall valence connectivity indices.

Here is the list of all AI descriptors available in MOLGEN-QSPR:
AI sCH3, AI sssNH, AI aaS, AI ssSiH2, AI dCH2, AI dsN, AI dssS, AI sssSiH, AI ssCH2, AI aaN, AI ddssS, AI ssssSi, AI tCH, AI sssN, AI ssssssS, AI sGeH3, AI dsCH, AI ddsN, AI sCl, AI ssGeH2, AI aaCH, AI aasN, AI sSeH, AI sssGeH, AI sssCH, AI ssssN, AI dSe, AI ssssGe, AI ddC, AI sOH, AI ssSe, AI sAsH2, AI tsC, AI dO, AI aaSe, AI ssAsH, AI dssC, AI ssO, AI dssSe, AI sssAs, AI aasC, AI aaO, AI ddssSe, AI sssdAs, AI aaaC, AI sF, AI sBr, AI sssssAs, AI ssssC, AI sPH2, AI sI, AI sSnH3, AI sNH3, AI ssPH, AI sLi, AI ssSnH2, AI sNH2, AI sssP, AI ssBe, AI sssSnH, AI ssNH2, AI dsssP, AI ssssBe, AI ssssSn, AI dNH, AI sssssP, AI ssBH, AI sPbH3, AI ssNH, AI sSH, AI sssB, AI ssPbH2, AI aaNH, AI dS
31. ...ct a variable for the x axis and one for the y axis. Again, use the mouse to select and display certain subsets of structures.
Note: You may plot any column in the table (property, descriptor, residual, prediction) against any other column. To return to the table display, use View > Scatterplot again.
Figure 2.18: Molecular Descriptors document displayed as scatterplot

2.5 Regression Analysis
The most important feature of MOLGEN-QSPR is the ability to calculate quantitative structure-property relationships. Use File > Regression to get to the Regression dialogue (Figure 2.19). Before we start the regression analysis, several settings concerning variables, preprocessing and regression method have to be specified. Therefore, press the Setup button to open the Regression Setup sheet.

2.5.1 Variable Selection
Click on the Variables tabulator field in order to define the dependent and independent variables (Figure 2.20). The dependent variable is chosen with the Target Variable combo box. Independent variables are selected with the check boxes in the Regressors field.

2.5.2 Regression Preprocessing
Go to the Preprocessing tabulator field in order to define scaling and/or centering methods for the dependent and independent variables (Figure 2.21).
32. ...d in the form of preprints free of charge from the MOLGEN homepage at http://www.molgen.de.
33. ...der 3-6 path, overall valence connectivity path, overall first Zagreb order 3-6 path, overall first Zagreb path, overall first Zagreb subgraph order 3-6 path, overall first Zagreb subgraph path, overall second Zagreb order 3-6 path, overall second Zagreb path, overall second Zagreb subgraph order 3-6 path, overall second Zagreb subgraph path, overall Wiener order 3-6 path, overall Wiener path, overall connectivity order 3-6 cluster, overall connectivity cluster, overall connectivity subgraph order 3-6 cluster, overall connectivity subgraph cluster, overall valence connectivity order 3-6 cluster, overall valence connectivity cluster, overall first Zagreb order 3-6 cluster, overall first Zagreb cluster, overall first Zagreb subgraph order 3-6 cluster, overall first Zagreb subgraph cluster, overall second Zagreb order 3-6 cluster, overall second Zagreb cluster, overall second Zagreb subgraph order 3-6 cluster, overall second Zagreb subgraph cluster, overall Wiener
34. ...e 2.4: Link Structures dialogue
• Click on a column head to sort rows by ascending/descending values and, at the same time, to make this particular column the current column.
• The Current Column combo box offers a way to change the current column without sorting rows.
• The current column is always marked by a symbol.
• Use View > Statistics to display some fundamental statistical values of the current column, such as arithmetic mean or standard deviation (Figure 2.3).

2.1.3 Linking Structures and Property Values
The property values are not yet linked to the structures from the Molecule document. Therefore, use File > Link Structures (Figure 2.4). Use the Molecules combo box to select the structures, and choose Link by number. By clicking OK, the structures will be linked to the table with the property values. It can be useful to save this document with File > Save (Figure 2.5).
Figure 2.5: File Save dialogue
A Molecular Descriptors file (extension .md4) is created. At this point it contains molecular structures together with property values; later it will also contain descriptor values and other data. If the initially imported .sdf file provided compound names, these can now be displayed using View > Names.

2.1.4 Alternatives for Data Input
35. ...e01, atMe02 are occurrence numbers of atom types. In Crippen's scheme an atom is typified according to its nature and to that of its neighbors. Thus the C atom in a methyl group bonded to aliphatic C is of atom type C01, the C atom in a methyl group bonded to N or O is of atom type C03, the C atom in a methyl group bonded to aromatic C is of atom type C08, etc.
2. slogP and sMR: These are log P and molar refraction as calculated by Crippen's method. Denote by $N_k$ the number of atoms of Crippen type k and by $a_k$ the hydrophobicity increment of an atom of type k; then
$$\mathrm{slogP} = \sum_k a_k N_k .$$
If $b_k$ denotes the increment for the molar refractivity of an atom of type k, then we obtain sMR, the molar refractivity as calculated by Crippen's method:
$$\mathrm{sMR} = \sum_k b_k N_k .$$

3.7.6 Definition of Overall Indices
1. Numbers of subgraphs: Let ${}^mK$ denote the number of subgraphs of m edges in the H-suppressed molecular graph,
$${}^mK = \left|\{S : S \text{ a subgraph of } m \text{ edges}\}\right|, \quad m = 0, 1, 2, \ldots$$
Using these indices we obtain numbers of subgraphs with a restricted number of edges. For example, ${}^{\le 8}K = \sum_{m=0}^{8} {}^mK$ is the number of subgraphs of at most 8 edges [48, 49].
2. Overall indices: These indices are denoted as ${}^mTO$, ${}^mTO_q$, $TO$ and $TO_q$; T is the overall index sign. For the molecule, each connected subgraph S up to size m is constructed. The letter O means one of these: M_1, the first Zagreb index, or M_2, t
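The additive scheme behind slogP and sMR amounts to a weighted count of atom types. The sketch below shows only that arithmetic: the increment values are placeholders invented for illustration, not Crippen's published parameters, and the atom-type assignment itself (the hard part) is not implemented.

```python
def crippen_additive(type_counts, increments):
    """Additive estimate: sum over Crippen atom types k of increment_k * N_k."""
    missing = set(type_counts) - set(increments)
    if missing:
        raise KeyError(f"no increment for atom types: {sorted(missing)}")
    return sum(increments[k] * n_k for k, n_k in type_counts.items())


# PLACEHOLDER increments for illustration only; NOT Crippen's published values.
placeholder_a = {"C01": 0.1, "C02": 0.2, "H01": 0.05}
counts = {"C01": 2, "C02": 2, "H01": 10}  # hypothetical N_k of an illustrative molecule
print(crippen_additive(counts, placeholder_a))
```

sMR follows from the same function with the molar-refractivity increments $b_k$ in place of $a_k$.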
36. ...etween atoms i and j. A more logical definition includes closed detours from an atom to itself (rings of maximal length): $\Delta_{ij} = l_i$ if $i = j$, and the detour length between atoms i and j otherwise, where $l_i$ is the size of the largest ring containing atom i ($l_i = 0$ if atom i is not in a ring).
• The Szeged matrix $SZ = (SZ_{ij})$, the rows and columns of which correspond to the non-H atoms. The entry $SZ_{ij}$ is the number of atoms in the H-suppressed molecule that are closer to i than to j:
$$SZ_{ij} = \left|\{a : a \text{ atom with } D_{ia} < D_{ja}\}\right| .$$

Definition of graph theoretical indices
1. Wiener index: W is half the sum of the distance matrix entries of the H-suppressed molecule:
$$W = \frac{1}{2}\sum_{i,j} D_{ij} .$$
2. 1st and 2nd Zagreb index: $M_1$ is the sum over all vertices of the squares of the vertex degrees; $M_2$ is the sum over all edges of the products of the vertex degrees of the atoms i and j forming an edge (i, j):
$$M_1 = \sum_i \delta_i^2, \qquad M_2 = \sum_{\text{edge}(i,j)} \delta_i \delta_j .$$
The vertex degree $\delta_i$ of atom i is the number of its neighbors in an H-suppressed molecular graph.
3. 1st and 2nd modified Zagreb index: These indices use the reciprocal vertex degrees of the atoms in an H-suppressed molecule:
$${}^mM_1 = \sum_i \frac{1}{\delta_i^2}, \qquad {}^mM_2 = \sum_{\text{edge}(i,j)} \frac{1}{\delta_i \delta_j} .$$
Here m stands for modified.
4. Randić or connectivity indices: They form the series of indices ${}^m\chi$ of order m = 0, 1, 2, 3, ... defined by
$${}^m\chi = \sum_{\text{path } p \text{ of length } m} \; \prod_{i=1}^{A(p)} \delta_i^{-1/2},$$
where the product is taken over the atoms in path p and A(p) means the number of
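The definitions of W, M1, M2, the modified Zagreb indices and the Randić indices can be checked on a small graph. This sketch assumes a connected H-suppressed graph given as a list of atom-index pairs with no isolated atoms (so every δ_i ≥ 1); path enumeration by depth-first search is exponential and meant for small molecules only.

```python
import math
from collections import deque
from itertools import combinations


def graph_indices(n, edges, max_order=2):
    """Wiener, Zagreb, modified Zagreb and Randic indices of a connected H-suppressed graph."""
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    deg = [len(nb) for nb in neighbors]  # vertex degrees delta_i (all >= 1 assumed)

    # distance matrix by breadth-first search from every atom
    D = []
    for s in range(n):
        d = [None] * n
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if d[v] is None:
                    d[v] = d[u] + 1
                    queue.append(v)
        D.append(d)

    def simple_paths(length):
        """All simple paths with `length` edges; for length >= 1 each appears once per direction."""
        found = []

        def extend(path):
            if len(path) == length + 1:
                found.append(path)
                return
            for v in neighbors[path[-1]]:
                if v not in path:
                    extend(path + [v])

        for s in range(n):
            extend([s])
        return found

    chi = []
    for m in range(max_order + 1):
        total = sum(math.prod(deg[i] ** -0.5 for i in p) for p in simple_paths(m))
        chi.append(total if m == 0 else total / 2)  # undo double counting of directions

    return {
        "W": sum(D[i][j] for i, j in combinations(range(n), 2)),  # half of the sum over all i, j
        "M1": sum(d * d for d in deg),
        "M2": sum(deg[i] * deg[j] for i, j in edges),
        "mM1": sum(d ** -2 for d in deg),
        "mM2": sum(1 / (deg[i] * deg[j]) for i, j in edges),
        "chi": chi,
    }


butane = graph_indices(4, [(0, 1), (1, 2), (2, 3)])
print(butane)
```

For n-butane it gives W = 10, M1 = 10, M2 = 8, ${}^mM_1$ = 2.5, ${}^mM_2$ = 1.25 and ${}^0\chi \approx 3.414$, ${}^1\chi \approx 1.914$, ${}^2\chi = 1$.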
37. ...eved. Click the Start button to start the calculation. After the calculation is finished, you can decide to ignore unique and/or nonvariant substructures with the check boxes in the Output field. Activate the Show Substructures check box if you want to create a new Molecule document with the retrieved substructures. Press OK to add the substructure counts to the Molecular Descriptors document.

2.3.3 Calculating Fragment Counts
Fragment counts are a third type of molecular descriptors. A fragment is defined by the user. A fragment may contain hydrogen atoms, so it is a part of the hydrogen-containing molecular graph. Thus, in H3C-CHF-CH2-CH3 (2-fluorobutane), H-C-F, H3C-CHF, etc. are fragments; they will be retrieved and counted only when defined and searched as such. To calculate fragment counts, do the following:
1. Use File > New > Moled to edit the fragment of interest (Figure 2.12).
Figure 2.12: Moled document
2. Name the fragment by means of Edit > Properties. The Fragment Property sheet (Figure 2.13) appears. Enter the desired name and press OK.
3. Switch back to your Molecular Descripto
38. ...ey are in the whole graph.
Size of the topological symmetry group: The topological symmetry group is the set of automorphisms of the H-suppressed molecular graph. An automorphism is a way to exchange vertices such that all neighborhood relations are conserved, that is, after this operation the graph looks the same as before. The order or size of this group is denoted sym_top. In a completely unsymmetric graph this number is 1, since there is always one automorphism, the trivial one that maps every vertex to itself. In the H-suppressed graph of 2-methylbutane or of 2-methyl-2-butene, the two methyl groups bound to the same C atom are exchangeable, so that there is one nontrivial automorphism and the size of the topological symmetry group is 2.
The topological radius is
$$R = \min_{1 \le i \le A} \; \max_{1 \le j \le A} D_{ij} .$$
The number of connectivity components, con comp, is the number of connected components of the molecular graph. In most cases this index is equal to 1; if the compound consists of more than one component, the index equals the number of components.

3.7.3 Definitions of Electrotopological and AI Indices
1. Sum of E-states of atomic subgraphs: Every non-H atom i is attributed a number $S_i$ (electrotopological state or E-state) that is composed of two terms:
$$S_i = I_i + \sum_j \Delta I_{ij} .$$
The first term is the intrinsic state $I_i$, characteristic for an atom type plus its attached H atoms, and
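The graph invariants sym_top, R and con comp can be computed by brute force for small molecules. The sketch below enumerates all permutations of the atoms, which is only feasible for a handful of atoms; a production implementation would use a canonization or partition-refinement algorithm instead. The function names are hypothetical.

```python
from collections import deque
from itertools import permutations


def symmetry_group_size(n, edges):
    """Order of the automorphism group (sym_top), by brute force over all n! permutations."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for perm in permutations(range(n))
        if all(frozenset((perm[i], perm[j])) in edge_set for i, j in edges)
    )


def bfs_distances(n, edges, start):
    """Distances from `start` to every reachable atom."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(n, edges):
    """con comp: number of connected components."""
    seen, count = set(), 0
    for s in range(n):
        if s not in seen:
            seen.update(bfs_distances(n, edges, s))
            count += 1
    return count


def topological_radius(n, edges):
    """R = min_i max_j D_ij (meaningful for a connected graph)."""
    return min(max(bfs_distances(n, edges, i).values()) for i in range(n))


# 2-methylbutane, H-suppressed: C0-C1(-C4)-C2-C3
methylbutane = [(0, 1), (1, 2), (2, 3), (1, 4)]
print(symmetry_group_size(5, methylbutane),
      topological_radius(5, methylbutane),
      connected_components(5, methylbutane))
```

For 2-methylbutane it returns 2, 2 and 1, in line with the text; for neopentane the group size is 4! = 24.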
39. ...h atom. The other descriptors are obtained as follows:
$$\rho_{vdw} = \frac{\text{MW incl H}}{V_{vdw}}, \qquad V_{cub} = L_X \cdot L_Y \cdot L_Z$$
(the formula for the standardized Van der Waals volume is illegible in the source), where $L_X$, $L_Y$ and $L_Z$ are the maximum dimensions of the molecular surface in the X, Y and Z directions using vdW radii, X, Y and Z being the principal axes of inertia of the molecule including H atoms. $V_{vdw}$ is called the Van der Waals volume, $\rho_{vdw}$ the density by Van der Waals volume; further descriptors are the standardized Van der Waals volume and $V_{cub}$, the enclosing cuboid.
Van der Waals surface: $S_{vdw}$ is the surface of the molecule using vdW radii for each atom.
The solvent accessible surface area $SASA_{H_2O}$ is the solvent accessible surface of the molecule using vdW radii and an H2O molecule (r = 1.5 Å) as a probe, while $SASA_H$ is the solvent accessible surface of the molecule using vdW radii and an H atom (r = 1.2 Å) as a probe.
The geometrical diameter $D_{3D}$ is the maximum distance of two points on the vdW surface of the molecule including H atoms:
$$D_{3D} = \max |b - a| \quad \text{for points } a, b \text{ on the vdW surface.}$$
Enclosing sphere: $V_{sphere}$ is the volume of the enclosing sphere (including vdW radii) of the molecule including H atoms:
$$V_{sphere} = \frac{4}{3}\pi \left(\frac{D_{3D}}{2}\right)^3 .$$

3.7.5 Definitions of Miscellaneous Indices
1. Crippen atom type numbers: atC01 ... atC27, atH01 ... atH04, atO01 ... atO12, atN01 ... atN14, atHal, atCl, atBr, atI, atF, atP, atS01 ... atS03, atM
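D_3D and V_sphere can be sketched for atoms represented as spheres. Two assumptions, not taken from the guide: the vdW surface is treated as the surface of the union of atomic spheres, and V_sphere is taken as the volume of a sphere whose diameter is D_3D. The coordinates and radii in the example are arbitrary.

```python
import math


def geometrical_diameter(spheres):
    """D_3D for a union of atomic spheres given as ((x, y, z), radius).

    The largest distance between points of two spheres a and b is |c_a - c_b| + r_a + r_b,
    so D_3D is the maximum of this quantity over all pairs (a == b gives 2 * r_a).
    """
    best = 0.0
    for k, (ca, ra) in enumerate(spheres):
        for cb, rb in spheres[k:]:
            best = max(best, math.dist(ca, cb) + ra + rb)
    return best


def enclosing_sphere_volume(d3d):
    """Volume of a sphere of diameter D_3D: (4/3) * pi * (D_3D / 2)**3 (assumed definition)."""
    return 4.0 / 3.0 * math.pi * (d3d / 2.0) ** 3


# Hypothetical two-atom example: centres 0.74 apart, both with radius 1.2 (same length unit)
atoms = [((0.0, 0.0, 0.0), 1.2), ((0.0, 0.0, 0.74), 1.2)]
d = geometrical_diameter(atoms)
print(d, enclosing_sphere_volume(d))
```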
40. ...he current column, then click Edit > Delete Column. To delete several columns simultaneously, check them on the Regression Setup Variables page (see Section 2.5.1), click OK and then Edit > Delete Columns. A new column is added by Edit > Add Column.

2.3 Descriptor Calculation
For the calculation of QSPRs we need values of molecular descriptors as input for statistical learning procedures. MOLGEN-QSPR offers three types of molecular descriptors: indices, substructure counts and fragment counts.

2.3.1 Calculating Indices
Having the Molecular Descriptors document selected as active window:
1. Use File > Indices to obtain the Molecular Descriptors dialogue (Figure 2.9).
Figure 2.9: Molecular Descriptors dialogue
2. Activate check boxes in the Descriptors field t
41. ...he second Zagreb index, or W, the Wiener index, or C (for connectivity), which stands for the sum over the vertex degrees of the atoms in the subgraph considered, or $C^v$, which represents the sum over the valence vertex degrees of the atoms. In formal terms we obtain the indices
$${}^mTO = \sum_{S \text{ of size } m} O(S), \qquad {}^mTO^* = \sum_{S \text{ of size } m} O^*(S),$$
$${}^mTO_q = \sum_{S \text{ of size } m,\ \text{type } q} O(S), \qquad {}^mTO_q^* = \sum_{S \text{ of size } m,\ \text{type } q} O^*(S).$$
If subgraphs of all sizes are considered, we obtain
$$TO = \sum_S O(S), \qquad TO^* = \sum_S O^*(S), \qquad TO_q = \sum_{S \text{ of type } q} O(S), \qquad TO_q^* = \sum_{S \text{ of type } q} O^*(S).$$
MOLGEN-QSPR contains these indices for the following parameters:
[Table: descriptor, range of parameter m (0 ≤ m ≤ 6, 1 ≤ m ≤ 6, 3 ≤ m ≤ 6 or 4 ≤ m ≤ 6), unrestricted version]
The sums run over the subgraphs (regarding m and q, if specified) and sum up the values of the indices specified (e.g. W for the Wiener index) of the subgraphs. In TC, TM_1 and TM_2 calculations the δ values of the vert
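The subgraph counts ${}^mK$ and the overall indices can be illustrated by enumerating connected subgraphs directly. This sketch enumerates edge subsets by brute force, counts single atoms as the subgraphs with m = 0, and computes ${}^mTM_1$ with degrees from the parent graph or from the isolated subgraph (asterisk version). These conventions are assumptions that may differ in detail from MOLGEN-QSPR, and a real implementation would use a dedicated enumeration algorithm.

```python
from itertools import combinations


def connected_subgraphs(n, edges, m):
    """Yield (vertex set, edge list) for every connected subgraph with exactly m edges.

    For m = 0 the subgraphs are the single atoms. Brute force over edge subsets,
    so only suitable for small molecules.
    """
    if m == 0:
        for v in range(n):
            yield {v}, []
        return
    for subset in combinations(edges, m):
        vertices = {v for e in subset for v in e}
        start = next(iter(vertices))
        reached, stack = {start}, [start]
        while stack:  # grow from one vertex using only the chosen edges
            u = stack.pop()
            for a, b in subset:
                for x, y in ((a, b), (b, a)):
                    if x == u and y not in reached:
                        reached.add(y)
                        stack.append(y)
        if reached == vertices:
            yield vertices, list(subset)


def overall_first_zagreb(n, edges, m, isolated=False):
    """Sum of M1(S) over the connected subgraphs S with m edges.

    isolated=False: degrees taken in the parent graph (index without asterisk);
    isolated=True:  degrees taken in the isolated subgraph (index with asterisk).
    """
    parent = [0] * n
    for i, j in edges:
        parent[i] += 1
        parent[j] += 1
    total = 0
    for vertices, sub_edges in connected_subgraphs(n, edges, m):
        if isolated:
            deg = dict.fromkeys(vertices, 0)
            for i, j in sub_edges:
                deg[i] += 1
                deg[j] += 1
        else:
            deg = {v: parent[v] for v in vertices}
        total += sum(d * d for d in deg.values())
    return total


butane = [(0, 1), (1, 2), (2, 3)]
K = [sum(1 for _ in connected_subgraphs(4, butane, m)) for m in range(4)]
print(K, sum(K))  # mK for m = 0..3 and the cumulative count
print(overall_first_zagreb(4, butane, 1), overall_first_zagreb(4, butane, 1, isolated=True))
```

For n-butane the counts are 4, 3, 2, 1 for m = 0 to 3 (10 in total); ${}^1TM_1$ is 18 with parent-graph degrees and 6 with isolated-subgraph degrees.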
42. ...ical Complexity of Molecules. SAR QSAR Environ. Res. 1997, 7, 23-48.
[49] RÜCKER G., RÜCKER C.: Automatic Enumeration of All Connected Subgraphs. MATCH Commun. Math. Comp. Chem. 2000, 41, 145-149.
[50] SHARMA V., GOSWAMI R., MADAN A. K.: Eccentric Connectivity Index: A Novel Highly Discriminating Topological Descriptor for Structure-Property and Structure-Activity Studies. J. Chem. Inf. Comput. Sci. 1997, 37, 273-282.
[51] RÜCKER G., RÜCKER C., GUTMAN I.: On Kites, Comets, and Stars. Sums of Eigenvector Coefficients in Molecular Graphs. Z. Naturforsch. A 2002, 57a, 143-153.
[52] SCHULTZ H. P., SCHULTZ E. B., SCHULTZ T. P.: Topological Organic Chemistry. 2. Graph Theory, Matrix Determinants and Eigenvalues, and Topological Indices of Alkanes. J. Chem. Inf. Comput. Sci. 1990, 30, 27-29.
[53] NEEDHAM D. E., WEI I. C., SEYBOLD P. G.: Molecular Modeling of the Physical Properties of the Alkanes. J. Am. Chem. Soc. 1988, 110, 4186-4194.
[54] VEBER D. F., JOHNSON S. R., CHENG H.-Y., SMITH B. R., WARD K. W., KOPPLE K. D.: Molecular Properties that Influence the Oral Bioavailability of Drug Candidates. J. Med. Chem. 2002, 45, 2615-2623.
[55] JURS P. C., HASAN M. N., HANSEN P. J., ROHRBAUGH R. H.: Prediction of Physicochemical Properties of Organic Compounds from Molecular Structure. Pages 209-233 in: JOCHUM C. (Ed.): Physical Property Prediction. Springer, Berlin 1988.
[56] ROHRBAUGH R. H., JURS P. C.: Desc
43. ...ices of the subgraphs are used. If no asterisk appears in the symbol of an index, then these are taken as they are in the parent graph. If an asterisk appears in the symbol of an index, then the δ values are taken as they are in the respective isolated subgraph.

3.8 References
[1] TODESCHINI R., CONSONNI V.: Handbook of Molecular Descriptors. Wiley-VCH, Weinheim and New York 2000; 2nd ed. 2009 under the new title Molecular Descriptors for Chemoinformatics.
[2] TRINAJSTIĆ N.: Chemical Graph Theory. CRC Press, Boca Raton, FL, 2nd ed. 1992.
[3] WIENER H.: Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17-20.
[4] GUTMAN I., RUŠČIĆ B., TRINAJSTIĆ N., WILCOX C. F.: Graph Theory and Molecular Orbitals. XII. Acyclic Polyenes. J. Chem. Phys. 1975, 62, 3399-3405.
[5] NIKOLIĆ S., KOVAČEVIĆ G., MILIČEVIĆ A., TRINAJSTIĆ N.: The Zagreb Indices 30 Years After. Croat. Chem. Acta 2003, 76, 113-124.
[6] RANDIĆ M.: On Characterization of Molecular Branching. J. Am. Chem. Soc. 1975, 97, 6609-6615.
[7] KIER L. B., MURRAY W. J., RANDIĆ M., HALL L. H.: Molecular Connectivity V. Connectivity Series Applied to Density. J. Pharm. Sci. 1976, 65, 1226-1230.
[8] KIER L. B., HALL L. H.: The Nature of Structure-Activity Relationships and their Relation to Molecular Connectivity. Eur. J. Med. Chem. 1977, 12, 307-312.
[9] KIER L. B., HALL L. H.: Molecular Connectivity in Structure-Activity Analysis. Research S
44. ...Wiener index; 1st, 2nd Zagreb index; 1st, 2nd modified Zagreb index; Randić indices of orders 0, 1, 2; solvation connectivity indices of orders 0, 1, 2, 3; solvation connectivity index for clusters; Kier and Hall valence connectivity indices of orders 0, 1, 2, 3; Kier shape indices 1, 2, 3; Kier molecular flexibility index (non-alpha-modified); Kier alpha-modified shape indices 1, 2, 3; Kier molecular flexibility index; Platt number; Gordon-Scantlebury index; Balaban index; unsaturated Balaban index; Schultz molecular topological index; MTI' index; Harary number; total walk count; molecular walk counts of length 2-8; unsaturated total walk count; unsaturated molecular walk counts of length 2-8; gravitational index pairs (topol. dist.); gravitational index pairs (topol. dist., incl. H atoms); gravitational index bonds (topol. dist.); gravitational index bonds (topol. dist., incl. H atoms); Hosoya Z index; Basak information content of order 0, 1, 2; Basak total information content of order 0, 1, 2; Basak c
45. ...lectrons are delocalized and therefore not counted here. loc B incl H is analogous, but it includes bonds to H atoms.
5. Numbers of single bonds: $n_{single}$ is the number of single bonds in an H-suppressed molecule; $n_{single}$ incl H analogously includes bonds to H atoms.
6. Relative numbers of single bonds: rel $n_{single}$ and rel $n_{single}$ incl H are the relative numbers of single bonds:
$$\mathrm{rel}\,n_{single} = \frac{n_{single}}{B}, \qquad \mathrm{rel}\,n_{single} \text{ incl H} = \frac{n_{single} \text{ incl H}}{B \text{ incl H}} .$$
7. Numbers and relative numbers of multiple bonds: $n_{double}$ is the number of double bonds, $n_{triple}$ the number of triple bonds, and $n_{aroma}$ the number of aromatic bonds. Correspondingly, we use the notations rel $n_{double}$, rel $n_{double}$ incl H, rel $n_{triple}$, rel $n_{triple}$ incl H, rel $n_{aroma}$ and rel $n_{aroma}$ incl H for the numbers of multiple bonds relative to B or to B incl H.
8. The cyclomatic number C is defined as $C = B - A + 1$.
9. The molecular weights MW and MW incl H are the sums of the atomic weights in an H-suppressed molecule and in the molecule including the H atoms, respectively. The atomic weight is that of the natural-abundance isotope mixture.
10. The mean atomic weight (or average atomic weight): The mean atomic weights are defined as
$$\text{mean AW} = \frac{\text{MW}}{A}, \qquad \text{mean AW incl H} = \frac{\text{MW incl H}}{A \text{ incl H}} .$$
11. The total charge cha is the charge of the molecule.
12. The number of radical centers is denoted $N_{rad}$.
13. The number of hydrogen bond donors HBD is assumed to be the
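The arithmetic descriptors reduce to counting. The sketch below evaluates several of them for n-decane from an element list, an H count and the number of heavy-atom bonds; the input format and the rounded standard atomic weights are assumptions for illustration.

```python
# Rounded standard atomic weights (natural isotope mixture); extend as needed.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def arithmetic_descriptors(heavy_atoms, n_h, heavy_bonds):
    """A few arithmetic descriptors of a connected molecule.

    heavy_atoms: element symbols of the non-H atoms
    n_h:         number of H atoms
    heavy_bonds: number of bonds between non-H atoms (H-suppressed graph)
    """
    A = len(heavy_atoms)
    A_incl_H = A + n_h
    B = heavy_bonds
    B_incl_H = B + n_h  # every H atom contributes exactly one bond
    mw = sum(ATOMIC_WEIGHTS[el] for el in heavy_atoms)
    mw_incl_H = mw + n_h * ATOMIC_WEIGHTS["H"]
    return {
        "A": A, "A incl H": A_incl_H, "B": B, "B incl H": B_incl_H,
        "rel N_H": n_h / A_incl_H,
        "C": B - A + 1,  # cyclomatic number
        "MW": mw, "MW incl H": mw_incl_H,
        "mean AW": mw / A, "mean AW incl H": mw_incl_H / A_incl_H,
    }


decane = arithmetic_descriptors(["C"] * 10, n_h=22, heavy_bonds=9)  # n-decane, C10H22
print(decane)
```

For n-decane, C = 0 (acyclic), B incl H = 31 and rel N_H = 22/32; a ring such as cyclohexane gives C = 1.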
46. ...logarithm, or a sum or product etc. of two descriptors already present, use Edit > Transform Column (see Figure 2.16). A transformation chosen here is applied to the current column.

2.4 Correlation Analysis
In order to select descriptors for a QSPR study, it may be useful to first analyse property-descriptor and descriptor-descriptor correlations.
Figure 2.17: Correlation Matrix dialogue

2.4.1 Calculating the Correlation Matrix
To obtain the correlation matrix of all variables (properties, descriptors, residuals, predictions), choose View > Correlations. A window will appear showing the matrix of absolute correlation coefficients (Figure 2.17).
Often a Molecular Descriptors document will contain many columns, say several hundred. In such cases it is advisable to calculate the correlation matrix for a small subtable only. Editing the table is described in Section 2.2.3. In order not to lose data, edit a copy of your table rather than the table itself.
Missing values (N/A) prevent the calculation of the correlation matrix, so make sure to exclude any column or row containing missing values (see Section 2.2.3).
For a visualisation of intercorrelations, use the scatterplot feature.

2.4.2 Displaying Correlations
Using View > Scatterplot, you can switch the Molecular Descriptors document to a scatterplot display (Figure 2.18). Using the upper left combo boxes, sele
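The matrix of absolute correlation coefficients can be sketched with plain Pearson correlations. The function below assumes columns of equal length given as a dict and, mirroring the restriction described above, rejects missing values; the toy table is invented and the function name is hypothetical.

```python
import math


def abs_correlation_matrix(columns):
    """|Pearson r| for every pair of columns (dict: name -> list of numbers)."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same number of rows")
    if any(v is None for values in columns.values() for v in values):
        raise ValueError("missing values: exclude the affected rows or columns first")

    def pearson(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)  # undefined (division by zero) for a constant column

    names = list(columns)
    return {a: {b: abs(pearson(columns[a], columns[b])) for b in names} for a in names}


# Toy table: y is perfectly correlated with d1 and perfectly anticorrelated with d2
toy = {"y": [1.0, 2.0, 3.0, 4.0], "d1": [2.0, 4.0, 6.0, 8.0], "d2": [4.0, 3.0, 2.0, 1.0]}
matrix = abs_correlation_matrix(toy)
for row, values in matrix.items():
    print(row, {k: round(v, 3) for k, v in values.items()})
```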
47. ...lot page

2.7 Validation
2.7.1 LOO Crossvalidation
As a first validation step for our best QSPR equation, let us perform a leave-one-out crossvalidation. Open a .md4 document and the corresponding .qspr document containing at least one model, switch to the .md4 document and click Crossvalidation in the View menu. A page similar to the QSPR Details page will be displayed, showing, among other things, the values of R² and s (see Figure 2.33). Missing values (N/A) prevent the crossvalidation calculation, so make sure to exclude rows/columns containing missing values (see Section 2.2.3).
As a necessary but not sufficient condition for a valid QSPR equation, the crossvalidation results (R²_cv, s_cv, plot) should be only moderately worse than the original ones; compare Figures 2.33 and 2.34 to Figures 2.27 and 2.32, respectively.

2.7.2 Further Validation
As a rule, a particular QSPR model needs further validation before it can be considered reliable. Since various validation methods are in use or recommended by various authors, no corresponding procedures are installed as black boxes in MOLGEN-QSPR. There are, however, a number of features that may be helpful in validation, such as
• Random column
• Random selection
Figure 2.33: Leave-one-out Crossvalidation Details page
Figure 2.34: Leave-one-out Crossvalidation Plot page
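Leave-one-out crossvalidation can be sketched for a one-descriptor linear model: each compound is left out in turn, the model is refitted on the rest, and the squared prediction errors are summed (PRESS). The crossvalidated R² is computed here as 1 - PRESS/SS_tot, a common definition; that MOLGEN-QSPR uses exactly this formula, and the single-regressor setting, are assumptions. The data are invented.

```python
def fit_line(x, y):
    """Ordinary least squares y = slope * x + intercept for one regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Coefficient of determination of the fit on all compounds."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    rss = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return 1.0 - rss / sum((b - my) ** 2 for b in y)


def loo_crossvalidation(x, y):
    """Leave-one-out: refit without compound k, predict compound k, accumulate PRESS."""
    press = 0.0
    for k in range(len(x)):
        slope, intercept = fit_line(x[:k] + x[k + 1:], y[:k] + y[k + 1:])
        press += (y[k] - (slope * x[k] + intercept)) ** 2
    my = sum(y) / len(y)
    q2 = 1.0 - press / sum((b - my) ** 2 for b in y)  # crossvalidated R^2
    return q2, press


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 3.8, 5.2, 5.9]
print(r_squared(x, y), loo_crossvalidation(x, y))
```

Since PRESS can never be smaller than the residual sum of squares of the full least-squares fit, the crossvalidated R² never exceeds R², which is why it is expected to be somewhat worse.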
48. ...m-Type AI Topological Indices to QSPR Studies of Alkanes. Comput. & Chem. 2002, 26, 357-369.
[67] REN B.: A New Topological Index for QSPR of Alkanes. J. Chem. Inf. Comput. Sci. 1999, 39, 139-143.
[68] BONCHEV D., TRINAJSTIĆ N.: Overall Molecular Descriptors. 3. Overall Zagreb Indices. SAR QSAR Environ. Res. 2001, 12, 213-236.
[69] BONCHEV D.: The Overall Wiener Index - A New Tool for Characterization of Molecular Topology. J. Chem. Inf. Comput. Sci. 2001, 41, 582-592.
[70] BONCHEV D.: Overall Connectivity - A Next Generation Molecular Connectivity. J. Mol. Graphics Model. 2001, 20, 65-75.
[71] BONCHEV D.: Overall Connectivities/Topological Complexities: A New Powerful Tool for QSPR/QSAR. J. Chem. Inf. Comput. Sci. 2000, 40, 934-941.
[72] RÜCKER C., MERINGER M.: How Many Organic Compounds are Graph-Theoretically Nonplanar? MATCH Commun. Math. Comput. Chem. 2002, 45, 159-172.
[73] BUCKLEY F., HARARY F.: Distance in Graphs. Addison-Wesley, Redwood City, CA 1990, page 213.
[74] ANONYMOUS: Searching Properties in the CAS Registry File. STNotes 2002, 28, 1-7.
[75] BRAUN J., GUGISCH R., KERBER A., LAUE R., MERINGER M., RÜCKER C.: MOLGEN-CID - A Canonizer for Molecules and Graphs Accessible through the Internet. J. Chem. Inf. Comput. Sci. 2004, 44, 542-548.
[76] AUGUSTIN V.: Computerunterstützte Berechnung von Symmetrien unscharfer Strukturen. Diploma thesis, University of Bayreuth 2004.

Chapter 4
Literature on MOLGEN-QSPR
C. RÜ
49. ...ns column heads; the following lines contain data for compounds, one line for each compound. The first column contains the compound name, the following column(s) contain(s) property values. Columns are separated by tabulators. Such a file has already been prepared with the boiling points of the structures above. Use the following steps to import the property file:
1. Click on File > Import to open the Import File dialogue.
2. Select Ascii Table, tabulator separated (*.txt) in the Filetype combo box.
3. Click on DecanesReal.txt in order to select the desired file.
4. Use the Open button to open the selected file.
The boiling points of the real library will now be displayed on the screen (Figure 2.2). The status bar shows that there are 50 rows and one column in this file (the structure names are not counted as a column). Again, there are various functions available to change the layout of the table and to retrieve additional information about the data, for instance:
Figure 2.3: Descriptor Statistics dialogue (bp: arithmetic mean 157.852, standard deviation 7.41728, 43 distinct values, followed by the value distribution)
Figur
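The property file format described above (one header line, then a name followed by tab-separated values per compound) can be read with a few lines of code. This sketch assumes the header has one entry per column, including the name column, and reports the kind of summary shown in the Descriptor Statistics dialogue; using the sample (n - 1) standard deviation is an assumption, and the file contents are hypothetical.

```python
import statistics


def read_property_table(text):
    """Parse a tab-separated property file: a header line with column heads, then one line
    per compound with its name followed by the property value(s)."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    rows = {}
    for line in lines[1:]:
        name, *values = line.split("\t")
        rows[name] = [float(v) for v in values]
    return header, rows


def column_statistics(rows, index=0):
    """Mean, standard deviation and number of distinct values of one property column."""
    values = [vals[index] for vals in rows.values()]
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1); an assumption
        "distinct": len(set(values)),
    }


# Hypothetical three-compound file (illustrative names and values, not DecanesReal.txt)
sample = "name\tbp\ncompound_1\t174.1\ncompound_2\t160.6\ncompound_3\t160.6\n"
header, rows = read_property_table(sample)
stats = column_statistics(rows)
print(header, len(rows), stats)
```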
50. ...number of H atoms attached to O and N atoms, in accord with the Chemical Abstracts ACD definition.
14. The number of hydrogen bond acceptors HBA is assumed to be the number of N and O atoms, in accord with the Chemical Abstracts ACD definition.
15. The number of charged atoms is denoted $N_{charged}$.
16. Monoisotopic mass, exact and integer: These are the sums of the exact or integer masses of the most abundant isotope over all atoms (incl. H), denoted by mass_exact and mass_int, respectively.

3.7.2 Definitions of Topological Indices
Definitions of graph theoretical matrices
The graph theoretical indices are based on the following important graph theoretical notions:
• The adjacency matrix $A = (A_{ij})$ of the molecular graph is defined by $A_{ij} = 1$ if there is a covalent bond between atoms i and j and $A_{ij} = 0$ otherwise, or, in terms of the corresponding molecular graph,
$$A_{ij} = \begin{cases} 1 & \text{if edge } (i,j) \text{ exists,} \\ 0 & \text{otherwise.} \end{cases}$$
The degree of vertex (or atom) i, $\delta_i$, is the i-th row sum: $\delta_i = \sum_j A_{ij}$.
• The unsaturated adjacency matrix $A^* = (A^*_{ij})$ is defined by
$$A^*_{ij} = \begin{cases} 1 & \text{if there is a single bond between atoms } i \text{ and } j, \\ 1/2 & \text{if there is a double bond between atoms } i \text{ and } j, \\ 1/3 & \text{if there is a triple bond between atoms } i \text{ and } j, \\ 1/1.5 & \text{if there is an aromatic bond between atoms } i \text{ and } j, \\ 0 & \text{otherwise.} \end{cases}$$
• The distance matrix $D = (D_{ij})$, where $D_{ij}$ means the distance (shortest path length) between atoms i and j in the
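The adjacency and unsaturated adjacency matrices can be built directly from a bond list. The sketch uses exact fractions for the reciprocal bond orders 1, 1/2, 1/3 and 1/1.5; the bond-list format and the names are assumptions for illustration.

```python
from fractions import Fraction

# Entries of the unsaturated adjacency matrix: reciprocal bond orders (1/1.5 = 2/3 for aromatic)
BOND_WEIGHT = {
    "single": Fraction(1),
    "double": Fraction(1, 2),
    "triple": Fraction(1, 3),
    "aromatic": Fraction(2, 3),
}


def adjacency_matrices(n, bonds):
    """Return A, A* and the vertex degrees for bonds given as (i, j, kind)."""
    A = [[0] * n for _ in range(n)]
    A_unsat = [[Fraction(0)] * n for _ in range(n)]
    for i, j, kind in bonds:
        A[i][j] = A[j][i] = 1
        A_unsat[i][j] = A_unsat[j][i] = BOND_WEIGHT[kind]
    degrees = [sum(row) for row in A]  # delta_i = i-th row sum of A
    return A, A_unsat, degrees


# propene, H-suppressed: C0=C1-C2
A, A_unsat, deg = adjacency_matrices(3, [(0, 1, "double"), (1, 2, "single")])
print(deg, A_unsat)
```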
51. ...number of Br atoms, number of I atoms, relative number of I atoms, number of P atoms, relative number of P atoms, number of bonds, number of bonds incl H atoms, number of localized bonding electron pairs, number of localized bonding electron pairs incl H atoms, number of single bonds, relative number of single bonds, number of single bonds incl H atoms, relative number of single bonds incl H atoms, number of double bonds, relative number of double bonds, relative number of double bonds incl H atoms, number of triple bonds, relative number of triple bonds, relative number of triple bonds incl H atoms, number of aromatic bonds, relative number of aromatic bonds, relative number of aromatic bonds incl H atoms, cyclomatic number
MW: molecular weight
mean AW: mean atomic weight
MW incl H: molecular weight incl H atoms
mean AW incl H: mean atomic weight incl H atoms
cha: total charge
N_rad: number of radical centers
HBD: number of hydrogen bond donors
HBA: number of hydrogen bond acceptors
N_charged: number of charged atoms
mass_exact, mass_int: monoisotopic mass, exact and integer

3.2 Topological Indices
W, M1, M2, mM1, mM2, ..., J, J_unsat, MTI, MTI', H, twc, mwc (2-8), twc_unsat, mwc_unsat (2-8), G topol, ...
number of path subgraphs of length 2 in an H-suppressed molecular graph.

Balaban index, saturated and unsaturated. The saturated index is
$$J = \frac{B}{C+1} \sum_{\mathrm{edge}(i,j)} \frac{1}{\sqrt{\sigma_i \sigma_j}},$$
where B is the number of bonds, σ_i is the i-th atom distance degree, i.e. $\sigma_i = \sum_j D_{ij}$, and C is the cyclomatic number. The sum runs over all edges of an H-suppressed molecular graph. The unsaturated index is
$$J_{\mathrm{unsat}} = \frac{B}{C+1} \sum_{\mathrm{edge}(i,j)} \frac{1}{\sqrt{\sigma^{*}_i \sigma^{*}_j}},$$
where σ*_i is the unsaturated distance degree, i.e. the i-th row sum in the unsaturated distance matrix.

Schultz molecular topological index MTI. We introduce MTI'_i as the scalar product of the degree vector with the i-th column of A + D,
$$\mathrm{MTI}'_i = \sum_{j=1}^{n} \delta_j \left(A_{ji} + D_{ji}\right),$$
and define the Schultz molecular topological index as
$$\mathrm{MTI} = \sum_{i=1}^{n} \mathrm{MTI}'_i = \sum_{i=1}^{n} \left(\delta_i^2 + \delta_i \sigma_i\right).$$
Quantities δ_i and σ_i are degree and distance degree, respectively, of atom i in the H-suppressed molecule.

Harary number. This is defined as
$$H = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{1}{D_{ij}},$$
again for an H-suppressed molecular graph.

Walk counts. We start with the molecular walk count of length k, defined by
$$\mathrm{mwc}^{(k)} = \sum_{i=1}^{n} \sum_{j=1}^{n} \left(A^{k}\right)_{ij},$$
where A means the adjacency matrix of the H-suppressed molecular graph and A^k its k-th power. Remark: mwc^(0) is equal to the number of atoms, mwc^(1) is equal to 2B, mwc^(2) = M_1 and mwc^(3) = 2M_2, where M_1 and M_2 are the first and second Zagreb indices. Using this notion, we introduce the total walk count
$$\mathrm{twc} = \sum_{k=1}^{n-1} \mathrm{mwc}^{(k)}.$$
The sum runs
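The Balaban index, Harary number, Schultz index and walk counts defined above can all be computed from A and D. The Python sketch below is illustrative only: the function names are hypothetical, the graph is assumed to be connected (so that the cyclomatic number is C = B − n + 1), and D is obtained with the Floyd–Warshall algorithm rather than whatever MOLGEN-QSPR uses internally.

```python
import math


def distance_matrix(A):
    """All-pairs shortest path lengths (Floyd-Warshall); assumes a connected graph."""
    n = len(A)
    inf = float("inf")
    D = [[0 if i == j else (1 if A[i][j] else inf) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


def edges_of(A):
    return [(i, j) for i in range(len(A)) for j in range(i + 1, len(A)) if A[i][j]]


def balaban_j(A):
    """J = B / (C + 1) * sum over edges of (sigma_i * sigma_j) ** -1/2."""
    D = distance_matrix(A)
    sigma = [sum(row) for row in D]  # distance degrees (row sums of D)
    edges = edges_of(A)
    B = len(edges)
    C = B - len(A) + 1               # cyclomatic number of a connected graph
    return B / (C + 1) * sum(1 / math.sqrt(sigma[i] * sigma[j]) for i, j in edges)


def harary(A):
    """H = sum over pairs i < j of 1 / D_ij."""
    D = distance_matrix(A)
    n = len(A)
    return sum(1 / D[i][j] for i in range(n) for j in range(i + 1, n))


def schultz_mti(A):
    """MTI = sum_i MTI'_i, with MTI'_i = degree vector . column i of (A + D)."""
    D = distance_matrix(A)
    n = len(A)
    delta = [sum(row) for row in A]
    return sum(sum(delta[j] * (A[j][i] + D[j][i]) for j in range(n)) for i in range(n))


def walk_counts(A):
    """Return ([mwc^(1), ..., mwc^(n-1)], twc) by repeated multiplication with A."""
    n = len(A)
    power = [row[:] for row in A]    # A^1
    mwc = []
    for _k in range(1, n):
        mwc.append(sum(map(sum, power)))
        power = [[sum(power[i][m] * A[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
    return mwc, sum(mwc)


# n-butane skeleton C0-C1-C2-C3
butane = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(round(balaban_j(butane), 4))  # 1.9747
print(walk_counts(butane))          # ([6, 10, 16], 32)
```

For n-butane the walk counts reproduce the remark above: mwc^(1) = 2B = 6, mwc^(2) = M_1 = 10 and mwc^(3) = 2M_2 = 16.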
o select descriptors to be calculated. Click the tabulator fields to switch between the various categories of indices: arithmetic indices, topological indices, electrotopological indices, geometrical indices, miscellaneous indices and overall indices.

Figure 2.10: Molecular Descriptors document with descriptor values

On the right there are radio buttons that determine whether descriptors should be calculated for all molecules in the Molecular Descriptors document or for selected molecules only. Using the Messages check box, error messages can be disabled. There are further buttons for searching indices by their name, saving descriptor selections and opening previously saved selections.

3. Click on the
omplementary information content of order 0, 1, 2; total complementary information content of order 0, 1, 2; Basak structural information content of order 0, 1, 2; total structural information content of order 0, 1, 2; bonding information content of order 0, 1, 2; total bonding information content of order 0, 1, 2; mean square distance index; detour index; detour index incl. half main diagonal; total acyclic path count; molecular acyclic path count of length 2–8; molecular acyclic path count of length 9 and higher; total path count; molecular path count of length 2–8; molecular path count of length 9 and higher; total ring count; molecular ring count of length 3–8; molecular ring count of length 9 and higher; topological charge index of order 1–8; mean topological charge index of order 1–8; global topological charge index of order k; topological diameter; eccentric connectivity index; principal eigenvalue of A; sum of coefficients of principal eigenvector of A; mean coefficient of principal eigenvector of A; log of sum of coefficients of principal eigenvector of A; principal eigenvalue of D; total χ index; number of methyl groups; number of pairs of methyl groups at distance 3; freely rotatable bonds; Szeged index; hyper-Szeged index; connectivity index χ, path; connectivity index χ, cluster
on content of order r; total structural information content of order r; bonding information content of order r; total bonding information content of order r.

21. Mean square distance index. This index is defined as
$$\mathrm{MSD} = \left( \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} D_{ij}^{2}}{n(n-1)/2} \right)^{1/2},$$
where the sum is taken over all pairs of atoms in the H-suppressed molecular graph.

22. Detour indices. If Δ = (Δ_ij) denotes the detour matrix of an H-suppressed molecular graph, then
$$\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \Delta_{ij}$$
is the detour index. A variant is
$$W_{\mathrm{diag}} = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \Delta'_{ij},$$
where Δ' means the detour matrix including main diagonal elements.

23. Path counts [44]. With $P^{(l)}_{\mathrm{acyc}}$ being the number of paths of length l in the H-suppressed molecular graph, without counting any closed paths (rings), and l_max being the maximum length of all unclosed paths, the total molecular acyclic path count is defined as
$$P_{\mathrm{acyc}} = \sum_{l=1}^{l_{\max}} P^{(l)}_{\mathrm{acyc}}.$$
In MOLGEN-QSPR, acyclic path counts are implemented up to $P^{(8)}_{\mathrm{acyc}}$. Longer paths, if any, are collectively counted in
$$P^{(\geq 9)}_{\mathrm{acyc}} = \sum_{l=9}^{l_{\max}} P^{(l)}_{\mathrm{acyc}}.$$
Considering also closed paths, we get P^(l), the number of paths of length l in the H-suppressed molecular graph, and the total molecular path count $P = \sum_{l=1}^{l_{\max}} P^{(l)}$. Path counts are implemented in MOLGEN-QSPR up to P^(8). Again, paths longer than 8, if any, are collectively counted as
$$P^{(\geq 9)} = \sum_{l=9}^{l_{\max}} P^{(l)}.$$

24. Ring counts. Restricting attention to rings, we obtain the total ring coun
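The acyclic path counts and the pooling of long paths above can be illustrated with a depth-first enumeration of simple paths. This Python sketch is an assumption-laden illustration, not MOLGEN-QSPR's algorithm: the molecule is a hypothetical adjacency dict of an H-suppressed graph, each path is found once from each end and therefore halved, `split_at` (a hypothetical helper) pools lengths above the cutoff 8, and the mean square distance uses the root-mean-square-over-pairs form given above.

```python
import math
from collections import deque


def simple_path_counts(adj):
    """Number of acyclic (non-closed) paths per length l >= 1.

    Each path is enumerated once from each of its two end atoms, so the raw
    tallies are halved at the end."""
    raw = {}

    def extend(path):
        for v in adj[path[-1]]:
            if v not in path:
                length = len(path)  # number of edges once v is appended
                raw[length] = raw.get(length, 0) + 1
                path.append(v)
                extend(path)
                path.pop()

    for start in adj:
        extend([start])
    return {l: c // 2 for l, c in sorted(raw.items())}


def split_at(counts, cutoff=8):
    """Keep counts for l <= cutoff and pool all longer paths into one number."""
    kept = {l: c for l, c in counts.items() if l <= cutoff}
    pooled = sum(c for l, c in counts.items() if l > cutoff)
    return kept, pooled


def mean_square_distance(adj):
    """Square root of the mean of D_ij ** 2 over all pairs i < j (connected graph)."""
    atoms = list(adj)
    total, pairs = 0, 0
    for idx, s in enumerate(atoms):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for t in atoms[idx + 1:]:
            total += dist[t] ** 2
            pairs += 1
    return math.sqrt(total / pairs)


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(simple_path_counts(butane))              # {1: 3, 2: 2, 3: 1}
print(round(mean_square_distance(butane), 4))  # 1.8257
```

Enumerating every simple path is exponential in the worst case, which is one reason to cap the individually reported lengths and pool the rest.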
onnectivity index χ, path cluster; connectivity index χ, chain; valence connectivity index χ^v, path; valence connectivity index χ^v, cluster; valence connectivity index χ^v, path cluster; valence connectivity index χ^v, chain; size of topological symmetry group (sym_top); topological radius (R); number of connectivity components (con_comp)

3.3 Electrotopological and AI Indices

S_X: sum of E-states of atom type X, for X = sCH3, dCH2, ssCH2, tCH, dsCH, aaCH, sssCH, ddC, tsC, dssC, aasC, aaaC, ssssC, sNH3, sNH2, ssNH2, dNH, ssNH, aaNH, tN, sssNH, dsN, aaN, sssN, ddsN, aasN, ssssN, sOH, dO, ssO, aaO, sF, sPH2, ssPH, sssP, dsssP, sssssP, sSH, dS, ssS
order 3–6, cluster; overall Wiener, cluster; overall connectivity, order 4–6, path cluster; overall connectivity, path cluster; overall connectivity subgraph, order 4–6, path cluster; overall connectivity subgraph, path cluster; overall valence connectivity, order 4–6, path cluster; overall valence connectivity, path cluster; overall first Zagreb, order 4–6, path cluster; overall first Zagreb, path cluster; overall first Zagreb subgraph, order 4–6, path cluster; overall first Zagreb subgraph, path cluster; overall second Zagreb, order 4–6, path cluster; overall second Zagreb, path cluster; overall second Zagreb subgraph, order 4–6, path cluster; overall second Zagreb subgraph, path cluster; overall Wiener, order 4–6, path cluster; overall Wiener, path cluster; overall connectivity, order 3–6, chain; overall connectivity, chain; overall connectivity subgraph, order 3–6, chain; overall connectivity subgraph, chain; overall valence connectivity, order 3–6, chain; overall valence connectivity, chain; overall first Zagreb, order 3–6, chain; overall first Zagreb, chain; overall first Zagreb subgraph, order 3–6, chain; overall first Zagreb subgraph, chain; overall second Zagreb, order 3–6, chain; overall second Zagreb, chain; overall second
over all lengths k from 1 to n−1 of walks in an H-suppressed molecular graph, where n is the number of non-H atoms. Note: this is the original definition of twc.

17. Unsaturated molecular walk counts. These are defined in terms of powers of the unsaturated adjacency matrix A*:
$$\mathrm{mwc}^{(k)}_{\mathrm{unsat}} = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( (A^{*})^{k} \right)_{ij}.$$
This expression is called the unsaturated molecular walk count of length k, while the unsaturated total walk count is the sum
$$\mathrm{twc}_{\mathrm{unsat}} = \sum_{k=1}^{n-1} \mathrm{mwc}^{(k)}_{\mathrm{unsat}},$$
where n is the number of non-H atoms. The sum runs over all lengths k from 1 to n−1 of walks in an H-suppressed molecular graph.

Gravitational indices (topological distance). These are the indices
$$G_1^{\mathrm{topol}} = \sum_{i=1}^{A-1} \sum_{j=i+1}^{A} \frac{w_i w_j}{D_{ij}^{2}} \quad \text{and} \quad G_1^{\mathrm{topol\ incl\ H}} = \sum_{i=1}^{A_{\mathrm{incl\,H}}-1} \sum_{j=i+1}^{A_{\mathrm{incl\,H}}} \frac{w_i w_j}{D_{ij}^{2}},$$
where A and A_incl H are the numbers of atoms without and with H atoms, w_i is the atomic weight of atom i expressed in amu (i.e. 12.0110 for carbon), and the sum runs in the first case over all pairs of atoms in an H-suppressed molecular graph, while in the second case the hydrogen atoms are included. If we restrict attention to bonds (pairs at distance 1), we obtain
$$G_2^{\mathrm{topol}} = \sum_{\mathrm{edge}(i,j)} w_i w_j \quad \text{and} \quad G_2^{\mathrm{topol\ incl\ H}} = \sum_{\mathrm{edge}(i,j)} w_i w_j,$$
where the latter includes bonds to H atoms.

Hosoya index Z [4]. Denoting by a_k the number of sets of k mutually non-adjacent edges in the H-suppressed molecular graph, so that, for example, a_0 = 1
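The gravitational indices and the edge-set counts a_k behind the Hosoya index lend themselves to a short sketch. The Python below is illustrative only; it assumes the standard completion Z = Σ_k a_k of the Hosoya definition, counts the edge sets by recursing on whether the first edge is used, takes the carbon weight 12.0110 from the text, and uses a hypothetical edge-list input for an H-suppressed skeleton.

```python
from collections import deque

CARBON_AMU = 12.0110  # atomic weight of carbon as quoted in the text


def distances_from_edges(n, edges):
    """Topological distance matrix from an edge list (BFS from every atom)."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    D = []
    for s in range(n):
        dist = [None] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] is None:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        D.append(dist)
    return D


def gravitational_indices(weights, edges):
    """(G1: all pairs, w_i w_j / D_ij**2;  G2: bonded pairs only, w_i w_j)."""
    n = len(weights)
    D = distances_from_edges(n, edges)
    g1 = sum(weights[i] * weights[j] / D[i][j] ** 2
             for i in range(n) for j in range(i + 1, n))
    g2 = sum(weights[i] * weights[j] for i, j in edges)
    return g1, g2


def hosoya_z(edges):
    """Z = sum over k of a_k (a_0 = 1), counting sets of non-adjacent edges."""
    if not edges:
        return 1  # only the empty edge set
    (u, v), rest = edges[0], edges[1:]
    # Either the first edge is not used ...
    without_first = hosoya_z(rest)
    # ... or it is used, and every edge sharing an atom with it is excluded.
    with_first = hosoya_z([e for e in rest if u not in e and v not in e])
    return without_first + with_first


butane_edges = [(0, 1), (1, 2), (2, 3)]
print(hosoya_z(butane_edges))  # 5
print(gravitational_indices([CARBON_AMU] * 4, butane_edges))
```

The recursion is exponential in the worst case but adequate for molecules of the size treated in the tutorial; for n-butane it finds a_0 = 1, a_1 = 3 and a_2 = 1.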
re 2.35

Figure 2.35: Compare Molecule Files dialogue

Select DecanesReal in the Second File combo box and click Start to start the comparison of the two Molecule documents. The program will answer in the Output field (Figure 2.36).

Figure 2.36: Compare Molecule Files output (molecules in first file without duplicates: 75; molecules in second file without duplicates: 50; molecules only in first file: 25; molecules only in second file: 0; molecules in both files: 50)

As we are interested in structures occurring only in Decanes and not in DecanesReal, we activate the corresponding check box. After pressing OK, a new Molecule document appears, named Decanes DecanesReal and containing the 25 decanes not included in DecanesReal.

2.8.3 Applying QSPRs for Prediction

In order to predict property values, we have to switch back to the QSPR document. Now select the QSPRs
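The five counts in the Compare Molecule Files output are ordinary set operations once every structure has a duplicate-free identity. The Python sketch below assumes, hypothetically, that each molecule has already been reduced to a canonical string identifier (MOLGEN-QSPR uses its own structure canonization, which is not reproduced here); the identifiers in the example are placeholders.

```python
def compare_molecule_files(first, second):
    """Compare two molecule lists given as canonical identifiers (assumed input).

    Converting to sets removes duplicates, mirroring "without dublettes"."""
    a, b = set(first), set(second)
    return {
        "first": len(a),
        "second": len(b),
        "only_first": len(a - b),
        "only_second": len(b - a),
        "both": len(a & b),
        "only_first_molecules": sorted(a - b),  # what the new document would hold
    }


# Placeholder identifiers, not real canonical codes.
result = compare_molecule_files(["m1", "m2", "m2", "m3"], ["m2", "m3", "m4"])
print(result["only_first_molecules"])  # ['m1']
```

With the 75 decanes as the first file and the 50 measured decanes as the second, the same operations yield the counts 75, 50, 25, 0 and 50 shown in Figure 2.36.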
ription of Molecular Shape Applied in Studies of Structure-Activity and Structure-Property Relationships. Anal. Chim. Acta 1987, 199, 99–109.
[57] ROHRBAUGH, R. H.; JURS, P. C. Molecular Shape and the Prediction of HPLC Retention Indexes of Polycyclic Aromatic Hydrocarbons. Anal. Chem. 1987, 59, 1048–1054.
[58] KIER, L. B.; HALL, L. H. Molecular Structure Description: The Electrotopological State. Academic Press: San Diego, CA, and London, 1999.
[59] KHADIKAR, P. V.; DESHPANDE, N. V.; KALE, P. P.; DOBRYNIN, A.; GUTMAN, I.; DÖMÖTÖR, G. The Szeged Index and an Analogy with the Wiener Index. J. Chem. Inf. Comput. Sci. 1995, 35, 547–550.
[60] GUTMAN, I.; KLAVŽAR, S. An Algorithm for the Calculation of the Szeged Index of Benzenoid Hydrocarbons. J. Chem. Inf. Comput. Sci. 1995, 35, 1011–1014.
[61] ŽEROVNIK, J. Computing the Szeged Index. Croat. Chem. Acta 1996, 69, 837–843.
[62] ŽEROVNIK, J. Szeged Index of Symmetric Graphs. J. Chem. Inf. Comput. Sci. 1999, 39, 77–80.
[63] REN, B. Novel Atomic-Level-Based AI Topological Descriptors: Application to QSPR/QSAR Modeling. J. Chem. Inf. Comput. Sci. 2002, 42, 858–868.
[64] REN, B. Atomic-Level-Based AI Topological Descriptors for Structure-Property Correlations. J. Chem. Inf. Comput. Sci. 2003, 43, 161–169.
[65] REN, B. Novel Atom-Type AI Indices for QSPR Studies of Alcohols. Comput. Chem. 2002, 26, 223–235.
[66] REN, B. Application of Novel Ato
rs document using the Window submenu or by clicking on the Molecular Descriptors document's window.
4. Call the Fragment Counts dialogue (Figure 2.14) via File > Fragment Counts.
5. Add fragments using the Add button. In the following dialogue (Figure 2.15) you can select fragments from opened Moled documents.
6. Once you have selected one or more fragments, start the calculation using the Start button.
7. After the calculation is finished, you can decide to ignore unique and/or nonvariant fragments by the check boxes in the Output field.
8. Press OK to add the fragment counts to the Molecular Descriptors document.

Our example fragment Methyl counts CH3 groups, whereas the substructure count for C is the occurrence number of C atoms, i.e. the sum of occurrences of CH3, CH2 and CH groups and C atoms without H.

Figure 2.13: Fragment Properties, Common page
Figure 2.14: Fragment Counts dialogue
Figure 2.15: Add Fragment dialogue
Figure 2.16: Transform column dialogue

2.3.4 Descriptor Transformation

If you need a somewhat more complex variant of a descriptor already present, such as the reciprocal, square, square root
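The difference between the Methyl fragment count and the C substructure count can be reproduced by hand for saturated hydrocarbons, where each carbon carries 4 − degree hydrogens in the H-suppressed skeleton. The Python sketch below is a simplified illustration under that assumption (saturated hydrocarbons only, hypothetical edge-list input); it is not how MOLGEN-QSPR matches fragments in general.

```python
from collections import Counter


def carbon_group_counts(n_carbons, bonds):
    """Classify the carbons of a saturated hydrocarbon skeleton (H-suppressed,
    bonds as (i, j) pairs) by attached H atoms = 4 - degree."""
    degree = [0] * n_carbons
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    names = {0: "CH4", 1: "CH3", 2: "CH2", 3: "CH", 4: "C"}
    return Counter(names[d] for d in degree)


# 2,2-dimethylbutane: atom 1 is the quaternary carbon.
groups = carbon_group_counts(6, [(0, 1), (1, 2), (2, 3), (1, 4), (1, 5)])
methyl_count = groups["CH3"]  # analogous to the Methyl fragment count
carbon_count = sum(groups[g] for g in ("CH4", "CH3", "CH2", "CH", "C"))  # C substructure
print(methyl_count, carbon_count)  # 4 6
```

The C substructure count is the same for every decane, so it is a nonvariant descriptor in the tutorial's data set, whereas the methyl count distinguishes isomers.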
ructure generation, structure canonization and removal of duplicates, numerous descriptors of various types, descriptor transformation, its ability to plot each variable (including residuals and predictions) vs. each other variable, its variety of statistical learning methods and its ability to provide predictions for complete sets of compounds render MOLGEN-QSPR unique among similar programs.

Chapter 1 First steps

1.1 System Requirements

MOLGEN-QSPR is available for MS Windows 95/98/NT 4.0/Me/2000/XP/Vista.

1.1.1 Hardware

In order to use MOLGEN-QSPR, the following hardware requirements have to be fulfilled:
• IBM-compatible PC, 80486 or higher
• CD-ROM drive for installation
• At least 10 MB RAM and the same amount of free disk space. The space needed depends, of course, on the problem, i.e. on the number of structural formulas to be processed.

1.1.2 Software

Some of the algorithms included in MOLGEN-QSPR call routines provided by the software package for statistical computing R, version 2.8.1 or higher. This software can be downloaded free of charge at http://cran.r-project.org. In order to access sophisticated regression methods, the following R packages additionally need to be installed: tree (regression trees), e1071 (support vector machines) and pls (partial least squares).

1.2 Installation

MOLGEN-QSPR consists of one executable and does not require any DLLs or anything else. Therefore you can start it already
sists of four principal steps:
• structure preprocessing,
• descriptor computation,
• regression analysis and validation,
• prediction of unknown property values.
All these steps can be performed with MOLGEN-QSPR.

Structure preprocessing includes addition of H atoms (which are typically suppressed in electronic representations of molecular graphs), identification of aromatic bonds (which are often coded as alternating single and double bonds) and computation of a 3D layout using a force field model. The latter is necessary if geometrical descriptors are to be applied.

Molecular descriptors are used in order to map molecular structures onto real numbers. Currently MOLGEN-QSPR provides about 700 built-in descriptors of various types, among them arithmetic, topological and geometrical indices. Furthermore, substructure and fragment counts can be used as molecular descriptors.

Once the descriptor values are calculated, methods of supervised statistical learning are applied in order to find prediction functions that fit the target variable well. There are several methods available, covering linear regression, artificial neural networks, support vector machines, regression trees and nearest neighbors regression.

Finally, if a good QSPR is found, it can be applied for property prediction for all members of a virtual combinatorial library. Such libraries can be constructed using MOLGEN's structure generators. MOLGEN-QSPR's features, such as st
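The last two steps (fit a prediction function on the real library, then apply it to the virtual library) can be sketched with the simplest of the listed methods, least-squares linear regression on a single descriptor. The Python below is a minimal illustration with synthetic numbers, not measured boiling points, and it does not reproduce MOLGEN-QSPR's multi-descriptor regression, preprocessing or validation.

```python
def fit_least_squares(x, y):
    """Ordinary least squares for y ~ a + b * x with one descriptor x.

    Returns the fitted prediction function."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return lambda xi: a + b * xi


# Synthetic "real library": descriptor values together with property values.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
predict = fit_least_squares(train_x, train_y)

# Synthetic "virtual library": descriptor values only.
virtual_x = [5.0, 10.0]
print([predict(x) for x in virtual_x])  # [11.0, 21.0]
```

Returning a function rather than the coefficients mirrors the flowchart's separation between finding a prediction function and applying it to new structures.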
t
$$\mathrm{rings} = \sum_{l=3}^{l_{\max}} \mathrm{rings}_l,$$
where rings_l is the number of rings of length (ring size) l in the H-suppressed molecular graph and l_max the maximum ring size. In MOLGEN-QSPR, the ring counts rings_3 to rings_8 are implemented; rings of size ≥ 9, if any, are collectively counted as
$$\mathrm{rings}_{\geq 9} = \sum_{l=9}^{l_{\max}} \mathrm{rings}_l.$$

Topological charge indices of order k. These indices use the charge term matrix CT = (CT_ij) as well as the distance matrix. They are defined in terms of the atoms in the H-suppressed molecule as follows [45, 46]:
$$G_k = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left| CT_{ij} \right| \, \delta(k, D_{ij}),$$
where δ(k, D_ij) is the Kronecker delta, i.e. 1 if k = D_ij and 0 otherwise. These indices are called topological charge indices of order k, k = 1, …, 8, in MOLGEN-QSPR, while the mean topological charge indices of order k are
$$J_k = \frac{G_k}{n-1},$$
and the global topological charge indices of order k are
$$J^{G}_k = \sum_{l=1}^{k} J_l.$$
In MOLGEN-QSPR, mean topological charge indices are implemented up to J_8, as well as the global topological charge index J^G_5.

The diameter is the maximal distance between two atoms in the H-suppressed molecule:
$$\max_{1 \le i < j \le n} D_{ij}.$$

The eccentric connectivity index. This is
$$\sum_{i=1}^{n} \eta_i \, \delta_i,$$
where η_i is the maximum entry in the i-th row of
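The diameter and the eccentric connectivity index both depend only on the atom eccentricities, i.e. the row maxima of the distance matrix. This Python sketch assumes the usual completion of the definition (η_i is the maximum of row i of D and δ_i is the vertex degree) and a hypothetical adjacency-dict input for a connected H-suppressed graph.

```python
from collections import deque


def bfs_distances(adj, s):
    """Distances from atom s to every atom of a connected graph."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_eccentric_connectivity(adj):
    """Return (max over i < j of D_ij, sum_i eta_i * delta_i)."""
    ecc = {i: max(bfs_distances(adj, i).values()) for i in adj}  # eta_i: row max of D
    diameter = max(ecc.values())
    eccentric_connectivity = sum(ecc[i] * len(adj[i]) for i in adj)  # len(adj[i]) = delta_i
    return diameter, eccentric_connectivity


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(diameter_and_eccentric_connectivity(butane))     # (3, 14)
print(diameter_and_eccentric_connectivity(isobutane))  # (2, 9)
```

The two butane isomers already separate on both descriptors, which illustrates why such branching-sensitive indices are useful for isomer series like the decanes.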
tates of sssSnH; sum of E-states of ssssSn; sum of E-states of sPbH3; sum of E-states of ssPbH2; sum of E-states of sssPbH; sum of E-states of ssssPb

AI_X: AI of atom type X, for X = sCH3, dCH2, ssCH2, tCH, dsCH, aaCH, sssCH, ddC, tsC, dssC, aasC, aaaC, ssssC, sNH3, sNH2, ssNH2, dNH, ssNH, aaNH, sssNH, tN, dsN, aaN, sssN, ddsN, aasN, ssssN, sssB, ssssB, sSiH3, ssSiH2, sssSiH, ssssSi, sGeH3, ssGeH2, sssGeH, ssssGe, sAsH2, ssAsH, sssAs, sssdAs, sssssAs, sSnH3, ssSnH2, sssSnH, ssssSn, sOH, dO, ssO, aaO, sF, sPH2, ssPH, sssP, dsssP, sssssP, sSH, dS, ssS, aaS, dssS, ddssS, ssssssS, sCl, sSeH, dSe, ssSe, aaSe, dss
the distance matrix and δ_i the vertex degree of atom i.

The principal (leading, first) eigenvalue of A. λ_1^A is the principal eigenvalue of the adjacency matrix. We note that A is a real symmetric matrix and is therefore diagonalizable, with real eigenvalues.

The sum of coefficients of the principal eigenvector of A. Denoting by c_i^A the i-th coefficient of the eigenvector belonging to the principal eigenvalue of A, we obtain the descriptors
$$\mathrm{SCA}_1 = \sum_{i=1}^{n} \left| c_i^{A} \right|, \qquad \mathrm{SCA}_2 = \frac{\mathrm{SCA}_1}{n}, \qquad \mathrm{SCA}_3 = \log \mathrm{SCA}_1.$$
The sum runs over all n atoms of an H-suppressed molecule.

The principal (leading, first) eigenvalue of D. λ_1^D denotes the principal eigenvalue of the distance matrix.

The total χ index is defined as
$$\chi_T = \prod_{i=1}^{n} \delta_i^{-1/2}.$$
The product runs over all atoms of an H-suppressed molecular graph.

The number of methyl groups is denoted by T. The number of pairs of methyl groups at distance 3 is T_3.

The number of freely rotatable bonds, FRB, means the number of bonds that are acyclic, single, not terminal in the H-suppressed molecule and not an amide C–N bond.

Szeged indices. These are expressed in terms of the Szeged matrix SZ defined above:
$$\mathrm{SZD} = \sum_{\mathrm{edge}(i,j)} SZ_{ij} \, SZ_{ji} \qquad \text{and} \qquad \mathrm{SZD}_p = \sum_{i<j} SZ_{ij} \, SZ_{ji}.$$
The edges and pairs are those in an H-suppressed molecular graph. SZD is called the Szeged index, while SZD_p is the hyper-Szeged index.

Connectivity indices for subs
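The spectral descriptors, the total χ index and the Szeged indices above can be illustrated in a few dozen lines of Python. This is a sketch under stated assumptions, not MOLGEN-QSPR's code: the principal eigenpair is found by power iteration on A + I (a shift that keeps λ_1 + 1 strictly dominant for bipartite graphs such as alkane trees), the eigenvector is normalized to unit length, SCA_3 uses the natural logarithm, and SZ_ij is taken as the number of atoms strictly closer to i than to j, the usual Szeged-matrix definition.

```python
import math


def principal_eigenpair(A, iterations=500):
    """Power iteration on A + I; returns (lambda_1, unit-length eigenvector)."""
    n = len(A)
    v = [1.0] * n
    for _ in range(iterations):
        w = [v[i] + sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * Av[i] for i in range(n)), v  # Rayleigh quotient, |v| = 1


def sca_descriptors(A):
    """SCA1 = sum |c_i|, SCA2 = SCA1 / n, SCA3 = ln SCA1 (natural log assumed)."""
    _lam, c = principal_eigenpair(A)
    sca1 = sum(abs(x) for x in c)
    return sca1, sca1 / len(A), math.log(sca1)


def total_chi(A):
    """chi_T = product of delta_i ** -1/2 (every atom needs degree >= 1)."""
    return math.prod(sum(row) ** -0.5 for row in A)


def szeged_indices(A):
    """Return (SZD over edges, SZD_p over all pairs i < j)."""
    n = len(A)
    inf = float("inf")
    D = [[0 if i == j else (1 if A[i][j] else inf) for j in range(n)] for i in range(n)]
    for k in range(n):  # Floyd-Warshall distances
        for i in range(n):
            for j in range(n):
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    # SZ[i][j] = number of atoms strictly closer to i than to j
    SZ = [[sum(1 for k in range(n) if D[k][i] < D[k][j]) for j in range(n)]
          for i in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    szd = sum(SZ[i][j] * SZ[j][i] for i, j in pairs if A[i][j])
    szd_p = sum(SZ[i][j] * SZ[j][i] for i, j in pairs)
    return szd, szd_p


butane = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
lam, _vec = principal_eigenpair(butane)
print(round(lam, 6))           # 1.618034
print(szeged_indices(butane))  # (10, 18)
```

For a production tool a library eigensolver would be the natural choice; power iteration is used here only to keep the sketch dependency-free.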
tructures. These topological indices are expressed in terms of subgraphs of type q, which means paths, clusters, path clusters or chains in the H-suppressed molecular graph. m is the order, i.e. the number of edges, of the subgraphs considered, K(m, q) is the number of subgraphs of type q and order m, and n is the number of atoms in the subgraph considered:
$$^{m}\chi_{q} = \sum_{k=1}^{K(m,q)} \prod_{i=1}^{n} \left(\delta_i\right)_k^{-1/2},$$
where the product runs over the n atoms of the k-th subgraph. Available in MOLGEN-QSPR are the connectivity indices ^mχ_p (m ≤ 6), ^mχ_c (3 ≤ m ≤ 6), ^mχ_pc (4 ≤ m ≤ 6) and ^mχ_ch (3 ≤ m ≤ 6), and the valence connectivity indices ^mχ^v_p (m ≤ 6), ^mχ^v_c (3 ≤ m ≤ 6), ^mχ^v_pc (4 ≤ m ≤ 6) and ^mχ^v_ch (3 ≤ m ≤ 6). A subgraph is of type chain (ch) if it contains a cycle (m ≥ 3); otherwise, if every vertex has either one or more than two non-H neighbors, it is of type cluster (c), for m ≥ 3; otherwise, if every vertex has one or two non-H neighbors, it is of type path (p); otherwise it is of type path cluster (pc), for m ≥ 4. So a path cluster has no cycles but vertices with one, two and more than two non-H neighbors.

Example structures: chains of order m = 3, 4, 4; clusters of order m = 3, 4, 5; paths of order m = 3, 4, 5; path clusters of order m = 4, 6, 6.

For classification of subgraphs, the numbers of non-H neighbors are taken as they are in the isolated subgraphs, whereas in the calculation of χ values they are taken as th
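For the path type, the subgraph sum above reduces to enumerating simple paths with m edges. The Python sketch below computes ^mχ_p under the assumption, consistent with the standard Kier–Hall convention, that each δ_i is the atom's degree in the whole H-suppressed molecule rather than in the isolated subgraph; the adjacency-dict input is hypothetical, and clusters, path clusters and chains are not handled.

```python
import math


def path_chi(adj, m):
    """^m chi_p: sum over all paths with m edges of prod(delta_i ** -1/2),
    delta_i being the degree of atom i in the whole H-suppressed molecule."""
    delta = {i: len(nbrs) for i, nbrs in adj.items()}
    total = 0.0

    def extend(path):
        nonlocal total
        if len(path) == m + 1:  # m edges reached
            total += math.prod(delta[i] ** -0.5 for i in path)
            return
        for v in adj[path[-1]]:
            if v not in path:
                path.append(v)
                extend(path)
                path.pop()

    for start in adj:
        extend([start])
    # For m >= 1 every path is found once from each end atom.
    return total / 2 if m > 0 else total


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print([round(path_chi(butane, m), 4) for m in range(4)])  # [3.4142, 1.9142, 1.0, 0.5]
print(path_chi(isobutane, 3))  # 0.0: isobutane's order-3 subgraph is a cluster
```

The isobutane case shows why the type classification matters: its only three-edge subgraph is the star around the central carbon, which counts as a cluster, so the third-order path index is zero.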
tudies Press, Wiley: Chichester, UK, 1986.
[10] ZEFIROV, N. S.; PALYULIN, V. A. QSAR for Boiling Points of Small Sulfides. Are the High-Quality Structure-Property-Activity Regressions the Real High Quality QSAR Models? J. Chem. Inf. Comput. Sci. 2001, 41, 1022–1027.
[11] KIER, L. B.; HALL, L. H. Derivation and Significance of Valence Molecular Connectivity. J. Pharm. Sci. 1981, 70, 583–589.
[12] KIER, L. B. Shape Indexes of Orders One and Three from Molecular Graphs. Quant. Struct.-Act. Relat. 1986, 5, 1–7.
[13] KIER, L. B. Indexes of Molecular Shape from Chemical Graphs. Acta Pharm. Jugosl. 1986, 36, 171–188.
[14] KIER, L. B. A Shape Index from Molecular Graphs. Quant. Struct.-Act. Relat. 1985, 4, 109–116.
[15] KIER, L. B. Distinguishing Atom Differences in a Molecular Graph Shape Index. Quant. Struct.-Act. Relat. 1986, 5, 7–12.
[16] KIER, L. B. An Index of Molecular Flexibility from Kappa Shape Attributes. Quant. Struct.-Act. Relat. 1989, 8, 221–224.
[17] PLATT, J. R. Influence of Neighbor Bonds on Additive Bond Properties in Paraffins. J. Chem. Phys. 1947, 15, 419–420.
[18] PLATT, J. R. Prediction of Isomeric Differences in Paraffin Properties. J. Phys. Chem. 1952, 56, 328–336.
[19] BALABAN, A. T. Highly Discriminating Distance-Based Topological Index. Chem. Phys. Lett. 1982, 89, 399–404.
[20] BALABAN, A. T. Topological Indices Based on Topological Distances in Molecular
you want to use for prediction. Via File > Prediction, the Prediction dialogue appears (Figure 2.37).

Figure 2.37: Prediction dialogue

Select Decanes DecanesReal in the Molecules combo box and click the Start button. After the computation is finished, press OK, and the 25 predicted property values will appear in a new Molecular Descriptors document (see Figure 2.38).

Figure 2.38: Prediction Result page

Chapter 3 The Molecular Descriptors

3.1 Arithmetic Indices

A: number of atoms
A incl H: number of atoms incl. H atoms
N_H, rel N_H: number and relative number of H atoms
N_C, rel N_C: number and relative number of C atoms
N_O, rel N_O: number and relative number of O atoms
N_N, rel N_N: number and relative number of N atoms
N_S, rel N_S: number and relative number of S atoms
N_F, rel N_F: number and relative number of F atoms
N_Cl, rel N_Cl: number and relative number of Cl atoms
N_Br, rel N_Br: number and relative number of Br atoms
N_I, rel N_I: number and relative number of I atoms
N_P, rel N_P: number and relative number of P atoms
B, B incl H: number of bonds, without and with bonds to H atoms
loc B, loc B incl H: number of localized bonding electron pairs, without and with H atoms
n, rel n, n incl H, rel n incl H: number, relative number, number incl. H atoms and relative number incl. H atoms of single bonds
n, rel n, rel n incl H: number, relative number and relative number incl. H atoms of double bonds
n, rel n, rel n incl H: number, relative number and relative number incl. H atoms of triple bonds
N_aroma, rel N_aroma, rel N_aroma incl H: number, relative number and relative number incl. H atoms of aromatic bonds
C: cyclomatic number
