Multi Layer Perceptron with Back Propagation User Manual
ID – Title – Author – Date

R1  Fisher, R. (1936). "The Use of Multiple Measurements in Taxonomic Problems", Annals of Eugenics, 7, pp. 179-188.
R2  Bishop, C. M. (1995). "Neural Networks for Pattern Recognition", Oxford University Press, GB.
R3  Bishop, C. M., Svensen, M. & Williams, C. K. I. (1998). "Neural Computation".
R4  Dunham, M. (2002). "Data Mining: Introductory and Advanced Topics", Prentice Hall.
R5  D'Abrusco, R. et al. (2007). "Mining the SDSS archive I. Photometric Redshifts in the Nearby Universe", Astrophysical Journal, Vol. 663, pp. 752-764.
R6  Hey, T. et al. (2009). "The Fourth Paradigm", Microsoft Research, Redmond, Washington, USA.
R7  Russell, S. & Norvig, P. (2003). "Artificial Intelligence: A Modern Approach", Second ed., Prentice Hall.
R8  Duda, R. O., Hart, P. E. & Stork, D. G. (2001). "Pattern Classification", A Wiley-Interscience Publication, New York: Wiley.
R9  Haykin, S. (1999). "Neural Networks: A Comprehensive Foundation", Second Edition, Prentice Hall.
R10 Brown, D. E. & Huntley, C. L. (1991). "A practical application of simulated annealing to clustering", Pattern Recognition, 25(4), pp. 401-412.
R11 Babu, G. P. & Murty, M. N. (1993). "Probabilistic connectionist approaches for the design of good communication codes", Proc. of the IJCNN, Japan.
R12 Cybenko, G. (1989). "Approximations by superpositions of sigmoidal functions", Mathematics of Control, Signals and Systems, 2, no. 4, pp. 303-314.

Tab. 4 – Reference Documents
[Fig. 13 content, right panel (continued): after the layer sizes and "scale_included 0", the mlp_TRAIN_weights.mlp file lists, for each neuron, the number of inputs, the activation function and the activation steepness, and, for each connection, the connected-to neuron and its weight (e.g. 5.39315938349584960938e+00, -5.20419979095458984375e+00, 2.09722948074340820312e+00, ...); the FANN file header ("FANN_FLO_2.1"), num_layers 3 and learning_rate 0.700000 are also visible. The left panel shows the epochs / current error pairs of the .csv file (e.g. 0.2500820126, 0.2473851591, ...).]

Fig. 13 – The files .csv (left) and .mlp (right) output of the xorTrain experiment.

The file mlp_TRAIN_weights.mlp contains the topology of the trained neural network and the weights of the connections between the network layers.

4.1.2 Classification MLP Test use case

The file mlp_TRAIN_weights.mlp can be copied into the input file area (File Manager) of the workspace, in order to be re-used in future experiments (for example, in this case, the test use case). This is because it represents the stored "brain" of the network, trained to calculate the XOR function.

[Fig. 14 screenshot, upper part: the File Manager of workspace mlpExp now lists mlp_TRAIN_weights.mlp together with xor.csv and xor_run.csv (last access 2011-05-30), and the My Experiments panel shows the experiment xorTrain with status "ended".]
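Outside the webapp, the stored network can in principle be reloaded with the FANN library, which the file dump in Fig. 13 suggests is used internally (see A16). A minimal sketch, assuming the fann2 Python bindings are installed and that mlp_TRAIN_weights.mlp is a native FANN network file:

    from fann2 import libfann  # Python bindings for the FANN library (assumed installed)

    ann = libfann.neural_net()
    ann.create_from_file("mlp_TRAIN_weights.mlp")  # reload topology and trained weights
    print(ann.run([1.0, 0.0]))                     # for XOR, expected close to 1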
At the end of the experiment we obtain the output files, in which both training and test outputs are present.

[Fig. 18 screenshot: the My Experiments panel of workspace mlpExp lists xorTrain, xorTest and xorFull with status "ended" (last access 2011-05-30); the download area of xorFull contains: mlp_FULL_output.csv (output and target vector of the test set), mlp_FULL_errorPlot.jpeg (scatter plot of the epochs vs error), mlp_FULL_error.csv (epoch error file), mlp_FULL.log (log file), mlp_FULL_tmp_weights.mlp (net tmp file).]

Fig. 18 – The xorFull experiment output.

5 Appendix – References and Acronyms

Abbreviations & Acronyms

AI – Artificial Intelligence
KDD – Knowledge Discovery in Databases
ANN – Artificial Neural Network
IEEE – Institute of Electrical and Electronic Engineers
ARFF – Attribute Relation File Format
INAF – Istituto Nazionale di Astrofisica
ASCII – American Standard Code for Information Interchange
JPEG – Joint Photographic Experts Group
BoK – Base of Knowledge
• FITS (extension .fits): tabular FITS files;
• VOTABLE (extension .votable): formatted files containing special fields separated by keywords coming from the XML language, with more special keywords defined by VO data standards.

For training and test cases, a correct dataset file must contain both input and target feature columns, with the input type as the first group and the target type as the final group.

[Fig. 7 content – xor.csv:
0,0,0
0,1,1
1,0,1
1,1,0]

Fig. 7 – The content of the xor.csv file used as input for training/test use cases.

As shown in Fig. 7, the xor.csv file for training/test use cases has 4 patterns (rows) of 2 input features (first two columns) and one target feature (third column). The target feature is not an input information, but the desired output to be used in the comparison (calculation of the error) with the model output during a training/test experiment.

[Fig. 8 content – xor_run.csv: the same four rows without the target column.]

Fig. 8 – The content of the xor_run.csv file used as input for the Run use case.

In Fig. 8 the xor_run.csv file is shown, valid only for Run use case experiments. It is the same as xor.csv, except for the target column, which is not present. This file can also be generated by the user starting from xor.csv. As detailed in the GUI User Guide (A19), the user may in fact use the Dataset Editor options of the webapp to manipulate and build datasets starting from uploaded data files.
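Equivalently, a Run-type file like xor_run.csv can be produced offline from xor.csv by dropping the target column. A minimal hypothetical sketch of what the Dataset Editor operation amounts to:

    import pandas as pd

    # xor.csv has no header: two input columns plus one target column
    df = pd.read_csv("xor.csv", header=None)
    # keep only the input columns (all but the last one)
    df.iloc[:, :-1].to_csv("xor_run.csv", header=False, index=False)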
The conditional distribution of the targets for a pattern can be written as

    p(t^n | x^n) = \prod_k (y_k^n)^{t_k^n}        (15)

As before, starting from the likelihood function and taking the negative logarithm, we obtain an error function of the form

    E = -\sum_n \sum_k t_k^n \ln y_k^n        (16)

For a 1-of-C coding scheme the minimum value of the error function (16) equals 0. The error function is still valid when t_k^n is a continuous variable in the range (0,1), representing the probability that x^n belongs to C_k. To get the proper target variable, the softmax activation function is used. So, for the cross entropy error function for multiple classes (equation 16) to be efficient, the softmax activation function must be used. By evaluating the derivatives of the softmax error function (considering all inputs to all output units, for pattern n), one obtains

    \partial E^n / \partial a_k = y_k^n - t_k^n        (17)

which is the same result as found for the two-class cross entropy error with a logistic activation function (equation 11). The same result is valid for the sum of squares error with a linear activation function. This can be considered as an additional proof that there is a natural pairing of error function and activation function.

2.2 The Back Propagation learning rule

For better understanding, the back propagation learning algorithm can be divided into two phases: propagation and weight update.
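A quick numerical check of equation (17): softmax outputs combined with the cross entropy (16) do yield the simple derivative y_k - t_k. The activations and target below are arbitrary illustrative values:

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max())        # shift for numerical stability
        return e / e.sum()

    def cross_entropy(a, t):
        return -np.sum(t * np.log(softmax(a)))

    a = np.array([0.5, -1.2, 2.0])     # output-unit activations (illustrative)
    t = np.array([0.0, 0.0, 1.0])      # 1-of-C coded target

    analytic = softmax(a) - t          # equation (17)
    eps = 1e-6
    numeric = np.array([
        (cross_entropy(a + eps * np.eye(3)[k], t) -
         cross_entropy(a - eps * np.eye(3)[k], t)) / (2 * eps)
        for k in range(3)])
    print(np.allclose(analytic, numeric, atol=1e-6))   # True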
This can be achieved if we consider a target coding scheme for which t^n = 1 if the input vector belongs to class C1 and t^n = 0 if it belongs to class C2. We can combine these into a single expression, so that the probability of observing either target value is

    p(t^n | x^n) = (y^n)^{t^n} (1 - y^n)^{1 - t^n}        (5)

This is the equation of a binomial distribution, known as the Bernoulli distribution. With this interpretation of the output unit activations, the likelihood of observing the training data set, assuming the data points are drawn independently from this distribution, is then given by

    \prod_n (y^n)^{t^n} (1 - y^n)^{1 - t^n}        (6)

By minimizing the negative logarithm of the likelihood we get to the cross entropy error function (Hopfield 1987; Baum and Wilczek 1988; Solla et al. 1988; Hinton 1989; Hampshire and Pearlmutter 1990), in the form

    E = -\sum_n [ t^n \ln y^n + (1 - t^n) \ln(1 - y^n) ]        (7)

Let us consider some elementary properties of this error function. Differentiating it with respect to y^n, we obtain

    \partial E / \partial y^n = (y^n - t^n) / ( y^n (1 - y^n) )        (8)

The absolute minimum of the error function occurs when

    y^n = t^n  for all n        (9)

The network under consideration has one output, whose value is to be interpreted as a probability, so it is appropriate to consider the logistic sigmoid activation function (equation 2), which has the property

    g'(a) = g(a) (1 - g(a))        (10)

Combining equations (8) and (10), we see that the derivative of the error with respect to a takes the simple form

    \partial E^n / \partial a = y^n - t^n        (11)
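Spelling out the chain-rule step behind (11), using (8) and (10):

    \frac{\partial E^n}{\partial a}
      = \frac{\partial E^n}{\partial y^n}\, g'(a)
      = \frac{y^n - t^n}{y^n (1 - y^n)}\; y^n (1 - y^n)
      = y^n - t^n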
Parameters labeled by an asterisk are considered required. In all other cases the fields can be left empty (default values are used).

3.7.1 Regression with MLP Full Parameter Specifications

In the case of Regression_MLP with Full use case, the help page is at the address: http://dame.dsf.unina.it/mlp_help.html#regr_full

• Train Set: this parameter is a required field. This is the dataset file to be used as input for the learning phase of the model. It typically must include both input and target columns, where each row is an entire pattern (or sample of data). The format (hence its extension) must be one of the types allowed by the application (ASCII, FITS, CSV, VOTABLE). More specifically, keep in mind the following simple rule: the sum of input and output nodes MUST be equal to the total number of columns in this file.

• Validation Set: this is the dataset file to be used as input for the validation of the learning phase of the model. It typically must include both input and target columns, where each row is an entire pattern (or sample of data). The format (hence its extension) must be one of the types allowed by the application (ASCII, FITS, CSV, VOTABLE). If the user leaves this parameter field empty, the validation phase of the training results is omitted.
• Single hidden layer vs. double (multiple) hidden layers:
  o a single hidden layer is good for any approximation of a continuous function;
  o a double layer may sometimes be better.
• Problem-specific reason for more layers: each layer learns different aspects.

3 Use of the web application model

The Multi Layer Perceptron (MLP) is one of the most common supervised neural architectures, used in many application fields. It is especially suited to classification and regression problems and, in DAME, it is designed to be associated with these two functionality domains. The description of these two functionalities is reported in the Reference Manual (A18), available from the webapp menu or from the beta intro web page. In the following, practical information is given on how to configure the network architecture and the learning algorithm, in order to launch and execute science cases and experiments.

3.1 Use Cases

For the user, the MLP with BP system offers four use cases: Train, Test, Run, Full. As described in A19, a supervised machine learning model like MLP requires different use cases, well ordered in terms of execution sequence. A typical complete experiment with this kind of model consists of the following steps:

1. Train the network with a dataset as input, containing both input and target features.
In this form the error function depends on the relative errors of the network outputs. Knowing that the sum of squares error function depends on the squares of the absolute errors, we can make comparisons: minimization of the cross entropy error function will tend to result in similar relative errors on both small and large target values. By contrast, the sum of squares error function tends to give similar absolute errors for each pattern, and will give large relative errors for small output values. This result suggests that the cross entropy error function performs better than the sum of squares error function at estimating small probabilities. Another advantage over the sum of squares error function is that the cross entropy error function gives much stronger weight to smaller errors.

2.1.1.3 Cross Entropy for the multiple class case

Let us return to the classification problem involving mutually exclusive classes, where the number of classes is greater than two. For this problem we should seek the form which the error function should take. The network now has one output y_k for each class, and the target data have a 1-of-C coding scheme, so that t_k^n = δ_kl for a pattern n from class C_l. The probability of observing the set of target values t_k^n = δ_kl, given an input vector x^n, is just p(C_l | x^n) = y_l. Therefore the conditional distribution for this pattern can be written as in equation (15).
LAR – Layered Application Architecture
BP – Back Propagation
MDS – Massive Data Sets
BLL – Business Logic Layer
MLP – Multi Layer Perceptron
CE – Cross Entropy
MSE – Mean Square Error
CSV – Comma Separated Values
NN – Neural Network
DAL – Data Access Layer
OAC – Osservatorio Astronomico di Capodimonte
DAME – DAta Mining & Exploration
PC – Personal Computer
DAPL – Data Access & Process Layer
PI – Principal Investigator
DL – Data Layer
REDB – Registry & Database
DM – Data Mining
RIA – Rich Internet Application
DMM – Data Mining Model
SDSS – Sloan Digital Sky Survey
DMS – Data Mining Suite
SL – Service Layer
FITS – Flexible Image Transport System
SW – Software
FL – Frontend Layer
UI – User Interface
FW – FrameWork
URI – Uniform Resource Indicator
GRID – Global Resource Information Database
VO – Virtual Observatory
GUI – Graphical User Interface
XML – eXtensible Markup Language
HW – Hardware

Tab. 3 – Abbreviations and acronyms

Reference & Applicable Documents

The reference documents (R1-R12) are listed in Tab. 4; the applicable documents (A1-A19) are listed in Tab. 5.
• Test Set: this parameter is a required field. Dataset file as input. It is a file containing input and target columns. It must have the same number of input and target columns as the training input file. For example, it could be the same dataset file used as the training input file.

• Network File: it is a file generated by the model during the training phase. It contains the resulting network topology as stored at the end of a training session. Usually this file should not be edited or modified by users, just to preserve its content as generated by the model itself. The extension of such a file is usually .mlp. The canonical use of this file in this use case is to resume a previous training phase, in order to try to improve it. If the user leaves this parameter field empty, by default the current training session starts from scratch.

• number of input nodes: this parameter is a required field. It is the number of neurons at the first (input) layer of the network. It must exactly correspond to the number of input columns in the dataset input file (Training File field), except the target columns.

• number of nodes for hidden layer: this parameter is a required field. It is the number of neurons of the unique hidden layer of the network. As a suggestion, this should be selected in a range between a minimum of 1.5 times the number of input nodes and a maximum of twice the number of input nodes plus one.

• number of output nodes:
It must have the same number of input and target columns as the training input file.

• Network File: this parameter is a required field. It is a file generated by the model during the training phase. It contains the resulting network topology as stored at the end of a training session. Usually this file should not be edited or modified by users, just to preserve its content as generated by the model itself. The extension of such a file is usually .mlp.

3.5.2 Classification with MLP Test Parameter Specifications

In the case of Classification_MLP with Test use case, the help page is at the address: http://dame.dsf.unina.it/mlp_help.html#class_test

• Test Set: this parameter is a required field. Dataset file as input. It is a file containing input and target columns. It must have the same number of input and target columns as the training input file. For example, it could be the same dataset file used as the training input file.

• Network File: this parameter is a required field. It is a file generated by the model during the training phase. It contains the resulting network topology as stored at the end of a training session. Usually this file should not be edited or modified by users, just to preserve its content as generated by the model itself. The extension of such a file is usually .mlp.

3.6 Run Use case
ID – Title/Code – Author – Date

A1  SuiteDesign_VONEURAL-PDD-NA-0001-Rel2.0 – DAME Working Group – 15/10/2008
A2  project_plan_VONEURAL-PLA-NA-0001-Rel2.0 – Brescia – 19/02/2008
A3  statement_of_work_VONEURAL-SOW-NA-0001-Rel1.0 – Brescia – 30/05/2007
A4  MLP_user_manual_VONEURAL-MAN-NA-0001-Rel1.0 – DAME Working Group – 12/10/2007
A5  pipeline_test_VONEURAL-PRO-NA-0001-Rel1.0 – D'Abrusco – 17/07/2007
A6  scientific_example_VONEURAL-PRO-NA-0002-Rel1.1 – D'Abrusco, Cavuoti – 06/10/2007
A7  frontend_VONEURAL-SDD-NA-0004-Rel1.4 – Manna – 18/03/2009
A8  FW_VONEURAL-SDD-NA-0005-Rel2.0 – Fiore – 14/04/2010
A9  REDB_VONEURAL-SDD-NA-0006-Rel1.5 – Nocella – 29/03/2010
A10 driver_VONEURAL-SDD-NA-0007-Rel0.6 – d'Angelo – 03/06/2009
A11 dm-model_VONEURAL-SDD-NA-0008-Rel2.0 – Cavuoti, Di Guido – 22/03/2010
A12 ConfusionMatrixLib_VONEURAL-SPE-NA-0001-Rel1.0 – Cavuoti, Skordovski – 07/07/2007
A13 softmax_entropy_VONEURAL-SPE-NA-0004-Rel1.0 – Skordovski, Cavuoti – 02/10/2007
A14 VONeuralMLP2.0_VONEURAL-SPE-NA-0007-Rel1.0 – Skordovski – 20/02/2008
A15 dm_model_VONEURAL-SRS-NA-0005-Rel0.4 – Laurino – 05/01/2009
A16 FANN_MLP_VONEURAL-TRE-NA-0011-Rel1.0 – Di Guido – 30/11/2008
A17 DMPlugins_DAME-TRE-NA-0016-Rel0.3 – Brescia – 14/04/2010
A18 BetaRelease_ReferenceGuide_DAME-MAN-NA-0009-Rel1.0 – Brescia – 28/10/2010
A19 BetaRelease_GUI_UserManual_DAME-MAN-NA-0010-Rel1.0 – Brescia – 03/12/2010

Tab. 5 – Applicable Documents
DAME Program: we make science discovery happen
3.3 Output

In terms of output, different files are obtained, depending on the specific use case of the experiment.

In the case of regression functionality, the following output files are obtained in the various use cases:

TRAIN: mlp_TRAIN_tmp_weights.mlp, mlp_TRAIN_weights.mlp, mlp_TRAIN_errorPlot.jpeg
TEST:  mlp_TEST.log
FULL:  mlp_FULL_trainOutput.csv, mlp_FULL_tmp_weights.mlp, mlp_FULL_weights.mlp, mlp_FULL_output.csv, mlp_FULL_errorPlot.jpeg, mlp_FULL_outputPlot.jpeg

Tab. 1 – output file list in case of regression type experiments

In the case of classification functionality, the following output files are obtained in the various use cases:

TRAIN: mlp_TRAIN_tmp_weights.mlp, mlp_TRAIN_weights.mlp, mlp_TRAIN_errorPlot.jpeg
TEST:  mlp_TEST.log
FULL:  mlp_FULL_trainOutput.csv, mlp_FULL_tmp_weights.mlp, mlp_FULL_weights.mlp, mlp_FULL_output.csv, mlp_FULL_errorPlot.jpeg, mlp_FULL_confusionMatrix

Tab. 2 – output file list in case of classification type experiments

3.4 TRAIN Use case

In the use case named Train, the software provides the possibility to train the MLP. The user will be able to use new or existing (already trained) MLP weight configurations, adjust parameters, set training parameters, set the training dataset, manipulate the training dataset and execute the training experiments.
This parameter is a required field. It is the number of neurons in the output layer of the network. It must correspond to the number of target columns as contained in the dataset input file (Training File field).

• number of iterations: this is the maximum number of learning iterations. It is one of the two stopping criteria for the learning algorithm. It is suggested to put a high value, in order to be sure to reach the best training results. The user should use this value in combination with the error tolerance parameter. If left empty, the default value is 1000.

• error tolerance: this is the threshold of the learning loop. It is one of the two stopping criteria of the algorithm. Use this parameter in combination with the number of iterations. If left empty, its default is 0.001.

• training mode: this is the combination of two parameters: the training error evaluation criterion and the submission rule of the dataset to the model. The possible criterion for the training error evaluation is MSE (Mean Square Error). The two possible submission rules are Batch, where the learning error is evaluated after each entire data pattern set calculation, and Incremental (also known as on-line), where the error is evaluated after a single pattern submission to the network.
DAta Mining & Exploration Program

[Logos: Dipartimento di Scienze Fisiche, Università di Napoli Federico II; INAF, Istituto Nazionale di Astrofisica, Osservatorio Astronomico di Capodimonte; CALTECH]

DAMEWARE

Multi Layer Perceptron with Back Propagation User Manual

DAME-MAN-NA-0011
Issue: 1.3
Date: September 03, 2013
Authors: S. Cavuoti, M. Brescia
Doc.: MLPBP_UserManual_DAME-MAN-NA-0011-Rel1.3

INDEX

1 Introduction ... 4
2 MLP Model Theoretical Overview ... 5
2.1 ... 5
2.1.1 The training performance evaluation criteria ... 9
2.1.1.1 The Mean Square Error ... 9
2.1.1.2 Cross Entropy for the two class case ... 9
2.1.1.3 Cross Entropy for the multiple class case ... 11
2.2 The Back Propagation learning rule ... 12
2.3 MLP Practical Rules ... 14
2.3.1 Selection of neuron activation function ... 14
2.3.2 Scaling input and target values ... 14
2.3.3 Number of hidden nodes ... 15
2.3.4 Number of hidden layers ... 15
[Fig. 14 screenshot, lower part: the download panel of xorTrain lists mlp_TRAIN_errorPlot.jpeg (scatter plot of the epochs vs error), mlp_TRAIN_error.csv (epoch error file), mlp_TRAIN.log (log file), mlp_TRAIN_tmp_weights.mlp (net tmp file) and mlp_TRAIN_weights.mlp (trained network file).]

Fig. 14 – The file mlp_TRAIN_weights.mlp copied into the WS input file area for next purposes.

So far, we proceed to create a new experiment, named xorTest, to verify the training of the network. For simplicity we will re-use the same input dataset file (xor.csv) but, in general, the user could use another dataset, uploaded from scratch or extracted from the original training dataset through file editing options.

[Fig. 15 screenshot: the Experiment Setup tab of workspace mlpExp, with experiment name xorTest, Functionality Classification_MLP, running mode Test, Test Set xor.csv and Network File mlp_TRAIN_weights.mlp.]

Fig. 15 – The xorTest experiment configuration tab (note the weights file inserted).

After execution, the experiment xorTest will show the output files available.
3.6.2 Classification with MLP Run Parameter Specifications

In the case of Classification_MLP with Run use case, the help page is at the address: http://dame.dsf.unina.it/mlp_help.html#class_run

• Run Set: this parameter is a required field. It is a file containing just input columns (NOT target ones). It must have the same number of input columns as the training input file. For example, it could be the same dataset file used as the training input file, without the last target columns.

• Network File: this parameter is a required field. It is a file generated by the model during the training phase. It contains the resulting network topology as stored at the end of a training session. Usually this file should not be edited or modified by users, just to preserve its content as generated by the model itself. The extension of such a file is usually .mlp.

3.7 Full Use case

In the use case named Full, the software provides the possibility to perform a complete sequence of train, test and run cases with the MLP. In the experiment configuration there is also the Help button, redirecting to a web page dedicated to support the user with deep information about all parameters and their default values. We remark that all parameters labeled by an asterisk are considered required; in all other cases the fields can be left empty (default values are used).
If the user leaves this parameter field empty, the validation phase of the training results is omitted.

• Network File: it is a file generated by the model during the training phase. It contains the resulting network topology as stored at the end of a training session. Usually this file should not be edited or modified by users, just to preserve its content as generated by the model itself. The extension of such a file is usually .mlp. The canonical use of this file in this use case is to resume a previous training phase, in order to try to improve it. If the user leaves this parameter field empty, by default the current training session starts from scratch.

• number of input nodes: this parameter is a required field. It is the number of neurons at the first (input) layer of the network. It must exactly correspond to the number of input columns in the dataset input file (Training File field), except the target columns.

• number of nodes for hidden layer: this parameter is a required field. It is the number of neurons of the unique hidden layer of the network. As a suggestion, this should be selected in a range between a minimum of 1.5 times the number of input nodes and a maximum of twice the number of input nodes plus one.

• number of output nodes: this parameter is a required field.
SLPs are only capable of learning linearly separable patterns: in 1969, in a famous monograph entitled "Perceptrons", Marvin Minsky and Seymour Papert showed that it was impossible for a single layer Perceptron network to learn an XOR function. Although a single threshold unit is quite limited in its computational power, it has been shown that networks of parallel threshold units can approximate any continuous function from a compact interval of the real numbers into the interval [-1, 1]. This is what led to the introduction of the Multi Layer Perceptron model.

[Fig. 4: a MLP able to calculate the logic XOR operation, with input, hidden and output layers.]

Fig. 4 – A MLP able to calculate the logic XOR operation.

This class of networks consists of multiple layers of computational units, usually interconnected in a feed-forward way. Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many applications the units of these networks apply a continuous activation function. The number of hidden layers represents the degree of complexity achieved for the energy solution space in which the network output moves, looking for the best solution. As an example, in a typical classification problem, the number of hidden layers indicates the number of hyper-planes used to split the parameter space (i.e. the number of possible classes), in order to classify each input pattern.
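A minimal numerical rendering of a network like the one in Fig. 4, with hand-chosen (hypothetical) weights: one hidden unit acts as OR, one as AND, and the output unit fires for "OR and not AND", which is exactly XOR:

    import numpy as np

    step = lambda s: (s > 0).astype(float)          # threshold activation

    def xor_mlp(x):
        # hidden layer: an OR-like unit and an AND-like unit (hypothetical weights)
        h = step(np.array([[1.0, 1.0], [1.0, 1.0]]) @ x + np.array([-0.5, -1.5]))
        # output unit: fires when OR is on and AND is off
        return step(np.array([1.0, -1.0]) @ h - 0.5)

    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, xor_mlp(np.array(x, float)))       # 0.0, 1.0, 1.0, 0.0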
4.1.1 Classification MLP Train use case

Let us suppose we create an experiment named xorTrain and want to configure it. After creation, the new configuration tab is opened. Here we select Classification_MLP as the couple functionality-model of the current experiment, and we also select Train as use case.

[Fig. 10 screenshot: the Experiment Setup tab with Functionality Classification_MLP, running mode Train, Train Set xor.csv, number of input nodes 2, number of nodes for hidden layer 2, number of output nodes 1; Validation Set, Network File, number of iterations, error tolerance and training mode are left empty.]

Fig. 10 – The xorTrain experiment configuration tab.

Now we have to configure the parameters for the experiment. In particular, we will leave the not-required fields (labels without asterisk) empty. The meaning of the parameters for this use case is described in the previous sections of this document. As an alternative, you can click on the Help button to obtain detailed parameter descriptions and their default values directly from the webapp.

We give xor.csv as training dataset, specifying:

• Number of input nodes: 2, because there are 2 input columns in the file;
• Number of hidden nodes (first level): 2, as a minimal number of hidden nodes (no particularly complex network "brain" is required to solve the XOR problem).
This equation gives the error quantity which is back-propagated through the network, in order to compute the derivatives of the error function with respect to the network weights. The same equation form can be obtained for the sum of squares error function and linear output units. This shows that there is a natural pairing of error function and output unit activation function.

From equations (7) and (9), the value of the cross entropy error function at its minimum is given by

    E_min = -\sum_n [ t^n \ln t^n + (1 - t^n) \ln(1 - t^n) ]        (12)

The last equation becomes zero for a 1-of-C coding scheme. However, when t^n is a continuous variable in the range (0,1), representing the probability of the input vector belonging to class C1, the error function (7) is also the correct one to use; in this case the minimum value (12) of the error does not become 0. It is then appropriate, by subtracting this value from the original error function, to get a modified error function of the form

    E = -\sum_n [ t^n \ln( y^n / t^n ) + (1 - t^n) \ln( (1 - y^n) / (1 - t^n) ) ]        (13)

But before moving to cross entropy for multiple classes, let us describe its properties in more detail. Assume the network output for a particular pattern n is written in the form y^n = t^n + ε^n. Then the cross entropy error function (13) can be transformed to the form

    E = -\sum_n [ t^n \ln( 1 + ε^n / t^n ) + (1 - t^n) \ln( 1 - ε^n / (1 - t^n) ) ]        (14)
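A small numerical illustration of (12) and (13), for a single pattern with a continuous target t = 0.3 (an arbitrary illustrative value): the original error (7) evaluated at y = t equals the minimum (12), and the modified error (13) vanishes there:

    import numpy as np

    t, y = 0.3, 0.3
    E     = -(t * np.log(y) + (1 - t) * np.log(1 - y))                  # equation (7), one pattern
    E_min = -(t * np.log(t) + (1 - t) * np.log(1 - t))                  # equation (12)
    E_mod = -(t * np.log(y / t) + (1 - t) * np.log((1 - y) / (1 - t)))  # equation (13)
    print(round(E, 4), round(E_min, 4), round(E_mod, 4))                # 0.6109 0.6109 -0.0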
With the Incremental rule, the error is evaluated after a single pattern submission to the network. See the user manual for details.

The following are the possible choices:

• MSE Batch
• MSE Incremental
• CE Batch
• CE Incremental

N.B.: if left empty, the default is the first one (MSE Batch).

4 Examples

This section is dedicated to showing some practical examples of the correct use of the web application. Not all aspects and available options are reported, but a significant sample of features, useful for beginners of the DAME suite and for users with little experience of data mining methodologies based on machine learning algorithms. To this purpose, very simple and trivial problems will be described. Further, more complex examples will be integrated here in the next releases of the documentation.

4.1 Classification XOR problem

The problem can be stated as follows: we want to train a model to learn the logical XOR function between two binary variables. As known, the XOR problem is not linearly separable, so we require a neural network able to learn to identify the right output value of the XOR function.
The final weight matrix (the best configuration of network weights) is then stored as output.

2. Test the trained network, in order to verify the training quality (the validation step, available for some models, is also included). The same training dataset, or a mix with new patterns, can be used as input.

3. Run the trained and tested network with datasets containing ONLY input features (without target ones). In this case new or different input data are encouraged, because the Run use case implies simply executing the model, like a generic static function.

The Full use case includes both the train and test cases. It can be executed as an alternative to the sequence of the two use cases. In this sense it is not to be considered as a single step of the sequence.

3.2 Input

We also remark that massive datasets to be used in the various use cases are (and sometimes must be) different in terms of internal file content representation. Remind that in all DAME models it is possible to use one of the following data types:

• ASCII (extension .dat or .txt): simple text file containing rows (patterns) and columns (features) separated by spaces, normally without header;
• CSV (extension .csv): Comma Separated Values files, where columns are separated by commas;
It is the number of neurons in the output layer of the network. It must correspond to the number of target columns as contained in the dataset input file (Training File field).

• number of iterations: this is the maximum number of learning iterations. It is one of the two stopping criteria for the learning algorithm. It is suggested to put a high value, in order to be sure to reach the best training results. The user should use this value in combination with the error tolerance parameter. If left empty, the default value is 1000.

• error tolerance: this is the threshold of the learning loop. It is one of the two stopping criteria of the algorithm. Use this parameter in combination with the number of iterations. If left empty, its default is 0.001.

• training mode: this is the combination of two parameters: the training error evaluation criterion and the submission rule of the dataset to the model. The two possible criteria for the training error evaluation are MSE (Mean Square Error) and CE (Cross Entropy). The two possible submission rules are Batch, where the learning error is evaluated after each entire data pattern set calculation, and Incremental (also known as on-line), where the error is evaluated after a single pattern submission to the network. See the user manual for details.
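The two submission rules differ only in where the weight update happens relative to the pattern loop. A generic hypothetical sketch (`grad` stands for any routine returning the error gradient for one pattern):

    def train_batch(w, grad, X, T, eta, epochs):
        """Batch: the gradient is accumulated over the entire data pattern
        set, and the weights are updated once per epoch."""
        for _ in range(epochs):
            w = w - eta * sum(grad(w, x, t) for x, t in zip(X, T)) / len(X)
        return w

    def train_incremental(w, grad, X, T, eta, epochs):
        """Incremental (on-line): the weights are updated after each single
        pattern submission."""
        for _ in range(epochs):
            for x, t in zip(X, T):
                w = w - eta * grad(w, x, t)
        return w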
[Fig. 16 screenshot: the My Experiments panel of workspace mlpExp lists xorTrain and xorTest with status "ended"; the download area of xorTest contains: mlp_TEST_output.csv (output and target vector of the test set), mlp_TEST.log (log file), mlp_TEST_confusionMatrix (confusion matrix), MLP_Test_params.xml (experiment configuration file).]

Fig. 16 – The xorTest experiment output files.

4.1.3 Classification MLP Full use case

If an automatic sequence of train and test use cases is desired, it is possible to execute an experiment by choosing Full as use case. In this case we create a new experiment, named xorFull, where we have to select parameters for both the train and test use cases.

[Fig. 17 screenshot: the Experiment Setup tab with Functionality Classification_MLP, running mode Full, Train Set xor.csv, Test Set xor.csv, number of input nodes 2, number of nodes for hidden layer 2, number of output nodes 1; Validation Set, Network File, number of iterations, error tolerance and training mode are left empty.]

Fig. 17 – The xorFull experiment configuration tab.
Astronomical data do not seem to require such a level of complexity and, therefore, it is enough to use just a double weights layer, i.e. a single hidden layer.

What differs in such neural network architectures is typically the learning algorithm used to train the network. There exists a dichotomy between supervised and unsupervised learning methods. In the first case, the network must first be trained (training phase), in which the input patterns are submitted to the network as couples (input, desired known output). The feed-forward algorithm is then executed and, at the end of the input submission, the network output is compared with the corresponding desired output, in order to quantify the learning quote. It is possible to perform the comparison in a batch way (after an entire input pattern set submission) or incrementally (the comparison is done after each single input pattern submission); the metric used for the distance measure between desired and obtained outputs can also be chosen according to problem-specific requirements (usually the Euclidean distance is used). After each comparison, and until a desired error distance is not reached (typically the error tolerance is a pre-calculated value, or a constant imposed by the user), the weights of the hidden layers must be changed according to a particular law, or learning technique.
The documentation package consists also of a general reference manual on the webapp (useful also to understand what we mean by the association between a functionality and a data mining model) and of a GUI user guide, providing a detailed description of how to use all GUI features and options. So far, we strongly suggest reading these two manuals and gaining a little practical experience with the webapp interface, before exploring specific model features by reading this and the other model guides. The whole documentation package is available from the address http://dame.dsf.unina.it/dameware.html, where there is also the direct gateway to the beta webapp.

As a general suggestion, the only effort required of the end user is to have a bit of faith in Artificial Intelligence, and a small amount of patience to learn the basic principles of its models and strategies. By merging (for fun) two famous commercial taglines, we say: "Think different, just do it!" (casually, this is an example of data-text mining).

2 MLP Model Theoretical Overview

This paragraph is intended to furnish a theoretical overview of the MLP model.
Phase 1: Propagation. Each propagation involves the following steps:

1. forward propagation of a training pattern's input through the neural network, in order to generate the propagation's output activations;
2. back propagation of the propagation's output activations through the neural network, using the training pattern's target, in order to generate the deltas of all output and hidden neurons.

Phase 2: Weight update. For each weight-synapse:

1. multiply its output delta and input activation, to get the gradient of the weight;
2. bring the weight in the opposite direction of the gradient, by subtracting a ratio of it from the weight.

This ratio influences the speed and quality of learning; it is called the learning rate. The sign of the gradient of a weight indicates where the error is increasing; this is why the weight must be updated in the opposite direction. Phases 1 and 2 are repeated until the performance of the network is good enough, i.e. until the output error falls below the stopping threshold.

[Fig. 5: flow diagram of the algorithm: forward propagation phase of the input patterns, backward propagation phase of the error through the network, weight update, iterated until the output error reaches the stopping threshold.]

Fig. 5 – The typical flow structure of the Back Propagation algorithm.
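The two phases above, condensed into a minimal runnable sketch (not the DAME/FANN implementation): a 2-2-1 network with sigmoid units, MSE criterion and incremental updates, trained on XOR. The learning rate 0.7 mirrors the value visible in Fig. 13; with an unlucky initialization the net can settle in a local minimum (see section 2.3.4), in which case re-seeding or adding hidden nodes helps:

    import numpy as np

    rng = np.random.default_rng(0)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))

    W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # hidden layer weights/biases
    W2, b2 = rng.normal(size=2), 0.0                # output layer weights/bias
    eta = 0.7

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
    T = np.array([0.0, 1.0, 1.0, 0.0])

    for epoch in range(5000):
        for x, t in zip(X, T):                      # incremental submission
            # Phase 1: forward propagation, then backward propagation of deltas
            h = sig(W1 @ x + b1)
            y = sig(W2 @ h + b2)
            d_out = (y - t) * y * (1 - y)           # output delta (MSE + sigmoid)
            d_hid = (W2 * d_out) * h * (1 - h)      # hidden deltas
            # Phase 2: weight update, opposite to the gradient
            W2 -= eta * d_out * h;  b2 -= eta * d_out
            W1 -= eta * np.outer(d_hid, x);  b1 -= eta * d_hid

    print([float(np.round(sig(W2 @ sig(W1 @ x + b1) + b2), 2)) for x in X])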
Fig. 13 – The files .csv (left) and .mlp (right) output of the xorTrain experiment ... 35
Fig. 14 – The file mlp_TRAIN_weights.mlp copied in the WS input file area for next purposes ... 35
Fig. 15 – The xorTest experiment configuration tab (note weights file inserted) ... 36
Fig. 16 – The xorTest experiment output files ... 36
Fig. 17 – The xorFull experiment configuration tab ... 37
Fig. 18 – The xorFull experiment output ... 38

1 Introduction

The present document is the user guide of the data mining model MLP (Multi Layer Perceptron), trained by Back Propagation, as implemented and integrated into the DAMEWARE. This manual is one of the specific guides (one for each data mining model available in the webapp), having the main scope of helping the user to understand the theoretical aspects of the model, to make decisions about its practical use in problem-solving cases, and to use it to perform experiments through the webapp, by also being able to select the right functionality associated to the model, based upon the specific problem and the related data to be explored, to select the use cases, to configure the internal parameters, to launch experiments and to evaluate the results.
Each row is an entire pattern (or sample of data). The format (hence its extension) must be one of the types allowed by the application (ASCII, FITS, CSV, VOTABLE). If the user leaves this parameter field empty, the validation phase of the training results is omitted.

• Network File: it is a file generated by the model during the training phase. It contains the resulting network topology as stored at the end of a training session. Usually this file should not be edited or modified by users, just to preserve its content as generated by the model itself. The extension of such a file is usually .mlp. The canonical use of this file in this use case is to resume a previous training phase, in order to try to improve it. If the user leaves this parameter field empty, by default the current training session starts from scratch.

• number of input nodes: this parameter is a required field. It is the number of neurons at the first (input) layer of the network. It must exactly correspond to the number of input columns in the dataset input file (Training File field), except the target columns.

• number of nodes for hidden layer: this parameter is a required field. It is the number of neurons of the unique hidden layer of the network. As a suggestion, this should be selected in a range between a minimum of 1.5 times the number of input nodes and a maximum of twice the number of input nodes plus one.
• number of output nodes: this parameter is a required field. It is the number of neurons in the output layer of the network. It must correspond to the number of target columns as contained in the dataset input file (Training File field).

• number of iterations: this is the maximum number of learning iterations. It is one of the two stopping criteria for the learning algorithm. It is suggested to put a high value, in order to be sure to reach the best training results. The user should use this value in combination with the error tolerance parameter. If left empty, the default value is 1000.

• error tolerance: this is the threshold of the learning loop. It is one of the two stopping criteria of the algorithm. Use this parameter in combination with the number of iterations. If left empty, its default is 0.001.

• training mode: this is the combination of two parameters: the training error evaluation criterion and the submission rule of the dataset to the model.
TABLE INDEX

Tab. 1 – output file list in case of regression type experiments ... 18
Tab. 2 – output file list in case of classification type experiments ... 18
Tab. 3 – Abbreviations and acronyms ... 39
Tab. 4 – Reference Documents ... 40
Tab. 5 – Applicable Documents ... 41

FIGURE INDEX

Fig. 1 – the MLP artificial and biologic brains ... 5
Fig. 2 – ... 6
Fig. 3 – Example of a SLP to calculate the logic AND operation ... 7
Fig. 4 – A MLP able to calculate the logic XOR operation ... 8
Fig. 5 – The typical flow structure of the Back Propagation algorithm ... 13
Fig. 6 – The sigmoid function and its first derivative ... 14
Fig. 7 – The content of the xor.csv file used as input for training/test use cases ... 17
Fig. 8 – The content of the xor_run.csv file used as input for Run use case ... 17
Fig. 9 – The starting point with a Workspace mlpExp created and two data files uploaded ... 32
Fig. 10 – The xorTrain experiment configuration tab ... 33
Fig. 11 – The xorTrain experiment status after submission ... 34
Fig. 12 – The xorTrain experiment output files ... 34
2.3.2 Scaling input and target values

• Standardize:
  o with large scale differences, the error depends mostly on the large-scale features;
  o shift each feature to zero mean and unit variance;
  o this needs to be done once, before training, and needs the full data set.
• Target values:
  o if the output activation is saturated, the training output never reaches the saturated value, so full training would never terminate;
  o the range [-1, 1] is suggested.

2.3.3 Number of hidden nodes

• The number of hidden units governs the expressive power of the net and the complexity of the decision boundary.
• Well separated classes need fewer hidden nodes; complicated, highly interspersed densities need many hidden nodes.
• Heuristic rules of thumb (sketched in code below):
  o use a minimum of 2N+1 neurons for the first hidden layer, where N is the number of input nodes;
  o more training data yields better results;
  o keep the number of weights below the number of training data, as a rule: number of weights ≈ number of training data / 10;
  o adjust the number of weights in response to the training data: start with a large number of hidden nodes, then decay (prune) weights.

2.3.4 Number of hidden layers

• One or two hidden layers are OK, so long as a differentiable activation function is used; but one layer is generally sufficient.
• More layers bring more chances of local minima.
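A sketch of the 2.3.2 and 2.3.3 rules above, as hypothetical helper functions (not part of the webapp):

    import numpy as np

    def standardize(X):
        """Shift every feature to zero mean and unit variance; computed once,
        on the full training set, before training (section 2.3.2)."""
        mu, sigma = X.mean(axis=0), X.std(axis=0)
        return (X - mu) / sigma, mu, sigma   # reuse mu/sigma on test/run data

    def suggested_hidden_nodes(n_in, n_patterns, n_out=1):
        """Start from the 2N+1 rule of thumb, then prune so that the number
        of weights stays near n_patterns / 10 (section 2.3.3)."""
        n_hid = 2 * n_in + 1
        n_weights = lambda h: h * (n_in + 1) + n_out * (h + 1)  # biases included
        while n_hid > 1 and n_weights(n_hid) > n_patterns / 10:
            n_hid -= 1
        return n_hid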
The sum of squares error function is not the most appropriate choice for classification problems. In the case of a 1-of-C coding scheme, the target values sum to unity for each pattern, and so the network outputs will also always sum to unity; however, there is no guarantee that they will lie in the range (0,1). In fact, the outputs of a network trained by minimizing a sum of squares error function approximate the posterior probabilities of class membership, conditioned on the input vector, using the maximum likelihood principle, by assuming that the target data were generated from a smooth deterministic function with added Gaussian noise. For classification problems, however, the targets are binary variables, hence far from having a Gaussian distribution, so their description cannot be given using a Gaussian noise model. Therefore a more appropriate choice of error function is needed.

Let us now consider problems involving two classes. One approach to such problems would be to use a network with two output units, one for each class. First, let us discuss an alternative approach, in which we consider a network with a single output y. We would like the value of y to represent the posterior probability P(C1|x) for class C1. The posterior probability of class C2 will then be given by P(C2|x) = 1 - y.
37. ment output files The content of output files obtained at the end of the experiment available when the status is ended is shown in the following The file mlp_TRAIN_error csv reports the training error after a set of iterations indicated in the first column 34 DAMEWARE Beta Release MLP BP Model User Manual This document contains proprietary information of DAME project Board All Rights Reserved DAta Mining amp Exploration Program 20 0 1750848889 30 0 1167778075 40 0 0464941077 50 0 0160067063 60 0 0066382927 70 0 0009171068 connection rate 1 000000 network type 0 learning momentum 0 000000 training algorithm 2 train error function 1 train stop function 0 cascade output change fraction 0 010000 quickprop_decay 0 000100 quickprop_mu 1 750000 rprop_ increase factor 1 200000 rprop_decrease factor 0 500000 rprop_delta min 0 000000 rprop delta max 50 000000 rprop delta zero 0 100000 cascade output_stagnation epochs 12 cascade candidate change fraction 0 010000 cascade candidate stagnation epochs 12 cascade max out epochs 150 cascade max cand epochs 150 cascade num candidate groups 2 bir fail limit 3 49939394039535522461e 01 cascade candidate limit 1 000000000000000000008 03 cascade weight multiplier 4 00000005960464477539e 01 cascade activation_functions_ count 10 cascade activation functions 3 5 7 8 10 11 14 15 16 17 cascade activation_steepnesses_count 4 layer sizes 3 3 2
2.3 MLP Practical Rules

The practice of, and expertise with, machine learning models such as the MLP are important factors, coming from long training and experience with their use in scientific experiments. The speed and effectiveness of the results strongly depend on these factors. Unfortunately, there are no magic ways to indicate a priori the best configuration of internal parameters, involving network topology and learning algorithm. But in some cases a set of practical rules to define the best choices can be taken into account.

2.3.1 Selection of neuron activation function

• If there are good reasons to select a particular activation function, then do it:
  o linear;
  o threshold;
  o hyperbolic tangent;
  o sigmoid.
• General good properties of an activation function:
  o non-linear;
  o saturating to some max and min value;
  o continuity and smoothness;
  o monotonicity (convenient but nonessential);
  o linearity for small values of net.
• The sigmoid function has all the good properties:
  o centered at zero;
  o anti-symmetric, f(-net) = -f(net), which gives faster learning;
  o overall range and slope are not important.

[Fig. 6 plot: f(net) and its first derivative f'(net).]

Fig. 6 – The sigmoid function and its first derivative.
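The pairing of f(net) and f'(net) shown in Fig. 6 can be checked numerically. The sketch below uses the logistic form g(a) = 1/(1+e^{-a}) whose derivative, as in equation (10), is g(1-g):

    import numpy as np

    f  = lambda net: 1.0 / (1.0 + np.exp(-net))    # logistic sigmoid
    fp = lambda net: f(net) * (1.0 - f(net))       # its first derivative

    net = np.linspace(-6.0, 6.0, 25)
    eps = 1e-6
    # compare the analytic derivative with a central finite difference
    print(np.allclose(fp(net), (f(net + eps) - f(net - eps)) / (2 * eps)))   # True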
3 Use of the web application model ... 16
3.1 Use Cases ... 16
3.2 Input ... 17
3.3 Output ... 18
3.4 Train Use case ... 19
3.4.1 Regression with MLP Train Parameter Specifications ... 19
3.4.2 Classification with MLP Train Parameter Specifications ... 21
3.5 Test Use case ... 23
3.5.1 Regression with MLP Test Parameter Specifications ... 23
3.5.2 Classification with MLP Test Parameter Specifications ... 24
3.6 Run Use case ... 25
3.6.1 Regression with MLP Run Parameter Specifications ... 25
3.6.2 Classification with MLP Run Parameter Specifications ... 25
3.7 Full Use case ... 26
3.7.1 Regression with MLP Full Parameter Specifications ... 26
3.7.2 Classification with MLP Full Parameter Specifications ... 28
4 Examples ... 32
4.1 Classification XOR problem ... 32
4.1.1 Classification MLP Train use case ... 33
4.1.2 Classification MLP Test use case ... 35
4.1.3 Classification MLP Full use case ... 37
5 Appendix – References and Acronyms ... 39
Spatially, the bias alters the position (though not the orientation) of the decision boundary. The Perceptron learning algorithm does not terminate if the learning set is not linearly separable.

The Perceptron is considered the simplest kind of feed-forward neural network. The earliest kind of neural network is the Single Layer Perceptron (SLP) network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights. In this way it can be considered the simplest kind of feed-forward network. The sum of the products of the weights and the inputs is calculated in each node and, if the value is above some threshold (typically 0), the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1).

[Fig. 3: example of a SLP computing the logic AND operation: out = 1 if x1·w1 + x2·w2 > θ, else 0.]

Fig. 3 – Example of a SLP to calculate the logic AND operation.

Neurons with this kind of activation function are also called artificial neurons, or linear threshold units, as described by Warren McCulloch and Walter Pitts in the 1940s. A Perceptron can be created using any values for the activated and deactivated states, as long as the threshold value lies between the two. Most perceptrons have outputs of 1 or -1 with a threshold of 0, and there is some evidence that such networks can be trained more quickly than networks created from nodes with different activation and deactivation values.
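A runnable version of Fig. 3's unit, with hypothetical weights (w1 = w2 = 1) and threshold θ = 1.5, chosen so that only the (1,1) input fires:

    import numpy as np

    def slp_and(x, w=np.array([1.0, 1.0]), theta=1.5):
        # fires (output 1) only when the weighted sum exceeds the threshold
        return 1 if w @ x > theta else 0

    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, slp_and(np.array(x, float)))      # 0, 0, 0, 1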
The BoK (Base of Knowledge) is made of the possible combinations of the two input variables and the related correct output. This is a very trivial problem and, in principle, no machine learning method would be needed; but, as remarked, the scope here is not to obtain a scientific benefit, but to gain practice with the web application. Let us say it is an example comparable with the classical 'print "Hello World"' implementation problem for beginners in the C language.

As a first case we will use the MLP model associated with the Classification functionality. The starting point is to create a new workspace, named mlpExp, and to populate it by uploading two files:

• xor.csv: CSV dataset file for the training and test use cases;
• xor_run.csv: CSV dataset file for the run use case.

Their content has already been described in section 3 of this document.

[Fig. 9 screenshot: the Resource Manager with the workspace mlpExp created and the two data files xor.csv and xor_run.csv uploaded in the File Manager; the My Experiments panel shows no items yet.]

Fig. 9 – The starting point, with a Workspace mlpExp created and two data files uploaded.
Anyway, we suggest trying different numbers of such nodes, gradually incrementing them, to see what happens in terms of training error and convergence speed.

• Number of output nodes: 1, because the third column in the input file is the target (the correct output for the input patterns).

[Fig. 11 screenshot: after submission, a popup reports "Experiment Finished - OK" and the My Experiments panel of workspace mlpExp lists xorTrain with status "ended".]

Fig. 11 – The xorTrain experiment status after submission.

[Fig. 12 screenshot: the download area of xorTrain lists mlp_TRAIN_errorPlot.jpeg (scatter plot of the epochs vs error), mlp_TRAIN_error.csv (epoch error file), mlp_TRAIN.log (log file), mlp_TRAIN_tmp_weights.mlp (net tmp file), mlp_TRAIN_weights.mlp (trained network file).]

Fig. 12 – The xorTrain experiment output files.

The content of the output files obtained at the end of the experiment (available when the status is "ended") is shown in the following. The file mlp_TRAIN_error.csv reports the training error after each set of iterations, indicated in the first column.
The universal approximation theorem (R12) for neural networks states that every continuous function that maps intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a multi-layer Perceptron with just one hidden layer. This result holds only for restricted classes of activation functions, e.g. for the sigmoidal functions. An extension of the universal approximation theorem states that the two-layer architecture is capable of universal approximation, and a considerable number of papers have appeared in the literature discussing this property. An important corollary of these results is that, in the context of a classification problem, networks with sigmoidal non-linearity and two layers of weights can approximate any decision boundary to arbitrary accuracy. Thus, such networks also provide universal non-linear discriminant functions. More generally, the capability of such networks to approximate general smooth functions allows them to model posterior probabilities of class membership.

Since two layers of weights suffice to implement any arbitrary function, one would need special problem conditions or requirements to recommend the use of more than two layers. Furthermore, it is found empirically that networks with multiple hidden layers are more prone to getting caught in undesirable local minima.
…the softmax activation function represents a smooth version of the winner-takes-all activation model, in which the unit with the largest input has output 1, while all other units have output 0. The softmax function is also used in the hidden layer of normalized radial basis function networks but, in the interest of this document, we will not enter into their description. To use the softmax activation function you need at least 2 columns of targets (1-of-N codified). The base of the MLP is the Perceptron, a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. It can be seen as the simplest kind of feedforward neural network: a linear classifier. The Perceptron is a binary classifier which maps its input x (a real-valued vector) to an output value f(x) (a single binary value):

    f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}

where w is a vector of real-valued weights, w \cdot x is the dot product (which computes a weighted sum) and b is the bias, a constant term that does not depend on any input value. The value of f(x) (0 or 1) is used to classify x as either a positive or a negative instance, in the case of a binary classification problem. If b is negative, then the weighted combination of inputs must produce a positive value greater than |b| in order to push the classifier over the 0 threshold.
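A minimal sketch of this decision rule in Python (illustrative only, not part of the DAMEWARE implementation; the weights and bias below are hypothetical values that happen to realize a logical OR):

    import numpy as np

    def perceptron(x, w, b):
        """Rosenblatt perceptron: 1 if w.x + b > 0, else 0."""
        return 1 if np.dot(w, x) + b > 0 else 0

    # Hypothetical weights/bias realizing OR (a linearly separable
    # function; XOR, by contrast, cannot be computed by a single
    # perceptron, which is why the MLP is used in this manual).
    w, b = np.array([1.0, 1.0]), -0.5
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, "->", perceptron(np.array(x), w, b))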
…parameter field, the validation phase of the training results is omitted.

• Test Set: this parameter is a required field. It is the dataset file given as input, containing both input and target columns. It must have the same number of input and target columns as the training input file. For example, it could be the same dataset file used as the training input file.

• Network File: it is a file generated by the model during the training phase. It contains the resulting network topology as stored at the end of a training session. Usually this file should not be edited or modified by users, in order to preserve its content as generated by the model itself. The extension of such a file is usually .mlp. The canonical use of this file in this use case is to resume a previous training phase, in order to try to improve it. If the user leaves this parameter field empty, by default the current training session starts from scratch.

• number of input nodes: this parameter is a required field. It is the number of neurons at the first (input) layer of the network. It must exactly correspond to the number of input columns in the dataset input file (Training File field), excluding the target columns.

• number of nodes for hidden layer: this parameter is a required field. It is the number of neu…
…parameters: set training parameters, set the training dataset, manipulate the training dataset and execute the training experiments. There are several parameters to be set to achieve training, dealing with the network topology and the learning algorithm. In the experiment configuration there is also the Help button, redirecting to a web page dedicated to support the user with detailed information about all parameters and their default values. We remark that all parameters labeled by an asterisk are considered required. In all other cases the fields can be left empty (default values are used).

3.4.1 Regression with MLP Train Parameter Specifications

In the case of Regression_MLP with the Train use case, the help page is at the address http://dame.dsf.unina.it/mlp_help.html#regr_train

• Train Set: this parameter is a required field. This is the dataset file to be used as input for the learning phase of the model. It typically must include both input and target columns, where each row is an entire pattern (or sample) of data. The format (hence its extension) must be one of the types allowed by the application (ASCII, FITS, CSV, VOTABLE). More specifically, keep in mind the following simple rule: the sum of input and output nodes MUST be equal to the total number of columns in this file (see the sketch after this parameter list).

• Validation Set: this is the dataset file to be used as input for the validation of the learning phase of the model. It typically must include both input and target columns, where each row is an entire pattern (or sample) of data. The format (hence its extension) must be one of the types allowed by the application (ASCII, FITS, CSV, VOTABLE). If the user leaves this parameter field empty, the validation phase of the training results is omitted.
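The column rule above is easy to check before submitting an experiment. The following fragment is a hypothetical helper (the function name and the assumption of a plain CSV layout are ours, not part of DAMEWARE):

    import csv

    def check_node_counts(path, n_inputs, n_outputs):
        """Verify the rule: input nodes + output nodes == total columns."""
        with open(path, newline="") as f:
            n_columns = len(next(csv.reader(f)))
        if n_inputs + n_outputs != n_columns:
            raise ValueError(f"{n_inputs}+{n_outputs} nodes "
                             f"!= {n_columns} columns")

    # e.g. for the XOR file: two inputs, one target, three columns in all
    check_node_counts("xor.csv", n_inputs=2, n_outputs=1)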
…learning error is evaluated after each entire data-pattern-set calculation), and Incremental (also known as on-line, where the error is evaluated after a single pattern submission to the network). See the user manual for details. The following are the possible choices:

o 1 - MSE Batch
o 2 - MSE Incremental

If left empty, the default is the first one (MSE Batch).

3.4.2 Classification with MLP Train Parameter Specifications

In the case of Classification_MLP with the Train use case, the help page is at the address http://dame.dsf.unina.it/mlp_help.html#class_train

• Train Set: this parameter is a required field. This is the dataset file to be used as input for the learning phase of the model. It typically must include both input and target columns, where each row is an entire pattern (or sample) of data. The format (hence its extension) must be one of the types allowed by the application (ASCII, FITS, CSV, VOTABLE). More specifically, keep in mind the following simple rule: the sum of input and output nodes MUST be equal to the total number of columns in this file.

• Validation Set: this is the dataset file to be used as input for the validation of the learning phase of the model. It typically must include both input and target columns, where each row is an entire pattern (or sample) of data. The format (hence its extension) must be one of the types allowed by the application (ASCII, FITS, CSV, VOTABLE). If the user leaves this parameter field empty, the validation phase of the training results is omitted.
…neurons of the unique hidden layer of the network. As a suggestion, this should be selected in a range between a minimum of 1.5 times the number of input nodes and a maximum of 2 times the number of input nodes + 1.

• number of output nodes: this parameter is a required field. It is the number of neurons in the output layer of the network. It must correspond to the number of target columns contained in the dataset input file (Training File field).

• number of iterations: this is the maximum number of learning iterations. It is one of the two stopping criteria for the learning algorithm. It is suggested to put a high value here, in order to be sure to reach the best training results. Users should use this value in combination with the error tolerance parameter. If left empty, the default value is 1000.

• error tolerance: this is the threshold of the learning loop and is one of the two stopping criteria of the algorithm. Use this parameter in combination with the number of iterations. If left empty, its default is 0.001.

• training mode: this is the combination of two parameters: the training error evaluation criterion and the submission rule of the dataset to the model. The two possible criteria for the training error evaluation are MSE (Mean Square Error) and CE (Cross Entropy). The two possible submission rules are Batch, where the learning error is evaluated after each entire data-pattern-set calculation, and Incremental (also known as on-line), where the learning error is evaluated after a single pattern submission to the network (a schematic contrast is sketched below).
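To make the Batch/Incremental distinction concrete, here is a schematic sketch in plain Python; the net object and its gradient/update methods are hypothetical placeholders, not DAMEWARE API calls:

    # Schematic contrast between the two submission rules.

    def train_batch(net, patterns):
        # Batch: the error gradient is accumulated over the whole
        # pattern set and the weights are updated once per epoch.
        grad = sum(net.gradient(p) for p in patterns)
        net.update(grad)

    def train_incremental(net, patterns):
        # Incremental (on-line): the weights are updated after every
        # single pattern submission.
        for p in patterns:
            net.update(net.gradient(p))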
…In the use case named Run, the software provides the possibility to run the MLP. The user will be able to use already trained and tested MLP models (their weight configurations) to execute the normal experiments on new datasets. In the experiment configuration there is also the Help button, redirecting to a web page dedicated to support the user with detailed information about all parameters and their default values. We remark that all parameters labeled by an asterisk are considered required. In all other cases the fields can be left empty (default values are used).

3.6.1 Regression with MLP Run Parameter Specifications

In the case of Regression_MLP with the Run use case, the help page is at the address http://dame.dsf.unina.it/mlp_help.html#regr_run

• Run Set: this parameter is a required field. It is a file containing just input columns (NOT target ones). It must have the same number of input columns as the training input file. For example, it could be the same dataset file used as the training input file, without the last target columns.

• Network File: this parameter is a required field. It is a file generated by the model during the training phase. It contains the resulting network topology as stored at the end of a training session. Usually this file should not be edited or modified by users, in order to preserve its content as generated by the model itself. The extension of such a file is usually .mlp.

3.6.2 Classification with MLP Run Parameter Specifications…
…user manual for details. The following are the possible choices:

o 1 - MSE Batch
o 2 - MSE Incremental
o 3 - CE Batch
o 4 - CE Incremental

If left empty, the default is the first one (MSE Batch).

3.5 TEST Use case

In the use case named Test, the software provides the possibility to test the MLP. The user will be able to use already trained MLP models (their weight configurations) to execute the testing experiments. In the experiment configuration there is also the Help button, redirecting to a web page dedicated to support the user with detailed information about all parameters and their default values. We remark that all parameters labeled by an asterisk are considered required. In all other cases the fields can be left empty (default values are used).

3.5.1 Regression with MLP Test Parameter Specifications

In the case of Regression_MLP with the Test use case, the help page is at the address http://dame.dsf.unina.it/mlp_help.html#regr_test

• Test Set: this parameter is a required field. It is the dataset file given as input, containing both input and target columns. It must have the same number of input and target columns as the training input file. For example, it could be the same dataset file used as the training input file.

3.5.2 Classification with MLP Test Parameter Specifications…
…associated to single or multiple functionality domains, in order to be used to perform practical scientific experiments with such techniques. An overview of machine learning and functionality domains, as intended in the DAME Project, can be found in A18.

2.1 Multi Layer Perceptron

The MLP architecture is one of the most typical feed-forward neural network models. The term feed-forward is used to identify the basic behavior of such neural models, in which the impulse is always propagated in the same direction, e.g. from the input layer towards the output layer, through one or more hidden layers (the network "brain"), by combining the weighted sums of the weights associated to all neurons (except those of the input layer).

Fig. 1 - The MLP compared with the biological brain (input units, pattern units, summation units and output units).

As is easy to understand, the neurons are organized in layers, each with its proper role. The input signal, simply propagated throughout the neurons of the input layer, is used to stimulate the next hidden and output neuron layers. The output of each neuron is obtained by means of an activation function, applied to the weighted sum of its inputs. Different shapes of this activation function can be applied, from the simplest (linear) one up to sigmoid or softmax, or a customized function ad hoc for the specific application.
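A compact sketch of this feed-forward computation, assuming one hidden layer and a sigmoid activation (an illustrative NumPy fragment of ours, not DAMEWARE code):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
        """One feed-forward pass: input -> hidden -> output. Each layer
        applies an activation to the weighted sum of its inputs."""
        h = sigmoid(W_hidden @ x + b_hidden)  # hidden layer outputs
        y = sigmoid(W_out @ h + b_out)        # output layer
        return y

    # Hypothetical shapes: 2 inputs, 3 hidden neurons, 1 output
    rng = np.random.default_rng(0)
    y = mlp_forward(np.array([0.0, 1.0]),
                    rng.normal(size=(3, 2)), np.zeros(3),
                    rng.normal(size=(1, 3)), np.zeros(1))
    print(y)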
Sigmoid function:

    y = \frac{1}{1 + e^{-x}}

Fig. 2 - The sigmoid function.

This function is the most frequent in the MLP model. It is characterized by its smooth step between 0 and 1, with a variable threshold. But this restricted output range (0, 1) is also its limitation: it can in fact be used only in problems where the expected outputs are numbers in this range.

Softmax: in order to ensure that the outputs can be interpreted as posterior probabilities, they must lie between zero and one, and their sum must be equal to one. This constraint also ensures that the distribution is correctly normalized. In practice, for multi-class problems, this is achieved by using a softmax activation function in the output layer; the purpose of the softmax activation function is to enforce these constraints on the outputs. Let the network input to each output unit be q_i, with i = 1, ..., c, where c is the number of categories. Then the softmax output is

    p_i = \frac{\exp(q_i)}{\sum_{j=1}^{c} \exp(q_j)}    (3)

Statisticians usually call softmax a multiple logistic function; equation (3) is also known as the normalized exponential function. It reduces to the simple logistic function when there are only two categories. Suppose you choose to set q_2 to 0; then

    p_1 = \frac{\exp(q_1)}{\exp(q_1) + \exp(0)} = \frac{1}{1 + \exp(-q_1)}    (4)

The term softmax is used because this activation function represents a smooth version of the winner-takes-all activation model.
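A minimal numerical illustration of equations (3) and (4) in NumPy (not DAMEWARE code; subtracting max(q) is a common numerical-stability trick of ours, and it does not change the result):

    import numpy as np

    def softmax(q):
        """Normalized exponential of eq. (3)."""
        e = np.exp(q - np.max(q))
        return e / e.sum()

    q = np.array([2.0, 0.0])
    print(softmax(q))                   # eq. (3) with c = 2 and q2 = 0
    print(1.0 / (1.0 + np.exp(-q[0])))  # eq. (4): same first component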
After the training phase is finished (or arbitrarily stopped), the network should be able not only to recognize the correct output for each input already used in the training set, but also to achieve a certain degree of generalization, i.e. to give correct outputs for those inputs never used before to train it. The degree of generalization obviously varies depending on how good the learning phase has been. This important feature is realized because the network does not associate a single input to the output, but discovers the relationship behind their association. After training, such a neural network can be seen as a black box, able to perform a particular function (input-output correlation) whose analytical shape is a priori not known. In order to gain the best training, the training set must be as homogeneous as possible and able to describe a great variety of samples: the bigger the training set, the higher the network's generalization capability will be. Despite these considerations, it should always be taken into account that the application field of neural networks usually refers to problems where high flexibility (quantitative results) is needed, more than high precision (qualitative results). The second learning type (unsupervised) basically refers to neural models able to classify/cluster patterns onto several categories, based on
their common features, by submitting training inputs without the related desired outputs. This is not the learning case approached with the MLP architecture, so it is not important to add more information about it in this document.

2.1.1 The training performance evaluation criteria

For the MLP model trained by Back Propagation, as implemented in DAME, there is a possible choice between two error evaluation criteria: respectively, the MSE (Mean Square Error) between target and network output values, and the Cross Entropy.

2.1.1.1 The Mean Square Error

Given the p-th input pattern, a classical error function, called sum of squares, is

    E^{(p)} = \frac{1}{2} \sum_{i=1}^{n} \left( t_i^{(p)} - v_i^{(p)} \right)^2

where t_i^{(p)} is the p-th desired output value and v_i^{(p)} is the output of the corresponding neuron. Due to its interpolation capabilities, the MLP is one of the most widely used neural architectures.

2.1.1.2 Cross Entropy for the two-class case

Learning in neural networks is based on the definition of a suitable error function, which is then minimized with respect to the weights and biases in the network. Error functions play an important role in the use of neural networks, and a variety of different error functions exist. For regression problems the basic goal is to model the conditional distribution of the output variables, conditioned on the input variables; this motivates the use of a sum-of-squares error function. But for classification problems the sum-of-squares error is not the most appropriate choice…
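As a numerical illustration of the two criteria (a NumPy sketch in our own notation; the two-class cross-entropy expression used here is the standard one, since the manual's own formula for it falls outside this excerpt):

    import numpy as np

    def mse(target, output):
        """Sum-of-squares error of section 2.1.1.1 for one pattern."""
        return 0.5 * np.sum((target - output) ** 2)

    def cross_entropy(target, output, eps=1e-12):
        """Standard two-class cross entropy: -[t ln v + (1-t) ln(1-v)]."""
        v = np.clip(output, eps, 1.0 - eps)
        return -np.sum(target * np.log(v) + (1 - target) * np.log(1 - v))

    t, v = np.array([1.0]), np.array([0.9])
    print(mse(t, v), cross_entropy(t, v))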
Fig. 5 - The typical flow structure of the Back Propagation algorithm: the error is propagated back through the hidden layers; the law for updating the weights (w_new = w_old + Δw) involves the activation function, the learning rate and a momentum term used to jump over the error surface.

As the algorithm's name implies, the errors (and therefore the learning) propagate backwards from the output nodes to the inner nodes. So, technically speaking, back propagation is used to calculate the gradient of the error of the network with respect to the network's modifiable weights. This gradient is almost always then used in a simple stochastic gradient descent algorithm to find the weights that minimize the error. Often the term back propagation is used in a more general sense, to refer to the entire procedure encompassing both the calculation of the gradient and its use in stochastic gradient descent. Back propagation usually allows quick convergence on satisfactory local minima of the error in the kind of networks to which it is suited. Back propagation networks are necessarily multilayer perceptrons, usually with one input, one hidden and one output layer. In order for the hidden layer to serve any useful function, multilayer networks must have non-linear activation functions for the multiple layers: a multilayer network using only linear activation functions is equivalent to some single-layer linear network. Non-linear activation functions that are commonly used include the logistic function…
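The weight-update law sketched in Fig. 5 can be written compactly. The following Python fragment is a generic gradient-descent-with-momentum step (our illustration, with arbitrary example values; it is not the DAMEWARE source):

    import numpy as np

    def update_weights(w, grad, velocity, learning_rate=0.7, momentum=0.9):
        """One back-propagation update step: w_new = w_old + delta_w,
        where delta_w combines the error gradient with a momentum term
        that helps to jump over shallow features of the error surface."""
        velocity = momentum * velocity - learning_rate * grad
        return w + velocity, velocity

    # Hypothetical gradient for a single weight vector
    w, vel = np.zeros(3), np.zeros(3)
    w, vel = update_weights(w, grad=np.array([0.1, -0.2, 0.05]),
                            velocity=vel)
    print(w)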
…network. See the user manual for details. The following are the possible choices:

o 1 - MSE Batch
o 2 - MSE Incremental

If left empty, the default is the first one (MSE Batch).

3.7.2 Classification with MLP Full Parameter Specifications

In the case of Classification_MLP with the Full use case, the help page is at the address http://dame.dsf.unina.it/mlp_help.html#class_full

• Train Set: this parameter is a required field. This is the dataset file to be used as input for the learning phase of the model. It typically must include both input and target columns, where each row is an entire pattern (or sample) of data. The format (hence its extension) must be one of the types allowed by the application (ASCII, FITS, CSV, VOTABLE). More specifically, keep in mind the following simple rule: the sum of input and output nodes MUST be equal to the total number of columns in this file.

• Validation Set: this is the dataset file to be used as input for the validation of the learning phase of the model. It typically must include both input and target columns, where each row is an entire pattern (or sample) of data. The format (hence its extension) must be one of the types allowed by the application (ASCII, FITS, CSV, VOTABLE). If the user leaves this parameter field empty, the validation phase of the training results is omitted.