
ACRES 3 User Guide

1. This is the variable for which the expert system will provide predictions.

(Optional) You can also specify one or more Intermediate Variables. Values for these variables will not be given directly by the end user; instead, rules will be created for predicting them.

The Continue button loads the Expert System Creation frame.

b. Expert System Creation

[Screenshot: ACRES Expert System Creation frame, showing the dataset variables of BreastCancer, the Output Variables list and the Selected Variable.]

For each intermediate variable specified, and then for the output variable:
- Select a variable (1).
- Specify a subset of variables for creating prediction rules (2).
- Add the variable as a node to the architecture tree (3).

Figure 2: Main Loop

(Optional) A second subset can be specified by checking Two Predictions. There are two alternative methods for combining these two predictions about the same conclusion: the method used in MYCIN (MYCIN) and a generalized version using weights (WEIGHTED).

To help the user choose a subset, three facilities are offered:
- Feature Ranking (5): produced automatically when a variable is selected.
- Subset Selection (4): by clicking Find Subset.
- Selected Subset Evaluation (6): by clicking Test.

Figure 3: Facilities (screenshot with 7_deg-malig as the selected variable)

When nodes for all intermediate variables and the output variable have been added, the expert system can be cr
2. separate file in the form of CLIPS facts. Thus, for each prediction rule of the expert system there is a corresponding rule that updates the corresponding frequencies. The expert system consists of two files. The main expert system file, which remains constant, contains all the rules, functions and templates. The secondary file contains facts that store the frequencies the rules need for computing the certainty factors. These frequencies can change at run time if the end user provides new instances.

Evaluation

Evaluation Metrics for Classification Problems

Evaluation of a classification model is usually based on the following metrics: accuracy, precision, sensitivity and specificity. For two classes (positive and negative) they are defined as follows:

$$acc = \frac{TP + TN}{TP + FP + FN + TN}, \quad prec = \frac{TP}{TP + FP}, \quad sen = \frac{TP}{TP + FN}, \quad spec = \frac{TN}{TN + FP}$$

where TP is the number of cases correctly classified as positive, FP is the number of cases incorrectly classified as positive, TN is the number of cases correctly classified as not positive, and FN is the number of cases incorrectly classified as not positive.

With more than two classes, each class can be viewed as a separate binary classification problem. The positive cases are those of that class and the negative cases are those of all other classes. In this way one can produce a confusion matrix for each class. Unlike the binary classification problem, with this approach a co
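The four metrics and the one-vs-rest reduction for more than two classes can be sketched in Python as follows. This is a minimal illustration, not ACRES code. The function names `binary_metrics` and `one_vs_rest_counts` are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity and specificity from the four confusion counts.

    Raises ZeroDivisionError if a denominator is zero (e.g. nothing predicted positive).
    """
    return {
        "acc": (tp + tn) / (tp + fp + fn + tn),
        "prec": tp / (tp + fp),
        "sen": tp / (tp + fn),
        "spec": tn / (tn + fp),
    }


def one_vs_rest_counts(actual, predicted, positive):
    """Count (TP, FP, TN, FN), treating `positive` as the positive class
    and every other class as negative."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn
```

Consider `one_vs_rest_counts(["a", "b", "c"], ["a", "c", "b"], "a")`, which returns `(1, 0, 2, 0)`. Both "true negatives" for class a are actually misclassified instances (b predicted as c, c predicted as b). This is why TN is not a reliable count in the multiclass setting.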
3. the data of the testing set. The procedure is repeated for different partitions of the dataset (cross validation), and the average values of the metrics are presented.

[Screenshot: ACRES evaluation report for BreastCancer. Predicting variable: 1_class (2 classes); intermediate variables: 0; certainty factors: MYCIN; cross validation and training/test ratio settings. Average number of rules: 24; 208 instances in the training set; 69 instances in the test set; average number of covered instances: 66. Per-class columns: TP rate, FP rate, Sensitivity (Recall), Precision, F-Measure, Sqrt(p·r), Predictive Accuracy.]

The produced expert system does not simply classify an instance to the predicted class; it provides an uncertainty value for each possible class. To make the evaluation of a produced expert system easier, we consider that the system classifies an instance to the class with the highest certainty factor.

As described above, to evaluate an expert system for more than two classes, we evaluate the performance for each class separately. For each class i, we treat the problem as binary: the first class is i and the second class consists of all other classes. We then form a confusion matrix and compute the metrics for each class i. We are mostly interested in the Sensitivity and Precision metri
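The decision rule (pick the class with the highest certainty factor) and the repeated partitioning can be sketched as follows. This rests on two assumptions: a training/test ratio r means roughly r:1, and "different partitions" means repeated random splits. ACRES may partition differently, for example with k-fold cross validation. The names `predict`, `split` and `repeated_evaluation` are hypothetical.

```python
import random


def predict(cfs):
    """Assign the instance to the class with the highest certainty factor."""
    return max(cfs, key=cfs.get)


def split(n, ratio, rng):
    """Shuffle indices 0..n-1 and split them so that train:test is about ratio:1."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(n * ratio / (ratio + 1))
    return idx[:n_train], idx[n_train:]


def repeated_evaluation(n, ratio, repeats, evaluate, seed=0):
    """Average the score returned by evaluate(train_idx, test_idx) over several random partitions."""
    rng = random.Random(seed)
    scores = [evaluate(*split(n, ratio, rng)) for _ in range(repeats)]
    return sum(scores) / len(scores)
```

With 277 instances and a ratio of 3, `split` yields 208 training and 69 test instances. The `evaluate` callback is where an expert system would be generated from the training indices and scored on the test indices.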
4. ACRES 3 User Guide

Konstantinos Kovas
Department of Computer Engineering and Informatics, University of Patras
kobas@ceid.upatras.gr
Version 3.0.2, 03/3/2013

[Screenshot: ACRES v3 alpha (2011-04) start screen with Create Expert System and Load Expert System buttons; Artificial Intelligence Group.]

ACRES (Automatic Creator of Expert Systems) is a tool initially developed to test and compare different methods of combining certainty factors in expert systems. In its second version, we extended the architecture to the problem of multiclass classification. The overall architecture remained simple, however, focusing on the goal of comparing certainty factor combination methods. The third version is our attempt towards a more general tool for generating expert systems.

More specifically, an extension of the system made it possible to generate classification rules for variables other than the output variable, for which the final user of the expert system cannot provide values. This makes it possible to design more complex rule hierarchies, which are represented in an easy-to-interpret tree structure. Feature ranking and subset selection techniques make the generation task more automatic and efficient. Other enhancements include the ability to produce expert systems that dynamically update the certainty factors in their rules, the generation of rules and functions for interaction with the end user, and a graphical interfac
5. cs. We also combine these two metrics into their mean proportional, $\sqrt{p \cdot r}$, as a more general metric, and we use the F-Measure defined previously. As a measure of the general classification performance of the expert system, we use the Predictive Accuracy metric. This is the percentage of instances in the testing set that were correctly classified to the class they belong to.

EXPERT SYSTEM INTERFACE

[Screenshot: ACRES, Automatic Creator of Expert Systems: Expert System Interface window with an Expert System Output panel.]

- Browse for an expert system file previously created by ACRES.
- Give a value for each input variable and assert the fact to get a prediction.
- D
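The two summary metrics are simple to compute. The sketch below is a minimal illustration with hypothetical function names. Predictive accuracy is returned as a percentage, matching the description above.

```python
import math


def mean_proportional(precision, recall):
    """Geometric mean sqrt(p * r) of precision and recall."""
    return math.sqrt(precision * recall)


def predictive_accuracy(actual, predicted):
    """Percentage of test instances assigned to the class they belong to."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)
```

The F-Measure is a harmonic mean, so it never exceeds the geometric mean $\sqrt{p \cdot r}$. For p = 0.25 and r = 1.0, for example, $\sqrt{p \cdot r}$ = 0.5 while the F-Measure is 0.4. This makes the F-Measure the stricter of the two when precision and recall are unbalanced.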
6. e for the produced expert system.

a. Dataset & Variables Settings

[Screenshot: ACRES, Automatic Creator of Expert Systems: Dataset and Variables Settings window, with a preview grid of the dataset (columns such as 2_age, 3_menopause, 4_tumor-size, 5_inv-nodes, 6_node-caps, 7_deg-malig, 8_breast) and the tabs Dataset Import (1), Dataset Edit (2) and Variables (3).]

Dataset Import

- Dataset Name: Specify a name for the expert system that will be created (for example, BreastCancer).
- Variables File: A file containing a name for each variable in the dataset.

    1_class
    2_age
    3_menopause
    4_tumor-size
    5_inv-nodes
    6_node-caps
    7_deg-malig
    8_breast
    9_breast-quad
    10_irradiat

example variables file

- Dataset File: The dataset file containing known instances of a problem, in comma-delimited format.

    no-recurrence-events,50-59,ge40,15-19,0-2,yes,2,left,central,yes
    no-recurrence-events,50-59,premeno,25-29,0-2,no,1,left,left_low,no
    no-recurrence-events,60-69,ge40,25-29,0-2,no,3,right,left_low,no
    recurrence-events,50-59,premeno,15-19,0-2,no,2,left,left_low,no
    recurrence-events,40-49,premeno,40-44,0-2,no,1,left,left_low,no
    recurrence-events,50-59,ge40,35-39,0-2,no,2,left,left_low,no
    recurrence-events,50-59,premeno,25-29,0-2,no,2,left,right_up,n
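Reading the two input files amounts to pairing each comma-delimited row with the list of variable names. The sketch below makes two assumptions: the variables file holds one name per line, and the dataset file has no header row. `load_dataset` is a hypothetical helper, not part of ACRES.

```python
import csv
import io


def load_dataset(variables_text, dataset_text):
    """Pair each comma-delimited row with the variable names (one name per line)."""
    names = [line.strip() for line in variables_text.splitlines() if line.strip()]
    rows = []
    for record in csv.reader(io.StringIO(dataset_text)):
        if not record:  # skip blank lines
            continue
        if len(record) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(record)}: {record}")
        rows.append(dict(zip(names, (value.strip() for value in record))))
    return rows
```

Checking the field count per row catches the most common mismatch between the two files, such as a variable deleted in the dataset but not in the variables file.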
7. e probability with the a priori probability found from the general frequency of class C_i in the entire dataset. This probability can easily be computed with the formula

$$P(C_i) = \frac{f(C_i, N)}{|N|}$$

Using the definition of certainty factors in the expert system MYCIN, we can combine these two probabilities to produce the measures of Belief, MB(C_i, E), and Disbelief, MD(C_i, E):

$$MB(C_i, E) = \begin{cases} 1 & \text{if } P(C_i) = 1 \\ \max\left(0, \dfrac{P(C_i \mid E) - P(C_i)}{1 - P(C_i)}\right) & \text{otherwise} \end{cases}$$

$$MD(C_i, E) = \begin{cases} 1 & \text{if } P(C_i) = 0 \\ \max\left(0, \dfrac{P(C_i) - P(C_i \mid E)}{P(C_i)}\right) & \text{otherwise} \end{cases}$$

Finally, we can estimate the Certainty Factor using these measures of Belief and Disbelief:

$$CF(C_i, E) = \frac{MB(C_i, E) - MD(C_i, E)}{1 - \min\big(MB(C_i, E), MD(C_i, E)\big)}$$

It is important to point out the underlying characteristic of this method. The certainty factor produced is not a measure of our confidence in C_i. It is a measure of the change in our confidence in C_i given the evidence E. A positive value therefore represents an increase in our confidence, whereas a negative value represents a decrease.

Dynamic CFs

Another new feature is the ability to generate expert systems that can update the certainty factors of their rules when new instances of the problem become available. To accomplish this, the certainty factors are not hard-coded inside the generated rules but are computed dynamically at run time. The required frequencies for computing the certainty factors are saved in a
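The MYCIN model can be computed directly from the four frequencies f(C_i, D), |D|, f(C_i, N) and |N|. Because the CF depends only on these counts, a dynamically updated expert system only needs to increment the stored counts and recompute. The function name `mycin_cf` is hypothetical. This is a sketch of the formulas above, not the generated CLIPS code.

```python
def mycin_cf(f_class_d, size_d, f_class_n, size_n):
    """MYCIN certainty factor CF(Ci, E) from absolute class frequencies.

    f_class_d / size_d = P(Ci | E), f_class_n / size_n = P(Ci).
    """
    p_given_e = f_class_d / size_d
    prior = f_class_n / size_n
    mb = 1.0 if prior == 1 else max(0.0, (p_given_e - prior) / (1 - prior))
    md = 1.0 if prior == 0 else max(0.0, (prior - p_given_e) / prior)
    return (mb - md) / (1 - min(mb, md))
```

Take f(C_i, D) = 6, |D| = 10, f(C_i, N) = 30 and |N| = 100. The P(H|E) model gives CF = 2·0.6 − 1 = 0.2, while `mycin_cf` gives about 0.43. The evidence doubles the probability of a class whose prior is only 0.3, and the MYCIN model rewards that change relative to the prior.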
8. e40 5_inv-nodes 3-5 7_deg-malig 3 => assert 1_class no-recurrence -0.64 recurrence 0.64

A simple example of a generated rule

Certainty Factor Combination

If we repeat the above procedure more than once for different sets of variables, we can create a rule set that, given a new instance, can provide more than one conclusion about the output variable. According to the model of certainty factors used in MYCIN, two certainty factors about the same fact can be combined using suitable formulas that depend on their signs. For example, suppose two rules have the same conclusion, with certainty factors CF1 and CF2 respectively, and both are positive. According to MYCIN theory, the combined certainty factor CF for the conclusion is given by the formula

$$CF = CF_1 + CF_2\,(1 - CF_1) = CF_1 + CF_2 - CF_1 \cdot CF_2 \qquad (3)$$

In the expert system PASS [4], the authors remarked that in formula (3) both certainty factors contribute equally to the final result. In practice, rules are often not equally reliable, because their certainty factors either depend on an expert's judgment or are based on data containing noise. They therefore proposed a generalized version of the formula [1]:

$$CF = w_1 \cdot CF_1 + w_2 \cdot CF_2 + w \cdot CF_1 \cdot CF_2 \qquad (4)$$

where w1, w2 and w are numeric weights that should satisfy the equation

$$w_1 + w_2 + w = 1 \qquad (5)$$

to ensure that 0 ≤ CF ≤ 1. To use formula (4), however, the wei
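Both combination methods are sketched below. The guide spells out only the case where both CFs are positive. The rules for two negative or mixed-sign CFs come from the standard MYCIN/EMYCIN model, not from this guide. The function names are hypothetical.

```python
def combine_mycin(cf1, cf2):
    """MYCIN/EMYCIN combination of two CFs in [-1, 1] for the same conclusion."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)  # formula (3)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    # Mixed signs; undefined (division by zero) when one CF is 1 and the other -1.
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))


def combine_weighted(cf1, cf2, w1, w2, w):
    """Generalized formula (4); the weights must satisfy constraint (5)."""
    if abs(w1 + w2 + w - 1) > 1e-9:
        raise ValueError("weights must satisfy w1 + w2 + w = 1")
    return w1 * cf1 + w2 * cf2 + w * cf1 * cf2
```

With w1 = w2 = 1 and w = −1, formula (4) reduces to formula (3). For example, both functions return 0.75 for CF1 = CF2 = 0.5. MYCIN's rule is thus one particular point in the weight space searched by the generalized method.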
9. eated. Create ES will create the expert system as a CLIPS file. Evaluate will create an expert system using a training set and evaluate it with a testing set.

[Figure: architecture tree for 1_class, with nodes for variables such as 3_menopause, 5_inv-nodes, 6_node-caps, 7_deg-malig (with nodes 7_deg-malig_1 and 7_deg-malig_2), breast-quad and 10_irradiat.]

Rule Generation and CF Estimation

Given a variable for which we want predictions, and a subset of variables to be used for the prediction, we can generate a set of rules from a training set with the following steps:

1. Cluster the instances into groups, so that each group contains instances with identical values in the variables of the subset.
2. From each group, produce one rule. Its conditions are the attribute-value pairs shared by the instances, and its conclusion is the possible classes of the output variable.
3. Associate each possible class i with a certainty factor using the formula

$$CF = \frac{n}{N} \qquad (1)$$

where n is the number of instances of class i in the group and N is the number of all instances in it. That is, the CF for a class is the frequency of the class in the group. Since 0 ≤ n ≤ N, the certainty factor is a value between 0 and 1. We can easily convert this value to the interval [-1, 1] with the formula

$$CF' = 2 \cdot CF - 1 \qquad (2)$$

We give a simple example of a rule created with this method:

defrule group_1_class_16 declare salience 70 data 3_menopause g
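The three rule-generation steps can be sketched in Python. Each rule is returned as a pair: the shared attribute-value pairs, and a CF per class after rescaling with formula (2). The sketch assumes instances are dictionaries keyed by variable name. It emits Python data rather than CLIPS `defrule` text, and `generate_rules` is a hypothetical name.

```python
from collections import Counter, defaultdict


def generate_rules(instances, subset, output):
    """One rule per group of instances sharing the same values on `subset`.

    Each class of `output` gets CF = n / N (formula 1), rescaled as 2 * CF - 1 (formula 2).
    """
    classes = sorted({inst[output] for inst in instances})
    groups = defaultdict(list)
    for inst in instances:
        key = tuple((var, inst[var]) for var in subset)  # step 1: group by subset values
        groups[key].append(inst[output])
    rules = []
    for conditions, labels in groups.items():  # step 2: one rule per group
        counts = Counter(labels)
        total = len(labels)
        cfs = {c: 2 * counts[c] / total - 1 for c in classes}  # step 3
        rules.append((dict(conditions), cfs))
    return rules
```

A class that never occurs in a group still appears in the rule, with CF = −1, because step 2 lists all possible classes of the output variable as the conclusion.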
10. ghts w1, w2 and w should first be determined. In PASS, statistical data about the problem were used as a training set to determine the weights by hand. In ACRES we offer both combination methods when multiple rule sets are specified for the output variable. The system produces the necessary weights for the generalized formula automatically, using a genetic algorithm to search the space of possible weight combinations for an optimal one.

CF Models

The system offers two alternative methods for estimating certainty factors. Consider an output variable C with n possible classes C_i, and a dataset N containing |N| instances. Evidence E is a certain pattern of values for a set of variables of the dataset, and D is the set of instances in the dataset in which this pattern occurs. We write f(C_i, D) for the absolute frequency of class C_i in D and f(C_i, N) for the absolute frequency of class C_i in N.

P(H|E)

Our initial approach, used in previous versions, relied solely on the probability found from the frequency of a class in D. For a class C_i, the certainty factor is estimated from the conditional probability that an instance belongs to class C_i given that evidence E is true:

$$P(C_i \mid E) = \frac{f(C_i, D)}{|D|}$$

Since 0 ≤ f(C_i, D) ≤ |D|, this value lies between 0 and 1, so we use the following formula to produce a value in the interval [-1, 1]:

$$CF(C_i, E) = 2 \cdot P(C_i \mid E) - 1$$

MYCIN CFs

An alternative method, added in the new version, combines the abov
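The guide does not describe the genetic algorithm's encoding, operators or fitness function. The sketch below is therefore only one plausible instance, built on several assumptions. It searches (w1, w2) in [0, 1]², derives w from constraint (5), and scores a candidate by the fraction of training samples whose highest combined CF is the true class. It uses truncation selection with elitism, blend crossover and Gaussian mutation. All names are hypothetical.

```python
import random


def weighted_combine(cf1, cf2, w1, w2):
    """Formula (4), with w fixed by constraint (5): w = 1 - w1 - w2."""
    w = 1 - w1 - w2
    return w1 * cf1 + w2 * cf2 + w * cf1 * cf2


def fitness(weights, samples):
    """Fraction of (cfs1, cfs2, true_class) samples whose highest combined CF is the true class."""
    w1, w2 = weights
    hits = 0
    for cfs1, cfs2, truth in samples:
        combined = {c: weighted_combine(cfs1[c], cfs2[c], w1, w2) for c in cfs1}
        hits += max(combined, key=combined.get) == truth
    return hits / len(samples)


def evolve_weights(samples, pop_size=20, generations=30, seed=0):
    """Genetic search over (w1, w2) in [0, 1]^2."""
    rng = random.Random(seed)
    pop = [(rng.random(), rng.random()) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the fitter half unchanged (elitism).
        ranked = sorted(pop, key=lambda ind: fitness(ind, samples), reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()  # blend crossover
            child = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
            # Gaussian mutation, clipped back into [0, 1].
            child = tuple(min(1.0, max(0.0, g + rng.gauss(0, 0.05))) for g in child)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda ind: fitness(ind, samples))
```

Keeping the parents unchanged means the best fitness never decreases from one generation to the next. A fixed seed makes the search reproducible.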
11. o
    recurrence-events,30-39,premeno,0-4,0-2,no,2,right,central,no
    recurrence-events,50-59,premeno,25-29,0-2,no,2,left,right_up,no

example dataset file

Dataset Edit (Optional)

After the variables and dataset files are imported, the dataset is displayed as a grid. The user can manually edit the values in the grid.

Figure 1: Dataset as a grid

Additionally, the user can perform the following operations:
- Delete Variable: Specify a variable, and the corresponding column will be deleted.
- Merge Variables: Specify two variables. The corresponding cells will be merged, with the values separated by "_".
- Merge Classes: Specify a variable and two of its classes, then press Merge to merge these classes into a new class with the specified name.
- Discretize Variable: Choose a variable with real values, then specify the number of classes and a discretization method.

[Screenshot: Dataset Edit tab with Merge, Reset and Save Changes buttons.]

The Reset button undoes all changes and reloads the dataset you initially imported. The Save Changes button saves all modifications as a new dataset file. You must edit the variables file manually if necessary, then import both files again to continue with the expert system creation.

Variables

The user must specify an output prediction variable
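Three of the Dataset Edit operations are sketched below on rows stored as dictionaries. The guide does not name the discretization methods ACRES offers. Equal-width binning is shown here as one common choice, and the bins are returned as integer indices. All function names are hypothetical.

```python
def merge_variables(rows, a, b, name):
    """Replace columns a and b with a single column `name` holding 'value_a_value_b'."""
    out = []
    for row in rows:
        row = dict(row)  # do not mutate the caller's rows
        row[name] = f"{row.pop(a)}_{row.pop(b)}"
        out.append(row)
    return out


def merge_classes(rows, var, c1, c2, new):
    """Relabel classes c1 and c2 of `var` as the single class `new`."""
    return [{**r, var: new if r[var] in (c1, c2) else r[var]} for r in rows]


def discretize_equal_width(values, k):
    """Map real values to bin indices 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid a zero width when all values are equal
    return [min(int((v - lo) / width), k - 1) for v in values]
```

Clamping with `min(..., k - 1)` puts the maximum value into the last bin instead of creating a (k+1)-th bin.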
12. rrectly classified case as negative does not necessarily mean that the case was classified to the correct class. For this reason, the value of TN is not credible and cannot be used for estimating evaluation metrics. The metrics used are Precision, as defined above, and Recall, which corresponds to Sensitivity. For a possible class A, Precision is the fraction of instances classified to class A that actually belong to that class. Recall is the fraction of instances belonging to class A that were correctly classified to that class. Since TP + FN is the number of all cases that truly belong to the positive class, Recall is also referred to as TP rate. Another useful metric is the F-measure, which combines the recall and precision values:

$$recall = \frac{TP}{TP + FN}, \quad precision = \frac{TP}{TP + FP}, \quad F\text{-}measure = \frac{2 \cdot precision \cdot recall}{precision + recall}$$

Finally, the weighted average of these metrics over all classes can be calculated, taking into account the number of occurrences of each class in the dataset. These metrics are widely used in classification performance evaluation and in tools such as the data mining tool Weka. Using them therefore allows direct comparison with various classification models.

Evaluation Report in ACRES

The dataset is partitioned into two sets, a training set and a testing set. The expert system is generated using the training set and then it is evaluated on
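Per-class Precision, Recall and F-measure, plus their frequency-weighted average, can be computed without ever using TN. This is a minimal sketch. The weights are the class counts among the evaluated instances, and `per_class_report` is a hypothetical name.

```python
from collections import Counter


def per_class_report(actual, predicted):
    """Precision, recall and F-measure per class, plus a frequency-weighted average."""
    report = {}
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(a == c and p == c for a, p in zip(actual, predicted))
        fp = sum(a != c and p == c for a, p in zip(actual, predicted))
        fn = sum(a == c and p != c for a, p in zip(actual, predicted))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = {"precision": prec, "recall": rec, "f_measure": f}
    # Weight each class by how often it actually occurs.
    support = Counter(actual)
    total = len(actual)
    report["weighted"] = {
        m: sum(report[c][m] * support[c] / total for c in support)
        for m in ("precision", "recall", "f_measure")
    }
    return report
```

A class that is predicted but never actually occurs gets a weight of zero in the average. Undefined ratios (for example, no predictions for a class) are reported as 0.0 rather than raising an error.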
