User Guide

1. SAP InfiniteInsight 6.5 SP4. CUSTOMER. InfiniteInsight Modeler - Segmentation/Clustering. 2013 SAP AG or an SAP affiliate company. All rights reserved.

    To Save a Filter

    You can save the filter you created so that you can reuse it later without having to recreate the same conditions.
    1. Click the Save Filter button. A pop-up window is displayed.
    2. In the Data Type list, select the format in which you want to save the filter.
    3. Use the Browse button located to the right of the Folder field to select the folder or database where you want to save the filter.
    4. In the Description field, enter the name of the file or table in which you want to save the filter.
    5. Click the OK button.

    To Load an Existing Filter

    To apply a filter to the data set, you can use a file created during a previous use of the data set in InfiniteInsight.
    1. Click the Load Existing Filter button. A pop-up window is displayed.
    2. Use the Data Type list to select the format of the filter.
    3. Use the Browse button located to the right of the Folder field to select the folder or the database in which the filter is stored.
    4. Use the Browse button located to the right of the Description field to select the file or the table containing the filter.
    5. Click the OK button.

    6.2.1.4 Translating the Variable Categories

    You can translate the categories of a nominal variable, save the translation, or load an existing translatio
2. [Screenshot: results file opened in a spreadsheet, with the columns KxIndex, class, rr_class, proba_rr_class, bar_rr_class, contrib_age and contrib_workclass]

    2. You can now analyze the results obtained and use the results of your analysis to make the right decisions.

    5.2.4.2.6.1.1 Description of the Results File

    Depending upon which options you selected, the results file will contain some or all of the following information, in the same order as seen
3. Results file as displayed in a spreadsheet:

    KxIndex  class  kc_clusterId  kc_TargetMeanClusterId
    1        0      3             0.017524
    2        0      5             0.476401001
    3        0      3             0.017524
    4        0      2             0.237075999
    5        0      5             0.476401001
    6        0      4             0.308898985
    7        0      3             0.017524
    8        1      5             0.476401001
    9        1      1             0.942696989
    10       1      1             0.942696989
    11       1      5             0.476401001
    12       1      5             0.476401001
    13       0      3             0.017524
    14       0      4             0.308898985
    15       1      2             0.237075999
    16       0      2             0.237075999

    2. You can now analyze the results obtained and use the results of your analysis to make the right decisions.

    Depending upon which options you selected, the results file will contain some or all of the following information, in the same order as seen below:
    - The key variable defined during data description, at the model parameter settings step. If your data set did not contain a key variable, the key variable KxIndex would have been generated automatically by InfiniteInsight.
    - Possibly the target variable, given as known values, if the latter appeared in the application data set, as is the case in this scenario.
    - The variable kc_clusterId, which indicates the number of the cluster to which each observation belongs.
    - The variable kc_TargetMeanClusterId, which indicates the proportion of observations belonging to the target category of the target variable that are contained i
4. In the frame Model Data Set Description Saving, select where you want to save the data description. The four available options are:
    - Save the Description in the Script: the data description is added in the KxShell script. Only one file is generated.
    - Save the Description with the Script: the data description is saved in an additional file in the same folder as the KxShell script.
    - Save the Description with the Data: the data description is saved in an additional file in the same folder as the data used for the model.
    - Save the Description Separately: the data description is saved in an additional file. The user indicates the type of the description (text file, database, flat memory) and the location where the data description should be saved.

    Note: When the description is saved in an additional file, the file is named following this syntax: KxDesc_<Dataset Role>_<Dataset Name>. For example, for a training data set named Census.csv, the description file name will be KxDesc_Training_Census.csv.

    5. Additionally, you can export the variable structure with relation to a target variable by checking the option Export Variable Structure in Script and selecting the target variable in the list Select a Target. This option allows you to force the grouping of categories when training
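The naming syntax given in the note above, KxDesc_<Dataset Role>_<Dataset Name>, can be illustrated with a short sketch. The helper name `description_file_name` is hypothetical and is not part of InfiniteInsight or KxShell; it only reproduces the string pattern stated in the note.

```python
from pathlib import PurePath


def description_file_name(dataset_role: str, dataset_name: str) -> str:
    """Build a name following the pattern KxDesc_<Dataset Role>_<Dataset Name>.

    Hypothetical helper for illustration only; it is not an InfiniteInsight API.
    """
    # Keep only the final path component, so a full path can also be passed in.
    base_name = PurePath(dataset_name).name
    return f"KxDesc_{dataset_role}_{base_name}"


# Example from the guide: a training data set named Census.csv.
print(description_file_name("Training", "Census.csv"))
```

With the role "Training" and the data set name "Census.csv", the sketch yields the same file name as the example in the note.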
5. 2. Click the Browse button. The following selection dialog box opens.

    [Screenshot: dialog box Select Source Folder for Data, showing the Samples folder and its subfolders Census and JapaneseData]

    3. Double-click the Samples folder, then the Census folder.

    Note: Depending on your environment, the Samples folder may or may not appear directly at the root of the list of folders. If you selected the default settings during the installation process, you will find the Samples folder located in C:\Program Files\KXEN\KXEN InfiniteInsightV7.0.0.

    4. Select the file Census01.csv, then click OK. The name of the file will appear in the Estimation field.
    5. Click the Next button. The screen Data Description will appear.

    [Screenshot: screen Data Description of the New Regression/Classification Model assistant]

    6. Go to section Describing the Dat
6. To validate a segmentation model, you can also observe the values of the indicators frequency and target mean for each of the identified clusters. Specifically, the most interesting clusters of the segmentation model will have a high frequency and a target mean that deviates from the target mean of the entire data set. Note that a segmentation model with a low KI may conceal precisely this type of cluster. To find out how the frequency and target mean for a cluster are calculated, see Understanding the Detailed Description of Clusters.

    For this Scenario

    The model generated has:
    - A quality indicator KI equal to 0.703
    - A robustness indicator KR equal to 0.987
    The model performs sufficiently well. You do not need to generate another.

    To Validate the Model Generated

    1. Verify the quality indicator KI and robustness indicator KR of the model. These indicators are circled in the following figure.

    Note: Other indicators are provided in addition to KI and KR during generation of the model. For example, you can view the Learning Time required to generate the model and information on the targets.
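The comparison described above (cluster frequency, and cluster target mean against the target mean of the entire data set) can be sketched as follows. This is a minimal illustration under stated assumptions: frequency is taken as the share of observations in the cluster, target mean as the proportion of observations in the target category with the target coded 0/1, and the sample values are invented. The guide's exact calculation is described in a section that is not reproduced here.

```python
from collections import defaultdict


def cluster_profile(cluster_ids, targets):
    """Return the overall target mean and, per cluster, a tuple
    (frequency, target mean, deviation from the overall target mean).

    Assumes a binary target coded 0/1 (an assumption for this sketch).
    """
    counts = defaultdict(int)
    positives = defaultdict(int)
    for cluster_id, target in zip(cluster_ids, targets):
        counts[cluster_id] += 1
        positives[cluster_id] += target

    total = len(cluster_ids)
    overall_mean = sum(targets) / total

    profile = {}
    for cluster_id in sorted(counts):
        frequency = counts[cluster_id] / total
        target_mean = positives[cluster_id] / counts[cluster_id]
        profile[cluster_id] = (frequency, target_mean, target_mean - overall_mean)
    return overall_mean, profile


# Invented toy data: six observations assigned to three clusters.
cluster_ids = [1, 1, 2, 2, 2, 3]
targets = [1, 1, 0, 1, 0, 0]
overall_mean, profile = cluster_profile(cluster_ids, targets)
for cluster_id, (frequency, target_mean, deviation) in profile.items():
    print(cluster_id, round(frequency, 3), round(target_mean, 3), round(deviation, 3))
```

A cluster that combines a large frequency with a large absolute deviation is the kind of cluster the paragraph above calls interesting.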
7. To Apply the New Style Sheet to the Generated Reports

    4. In the panel Report, select the new style sheet.
    5. Click OK. A window opens, indicating that you have to restart the modeling assistant to take the edited options into account.
    6. Click OK. When training a model, all the generated reports (the learn, Excel and statistical reports) are now customized.

    6.2 Creating a Clustering Model Using InfiniteInsight Modeler

    Data modeling with InfiniteInsight Modeler - Segmentation/Clustering is subdivided into four broadly defined stages:
    1. Defining the Modeling Parameters
    2. Generation and Validation of the Model
    3. Analysis and Understanding of the Analytical Results
    4. Using a Generated Model

    6.2.1 Step 1: Defining the Modeling Parameters

    To respond to your business issue, you want to:
    - Break down the sample of 50,000 prospects who responded to the test phase of your marketing campaign into homogeneous groups (see Summary of the InfiniteInsight Modeler - Regression/Classification Application Scenario).
    - Describe each of these groups and provide customized communication for each of them.

    The InfiniteInsight Modeler - Segmentation/Clustering feature allows you to create descriptive models. The first step in the modeling process consists of defining the mod
8. One of the benefits of the AUC measure is its independence from the target distribution. Imagine that we build another data set in which each good example is duplicated: the AUC of the model will be the same.

    Warning: The Area Under the ROC Curve (AUC) has very nice properties for evaluating a binary classification system. It is now widely used by statisticians, even if it is not easy to picture for non-statisticians.

    4.8.3 Error Indicators

    First, some basic notations:
    - Target (response) value: $y_i$
    - Predictor (predicted response) value: $\hat{y}_i$
    - Residual: $r_i = y_i - \hat{y}_i$
    - Error: $e_i = |r_i|$
    - Weight of the tested observation: $w_i$
    - Total weight of the population: $W = \sum_{i=1}^{N} w_i$
    - Target average: $\bar{y} = \frac{1}{W} \sum_{i=1}^{N} w_i y_i$
    - Predictor average: $\bar{\hat{y}} = \frac{1}{W} \sum_{i=1}^{N} w_i \hat{y}_i$

    4.8.3.1 Mean Absolute Error (L1)

    Definition: mean of the absolute values of the differences between predictions and actual results (city block distance, or Manhattan distance).
    Formula:
    $$L_1 = \frac{1}{W} \sum_{i=1}^{N} w_i \, |r_i|$$

    4.8.3.2 Mean Square Error (L2)

    Definition: square root of the mean of the quadratic errors (Euclidean distance, or root mean squared error, RMSE).
    Formula:
    $$SSE = \sum_{i=1}^{N} w_i \, r_i^2, \qquad MSE = \frac{SSE}{\sum_{i=1}^{N} w_i}, \qquad L_2 = \sqrt{MSE}$$

    4.8.3.3 Maximum Error (Linf)

    Definition: maximum absolute difference b
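The three error indicators defined above can be sketched in a few lines. This is an illustrative implementation of the stated definitions, not InfiniteInsight code: the function name `error_indicators` is hypothetical, the weights default to 1 when none are given (an assumption), and Linf is computed as the largest absolute difference between target and prediction, following the part of its definition that is legible.

```python
import math


def error_indicators(targets, predictions, weights=None):
    """Compute weighted L1 (mean absolute error), L2 (root mean squared
    error) and Linf (maximum absolute error).

    Hypothetical helper written from the textual definitions in the guide.
    """
    if weights is None:
        # Assumption for this sketch: unweighted data means every weight is 1.
        weights = [1.0] * len(targets)

    total_weight = sum(weights)
    residuals = [y - y_hat for y, y_hat in zip(targets, predictions)]

    # L1: weighted mean of the absolute residuals.
    l1 = sum(w * abs(r) for w, r in zip(weights, residuals)) / total_weight

    # L2: square root of the weighted mean of the squared residuals.
    sse = sum(w * r * r for w, r in zip(weights, residuals))
    mse = sse / total_weight
    l2 = math.sqrt(mse)

    # Linf: largest absolute residual.
    linf = max(abs(r) for r in residuals)

    return {"L1": l1, "L2": l2, "Linf": linf}


# Invented toy values: four observations with unit weights.
result = error_indicators([1, 0, 1, 1], [0.5, 0.5, 1.0, 0.0])
print(result)
```

Because L2 squares the residuals before averaging, a single large error raises L2 more than L1, while Linf reports only the worst case.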
9. Number of explanatory variables actually used by the resulting model.
    Number of Records: number of records in the data set.
    Building Date: date and time when the model was built.
    Learning Time: total learning time.
    Engine Name: depending on the feature used, Kxen.RobustRegression, Kxen.SmartSegmenter, Kxen.TimeSeries, Kxen.AssociationRules, Kxen.EventLog, Kxen.SequenceCoder or Kxen.SocialNetwork.

    5.2.3.2.2 Modeling Warnings

    Monotonic Variables Detected: indicates whether monotonic variables have been found in the data set, that is, variables whose direction of variation is constant in the reading order of the data in the estimation data set.

    Suspicious Variables Detected: this report presents a list of variables that are considered suspicious. These suspicious variables have a KI > 0.9: they are very strongly correlated with the target variable. This means that these variables probably carry biased information and should not be used for the modeling. Special attention should be paid to them. A more detailed report lists exactly which variables are suspicious and to what extent (see Statistical Reports > Expert Debriefing > Suspicious Variables).

    5.2.3.2.3 Targets

    For each nominal variable:
    <Name>: name of the target variable.
    Target
10. Click the OK button.

    To Save the Categories Translation

    1. Translate the variable categories as described above.
    2. Click the Save button.
    3. Choose a Data Type.
    4. Select a Folder.
    5. Enter a Name for the file or table.
    6. Click the OK button.

    To Load an Existing Translation File

    1. Right-click a nominal variable. A contextual menu is displayed.
    2. Select the option Translate Categories for <name_of_the_variable>.
    3. Click the Load button.
    4. Select the format of the translation in the list Data Type.
    5. Use the Browse button located to the right of the Folder field to select the folder or the database in which the description is stored.
    6. Use the Browse button located to the right of the field Table or File to select the file or the table containing the description.
    7. Click the OK button.
    8. Click the Update button to refresh the display of the categories. If the list of columns is not named correctly, use the Advanced Settings (see next paragraph) to set a header line, and update again.
    9. Map the language names to those from the loaded translation by clicking the categories and choosing the corresponding language in the contextual menu.
    10. Click the OK button.

    To Set a Header Line

    1. Click the tab Header Line.
    2. Check the option Force Header Line.
    3. In the field Line, enter the number of the line you want to use as the header line.
    4. Click OK.

    5.2.1.5 Selecting Variables

    Once the training data set and its descr
11. [Screenshot: sample rows of the Census data set (columns such as workclass, fnlwgt, education, marital status and occupation), with the fields First Row Index and Last Row Index below the table]

    2. In the field First Row Index, enter the number of the first row you want to display.
    3. In the field Last Row Index, enter the number of the last row you want to display.
    4. Click the
12. 4.5.2 Synonyms of Observation and Variable

    Depending upon your profile and your area of expertise, you may be more familiar with other terms that refer to observations (in rows) and variables (in columns) when using tables of data. The following table presents such terms, or synonyms.

    Terms equivalent to the term Observation    Terms equivalent to the term Variable
    Row                                         Column
    Record                                      Attribute
    Table                                       Field
    Event                                       Property
    Instance
    Example

    4.5.3 Data Formats

    Whatever the data source used, the following two constraints must be met:
    - The data must be represented in the form of a single table, except when you are using the InfiniteInsight Explorer - Event Logging or InfiniteInsight Explorer - Sequence Coding features.
    - The target variable must be defined for each observation in the table. In the sample file Census01.csv, the variable class has been defined for each individual.

    Note: For information about data formatting, and specifically for the list of supported ODBC-compatible sources, see the document Data Modeling Specification.

    4.6 Variables

    4.6.1 Generic Definition

    A variable corresponds to an attribute which describes the observations stored in your database. In InfiniteInsight features, a variable is defined by:
    - Type
    - Storage format
    - Role
13. Nationality, and so on.

    You note that the database you have at hand is not ideal. In fact, the database contains:
    - Incongruous data
    - Redundant data
    - Missing data

    5.1.3.2.1 Incongruent Data

    The database contains alphanumeric information, such as occupation and nationality, as well as numerical information, such as age and unreconciled accounts.

    5.1.3.2.2 Redundant Data

    Some information in the database is redundant, such as degree and education, or degree and area of work. In the field of statistics, the term correlated variables is used to designate such data. In classical statistical analyses, correlated variables must be processed in a particular manner. An alternative solution is to designate only one of the two correlated variables for analysis. Since you have neither the statistical skills nor the means to handle this issue of correlation between variables, you decide to leave the database as it is.

    5.1.3.2.3 Missing Data

    Some information is missing from the database. To manage this lack of information, the Information Technology department used the following convention:
    - The symbol means that an alphanumeric value, such as occupation, is missing.
    - The value 99999 means that a numerical value, such as age, is missing.
    Unfo
14. by building a new structure from scratch.

    The option Enable the target based optimal grouping performed by InfiniteInsight Modeler - Data Encoding lets Data Encoding group together the categories or groups defined in the variable structure if they carry the same information. For more details on variable structure, see InfiniteInsight Modeler - Regression/Classification > Defining a Variable Structure.

    6.2.1.3 Filtering the Data Set

    In order to accelerate the learning process and to optimize the resulting model, you can apply a filter to your data set.

    For this scenario: do not use the filtering option.

    To Filter a Data Set

    1. Check the option Add a Filter in Data Set.
    2. Click Next.

    To Add a Condition

    3. Click the button Add Condition. The window Define a Condition opens.
       - Choose a variable in the first list.
       - Choose an operator in the second list.
       - Indicate a value in the third list. For a variable with number storage, type a value. For a variable with string storage, choose a variable in the list. If the list is empty, click the button to extract the var
15. The SQL expression can be broken down as follows:
    - The first part (1) defines a cluster of observations where the variables equal the values displayed.
    - The second part (2) defines clusters of observations that are excluded from the cluster found in part 1. The percentages displayed indicate the proportion of each excluded cluster with respect to the cluster found in part 1.

    In our example, the first excluded cluster corresponds to observations where the value of the capital-gain variable ranges between 4650 (excluded) and 99999, that is, ]4650; 99999]. It represents 1.59% of the observations found in part 1.

    Note that the clusters are created by applying the SQL expressions in a specific order defined by the engine. If you apply the SQL rules in a random order, you may not obtain exactly the same result.

    6.2.3.6.6 Difference Between Standard Cross Statistics and SQL Expressions

    When you ask for SQL expressions, the final segmentation is different from the one obtained without them. The goal of SQL is to have segments that are easy to understand and easy to apply. SQL expressions are built to describe as closely as possible the basic segments, that is, the ones you get when you do not ask for SQL. The SQL can be used both to obtain a better definition and understanding of the clusters and to deploy them on the full
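The remark above, that applying the rules in a different order may change the result, can be illustrated with a small first-match sketch. Everything here is hypothetical: the rule set, the `age` condition and the function `assign_cluster` are invented for illustration and do not reproduce the expressions generated by the engine. Only the capital-gain range ]4650; 99999] comes from the example in the text.

```python
def assign_cluster(row, ordered_rules):
    """Return the identifier of the first rule whose condition matches the row,
    or None if no rule matches. Rules are evaluated in the order given."""
    for cluster_id, condition in ordered_rules:
        if condition(row):
            return cluster_id
    return None


# Hypothetical rules. Only the capital-gain interval is taken from the guide.
rules = [
    (1, lambda row: 4650 < row["capital_gain"] <= 99999),
    (2, lambda row: row["age"] >= 40),
]

# An invented observation that satisfies both conditions.
observation = {"capital_gain": 5000, "age": 45}

in_engine_order = assign_cluster(observation, rules)
in_reversed_order = assign_cluster(observation, list(reversed(rules)))
print(in_engine_order, in_reversed_order)
```

An observation that satisfies several conditions lands in a different cluster depending on which rule is tested first, which is why the order defined by the engine has to be preserved when the expressions are deployed.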
16. - the model has been computed while at least one physical key variable was defined in InfiniteInsight,
    - there is a valid InfiniteInsight Scorer license for the database,
    - no error has occurred,
    - the in-database apply mode is not deactivated,
    - access has been granted to read and write (create table).

    To Use the In-database Apply Mode

    Check the option Use the Direct Apply in the Database.

    [Screenshot: screen Applying the Model, with the sections Application Data Set, Generation Options and Results Generated by the Model, and the option Use direct apply in the database checked]

    6.2.4.1.3 Advanced Apply Settings

    This option allows you to add to the output file the weight variable, if it was set during the variable selection of the model.

    This option allows you to add to the output file one or more variables from the data set.

    To Add All the Variables

    Check the All option.

    To Select only Specific Variables

    Check the Individual option.
    Click the
17. Black

    Variable        Description                                         Example of values
    Sex             Gender                                              Male, Female
    capital-gain    Annual capital gains                                Any numerical value
    capital-loss    Annual capital losses                               Any numerical value
    native-country  Country of origin                                   United States, France
    class           Variable indicating whether or not the salary of    1 if the individual has a salary of greater than 50,000;
                    the individual is greater or less than 50,000       0 if the individual has a salary of less than 50,000

    Note: In order to avoid complicating the InfiniteInsight Modeler - Regression/Classification and InfiniteInsight Modeler - Segmentation/Clustering application scenarios, the variable fnlwgt is used as a regular explanatory variable in these scenarios, and not as a weight variable.

    5.1.8 InfiniteInsight

    To accomplish the scenario, you will use the Java-based graphical interface of InfiniteInsight. This interface allows you to select the InfiniteInsight feature with which you will work and helps you at all stages of the modeling process.

    To Start InfiniteInsight

    1. Select Start > Programs > KXEN InfiniteInsight > InfiniteInsight. The InfiniteInsight screen will appear.

    [Screenshot: InfiniteInsight main screen, Version 6.1.0, with the Explorer and Modeler sections]
18. CUSTOMER. End User Documentation. 2013-11-19. Document Version 1.0.

    1 How to Use this Document
    1.1 Organization of this Document
    1.2 Which Sections Should You Read
    1.3 Conventions Used in this Document
    2 Welcome to this Guide
    2.1 About this Document
    2.1.1 Who Should Read this Document
    2.1.2 Prerequisites for Use of this Document
    2.1.3 What this Document Covers
    2.2 Before Beginning
    2.2.1 Files and Documentation Provided with this
19. For date or datetime variables:

    Temporal Information   Represents                                                      Generated Variable Name
    Day of week            the day of week according to the ISO disposition                <OriginalVariableName>_DoW
                           (Monday = 0 and Sunday = 6)
    Day of month           the day of month (1 to 31)                                      <OriginalVariableName>_DoM
    Day of year            the day of the current year (1 to 366)                          <OriginalVariableName>_DoY
    Month of quarter       the month of the quarter (January, April, July and              <OriginalVariableName>_MoQ
                           October = 1; February, May, August and November = 2;
                           March, June, September and December = 3)
    Month of year          the month (1 to 12)                                             <OriginalVariableName>_M
    Year                   the year                                                        <OriginalVariableName>_Y
    Quarter                the quarter of the year (January to March = 1; April to         <OriginalVariableName>_Q
                           June = 2; July to September = 3; October to December = 4)

    For datetime variables:

    Temporal Information   Represents         Generated Variable Name
    Hour                   the hour           <OriginalVariableName>_H
    Minute                 the minute         <OriginalVariableName>_Mi
    Second                 the second         <OriginalVariableName>_S
    μ seconds              the microsecond    <OriginalVariableName>_mu

    The generated variables will appear in the model debriefing panels that list variables, such as the Contributions by Variable, the Category Significance and the Statistical Reports, as well as in the automatic variable selection feature.

    4.6.5 Roles of Variables

    In data modeling, variables may have three roles. They may be Target var
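The derived temporal variables listed in the tables above can be reproduced with the Python standard library. This is a sketch of the definitions only, not of how InfiniteInsight computes them: the function name `temporal_features` and the sample variable name `order_date` are hypothetical, while the suffixes (_DoW, _DoM, and so on) follow the tables.

```python
from datetime import datetime


def temporal_features(variable_name: str, value: datetime) -> dict:
    """Derive the temporal variables described in the tables from one datetime.

    Hypothetical helper; the suffixes mirror the generated variable names.
    """
    return {
        f"{variable_name}_DoW": value.weekday(),            # Monday = 0 ... Sunday = 6
        f"{variable_name}_DoM": value.day,                  # 1 to 31
        f"{variable_name}_DoY": value.timetuple().tm_yday,  # 1 to 366
        f"{variable_name}_MoQ": (value.month - 1) % 3 + 1,  # 1 to 3
        f"{variable_name}_M": value.month,                  # 1 to 12
        f"{variable_name}_Y": value.year,
        f"{variable_name}_Q": (value.month - 1) // 3 + 1,   # 1 to 4
        f"{variable_name}_H": value.hour,
        f"{variable_name}_Mi": value.minute,
        f"{variable_name}_S": value.second,
        f"{variable_name}_mu": value.microsecond,
    }


# Invented sample value: 19 November 2013, 14:30:05.000250.
features = temporal_features("order_date", datetime(2013, 11, 19, 14, 30, 5, 250))
for name, derived in features.items():
    print(name, derived)
```

Python's `datetime.weekday()` uses the same convention as the table (Monday = 0, Sunday = 6), so no offset is needed for the day of week.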
20. The Variables pull-down menu allows the selection and graphing of any of the variables in the model. The tool bar located under the title allows the user to copy the coordinates to the clipboard, print the plot, or save it in PNG format.

    The values are normalized and their sum always equals 0. Depending on the chosen profit strategy, or on the value type of the continuous target variable, you can obtain all positive importances, or negative and positive importances.

    The X axis shows the influence of the variable categories on the target. The significance of the different numbers on the X axis is detailed in the following table.

    Number on the X axis   Indicates that the category has
    positive number        a positive influence on the target
    0                      no influence on the target (the behavior is the same as the average behavior of the whole population)
    negative number        a negative influence on the target

    The Y axis displays the variable categories. Categories sharing the same effect on the target variable are grouped. They appear as follows: CATEGORY_A CATEGORY_B CATEGORY_C. Categories that do not contain sufficient numbers to provide robust information are grouped in the KXOTHER category. When a variable is associated with too many missing values, the missing values are grouped in the KXMISSING category. Both categories are created automatically by InfiniteInsight.
21. select the variable to be excluded.

    [Screenshot: screen Selecting Variables of the New Regression/Classification Model assistant, with the sections Explanatory Variables Selected, Target Variables, Weight Variable and Excluded Variables]

    Note: On the screen Selecting Variables, variables are presented in the same order as that in which they appear in the table of data. To sort them alphabetically, select the option Alphabetic Sort presented beneath each list of variables.

    2. Click the button > located on the left of the screen section Excluded Variables (lower right-hand side). The variable moves to the screen section Excluded Variables. Conversely, select a variable in the screen section Excluded Variables and click the button < to move the variable back to the screen section Explanatory Variables Selected.
    3. Click the Next button. The screen Summary of the Modeling Parameters will appear.
    4. Go to the section Checking Modeling Parameters.

    6.2.1.6 Checking Modeling Parameters

    The screen Summary of Modeling Parameters allows you to check the modeling parameters just before generating the model.
22. [Screenshot: excerpt of generated C code. Each category of a variable is stored as a static char array of character codes terminated by 0, and a function of the form Kxen_RobustRegression_0_KxVar... compares the input value iValue with each category using strcmp and returns the corresponding double value. The numeric values returned are not legible in this excerpt.]

    Category arrays legible in the excerpt:
    static char KxVar0Cat4[8] = { 87, 105, 100, 111, 119, 101, 100, 0 };  /* Widowed */
    static char KxVar0Cat5[22] = { 77, 97, 114, 114, 105, 101, 100, 45, 115, 112, 111, 117, 115, 101, 45, 97, 98, 115, 101, 110, 116, 0 };
    static char KxVar0Cat6[18] = { 77, 97, 114, 114, 105, 101, 100, 45, 65, 70, 45, 115, 112, 111, 117, 115, 101, 0 };  /* Married-AF-spouse */

    5.2.4.5.1 List of Generated Codes

    The following table lists the available codes with their particularities.

    Generated Code (Comment):
    AWK Code, C Code (see the Kxen C Code Generator documentation), PMML 3.0, PMML 3.1, PMML 3.2, Cpp, DB2 UDF, SQL, HTML Javascript (contains a form to fill in, which reproduces the Kxen model), JAVA Code (needs the KxJRT.jar package to run), Oracle UDF, SQL, PMML2, SAS Code
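The category arrays in the generated C code above hold character codes terminated by 0. The short sketch below decodes the three arrays that are fully legible in the excerpt, to show what they contain. The helper `decode_category` is hypothetical and is not part of the generated code or of any Kxen library.

```python
def decode_category(codes):
    """Turn a 0-terminated list of character codes into the string it encodes."""
    characters = []
    for code in codes:
        if code == 0:  # C string terminator
            break
        characters.append(chr(code))
    return "".join(characters)


# Arrays copied from the excerpt of generated C code.
KXVAR0_CAT4 = [87, 105, 100, 111, 119, 101, 100, 0]
KXVAR0_CAT5 = [77, 97, 114, 114, 105, 101, 100, 45, 115, 112, 111, 117,
               115, 101, 45, 97, 98, 115, 101, 110, 116, 0]
KXVAR0_CAT6 = [77, 97, 114, 114, 105, 101, 100, 45, 65, 70, 45, 115,
               112, 111, 117, 115, 101, 0]

decoded = [decode_category(codes) for codes in (KXVAR0_CAT4, KXVAR0_CAT5, KXVAR0_CAT6)]
print(decoded)
```

The declared array sizes in the C excerpt (8, 22 and 18) match the decoded string lengths plus the terminating 0.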
23. 2. You can also verify the indicators in the Detailed Log: click the Show Detailed Log button. The following screen appears.

       Quality (KI): 0.758679
       Robustness (KR): 0.996649

    3. You can then display the screen Using the Model.
       a. If the performance of the model satisfies you, go to Step 3: Analyzing and Understanding the Model Generated.
       b. Otherwise, go to the procedure To Generate a New Model.

    To Generate a New Model

    You have two options. On the screen Training the Model, you can:
    - Either click the Previous button to return to the modeling parameters defined initially. You can then modify the parameters one by one.
    - Or click the Cancel button to return to the main screen of the Modeling Assistant. You must then redefine all the modeling parameters.

    6.2.3 Step 3: Analyzing and Understanding the Model Generated

    IN THIS CHAPTER
    Model Overview
    Model Graphs
    Categ
24. However, the user can move a key variable into Explanatory Variables Selected in order to give this variable that role.

    3. Click the Next button. The screen Parameters of the Model will appear.

    5.2.1.6 Checking Modeling Parameters

    The screen Summary of Modeling Parameters allows you to check the modeling parameters just before generating the model.

    [Screenshot: screen Summary of Modeling Parameters for the model class_Census01, with the options Compute Decision Tree, Autosave and Export KxShell Script]

    Note: The screen Summary of Modeling Parameters contains an Advanced button. By clicking this button, you access the screen Specific Parameters of the Model. For more information about these parameters, see Setting the Advanced Parameters.

    The Model Name is filled in automatically. It corresponds to the name of the target variable (class for this scenario), followed by the underscore sign (_) and the name of the data source minus its file extension (Census01 for this scenario).

    You can display the results generated by K2R as a decision tree based on the five most contributive variables. To activate this option, check the box Compute Decision Tree.

    The Autosave button allows you to activate the feature that will automatic
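The rule stated above for the automatically filled Model Name (target variable name, underscore, data source name without its file extension) can be written as a one-line sketch. The function `default_model_name` is a hypothetical illustration of that naming rule and not an InfiniteInsight API; the capitalization of the result simply follows the inputs.

```python
from pathlib import PurePath


def default_model_name(target_variable: str, data_source: str) -> str:
    """Join the target variable name and the data source name without its
    file extension, separated by an underscore (hypothetical helper)."""
    return f"{target_variable}_{PurePath(data_source).stem}"


# Scenario values: target variable class, data source Census01.csv.
print(default_model_name("class", "Census01.csv"))
```

For the scenario values, the sketch gives the same name as the one shown in the title of the screen, class_Census01.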
25. Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 182 5 2 4 6 Exporting the Model as a KxShell Script The KxShell script export allows you to generate a KxShell script reproducing the current model This script can be used to run models in batches One easy way to get special settings in exported KxShell scripts is to first do the corresponding operation in the graphical user interface For example if you run an auto selection of variables before exporting the shell script then the exported script will include the code needed to do the auto reduction VI To Save the KxShell Script 1 Inthe section Save Export of the menu Using the Model select the option Export KxShell Script The panel KxShell Script Generation is displayed KXEN InfiniteInsight class_Census01 wy KxShell Script Generation KxShell Script Saving Location Folder Woes SSCS Oi roe File ks Browse Model Data Set Description Saving Save the Descriptions in the Script Save the Descriptions with the Script Save the Descriptions with the Data Save the Descriptions Separately J Generate Variable Structure From Statistics F Select a Target Eass 7 Q Script Preview pl a ae Use the Browse button located to the right of the Folder field to select where the script will be saved 3 Inthe field KxShell Script enter the name of the file in which the script will be saved
26. Private Private Self emp no Private 193524 Private 302146 rivate 117037 rivate 1090 wats Local gov 216851 180211 Some college Private 367260 HS grad Private 193366 HS grad Private 386940 Bachelors Private 242406 11th Self emp no 265477 Assoc acdm Self emp no 88506 Bachelors Private 94638 HS grad 57 Federal gov 337895 Bachelors 53 Private 14436 1 HS grad 44 Private 128354 Masters SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 87 as VI To Save a Filter You can save the filter you created to be able to reuse it at a later moment without being obliged to recreate the same conditions 1 Click the button Save Filter A pop up window is displayed 2 Inthe list Data Type select the format in which you want to save the filter 3 Use the Browse button located on the right of the Folder field to select the folder or database where you want to save the filter 4 Inthe Description field enter the name of the file or table in which you want to save the filter 5 Click the OK button M To Load an Existing Filter To apply a filter to the data set you can use a file created during a previous use of the data set in Infinitelnsight 1 Click the button Load Existing Filter A pop up window is displayed 2 Use the l
27. Reference Data Set Use a File or a Database Table Use Explorer Data Type Text Files Folder Samples J Browse Data Set _ amp E Browse of Cutting Strategy v sofea Samples KTC Text Files E GEIR SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Segmentation Clustering 2013 SAP AG or an SAP affiliate company All rights reserved 200 2 Click the Browse button The following selection dialog box opens Select Source Folder for Data w a CJ C Wsers denise ortiz caso Documents i J Samples C Census C JapaneseData 2 5 2 ya oO ya tot H A K xl Text Files dat data csv txt v uef o Password oS Loc ce 3 Double click the Samples folder then the Census folder Note Depending on your environment the Samples folder may or may not appear directly at the root of the list of folders If you selected the default settings during the installation process you will find the Samples folder located in C Program Files KXEN KXEN InfinitelnsightV7 0 0 4 Select the file CensusOl csv then click OK The name of the file will appear in the Estimation field 5 Click the Next button The screen Data Description will appear lt KXEN InfiniteInsight New Regression Classification Model Data Description in
28. The Progress Bar displays the progression for each step of the process It is the screen displayed by default The Detailed Log displays the details of each step of the process MI To display the Progression Bar Click the Show Progression button The progression bar screen appears M To Display the Detailed Log Click the Show Detailed Log button The following screen appears Training the Model a o dB Computing statistics Statistics for discrete target class O is found 27667 times 1 is found 8714 times O is found 9488 times 1 is found 2973 times On Estimation On Estimation On Validation On Validation ariable workclass compression on estimation from 9 to 8 categories ariable workclass compression on validation from 9 to 8 categories ariable native country ariable native country individual individual individual individual individual individual individual individual individual individual variables variables variables variables variables variables variables variables variables variables Learning time 2 seconds Consistent Coder engine learn finished Number of input 20 order 1 target key 1 Number of extended variables Computing statistics Computing statistics compression on estimation from 42 to 17 categories compression on validation from 41 to 17 categories relationship 0 552262 Validation 0 558961 Estimation Marital status 0 5
29. The size of the bubble is plotted according to the frequency of the corresponding cluster. C 7.1.1.1.14 calendar table: A calendar table is used to ease the development of solutions around any business model that involves dates. A common practice is to have a calendar table pre-populated with some or all of the needed information, making it possible to accomplish most complex date-related tasks with simple database queries. 7.1.1.1.15 category: A category is one of the possible values of a discrete variable. A discrete variable is a nominal or ordinal variable. The category is the basic element used to code the variable as well as to gather descriptive statistics. 7.1.1.1.16 category significance: The category significance measures the impact a category has on the target. 7.1.1.1.17 centroid: Imaginary point inside a polygon whose coordinates are generally those of the polygon center. 7.1.1.1.18 chunk by chunk: Number of lines of a table that are processed as a package. 7.1.1.1.19 classification rate: Ratio between the number of correctly classified records and the total number of records. 7.1.1.1.20 confidence: The Confidence of a rule is a measure that indicates the percentage of sessions verifying the consequent among those verifying the antecedent. For instance, the number of sessions containing the item D among the ones contain
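The two ratio definitions in this glossary extract (classification rate and rule confidence) can be written out as a minimal sketch. The function names and the representation of sessions as Python sets are assumptions made for illustration; they do not come from the product.

```python
def classification_rate(actual, predicted):
    """Ratio of correctly classified records to the total number of records."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def rule_confidence(sessions, antecedent, consequent):
    """Share of sessions verifying the consequent among those verifying the antecedent.

    Each session is modeled as a set of items (an assumption for this sketch).
    """
    matching = [s for s in sessions if antecedent <= s]
    if not matching:
        return 0.0
    return sum(1 for s in matching if consequent <= s) / len(matching)


rate = classification_rate([1, 0, 1, 1], [1, 0, 0, 1])
conf = rule_confidence(
    [{"A", "D"}, {"A"}, {"A", "B", "D"}, {"B"}],
    antecedent={"A"},
    consequent={"D"},
)
print(rate, conf)
```

In the example, three of four records are correctly classified, and two of the three sessions containing A also contain D.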
30. class_Census01 Summary of Modeling Parameters Model Type kxenSmartSeamenter Data to be Modeled Samples Census Census01esv gt Cutting Strategy Random withouttest aaea Data Description jf Samples Census DeseCensus0Lcsv Target Variable Weight Variable Optional NOME Find the best number of dusters in thisrange j0 0 1 Calculate SQL Expressions Y Autosave Export KxShell Script an a Note The screen Summary of Modeling Parameters contains an Advanced button By clicking this button you access the screen Specific Parameters of the Model For more information about these parameters Setting Up the Advanced Options on page 214 The Model Name is filled automatically It corresponds to the name of the target variable CLASS for this scenario followed by the underscore sign _ and the name of the data source minus its file extension CENSUS01 for this Scenario Before generating the model you can define the number of clusters that you want to obtain These fields allow you to specify how many clusters will be generated by the model By default the number of clusters is set to 10 The higher the number of segments the lower the robustness KR The lower the number of segments the lower the information KI One should generally start with the default number and then go further with more or less clusters based on the results For supervised segmentation that is to say with
31. generated in the programming language The nfinite nsight Scorer feature which is responsible for generation of this code is described below 3 2 2 4 1 The Infinitelnsight Scorer Feature The nfinite nsight Scorerfeature formerly known as KMX generates code in the following languages C XML AWK HTML SQL PMML2 SAS or JAVA corresponding to a model generated by nfinite nsight In this form the model may be integrated into any application that supports the aforementioned languages The generated codes allow the nfinite nsight models to be integrated within any given application or software package or to be applied directly to the data without requiring nfinite nsight environment Warning Code generation is only available for models using the following features nfinitelnsight Modeler Data Encoding Infinitelnsight Modeler Regression Classification Infinitelnsight Modeler Segmentation Clustering 3 3 Methodological Prerequisites Before modeling your data using the nfinite nsight you should State a business issue that you want to solve Possess a data set representing this issue in the form of a set of observations 3 3 1 What is your Business Issue All of the nfinite nsight features are a response to the same requirement they allow supervised data analysis The term supervised means that the data analysis does not occur completely independently but always as a function of a particular issue
32. gt gt button to display the variable selection table In the Available list select the variables you want to add use the Ctrl key to select more than one variable Click the gt button to add the selected variables to the Selected list This option allows you to add to the output file constants such as the apply date the data set name or any other information useful for using the output file A user defined constant is made of the following information Parameter Visibility Name Storage Value Key Description indicates if the constant will appear in the output or not the name of the user defined constant the constant type number string integer date the value of the constant indicates if the constant is a key variable or identifier for the record You can declare multiple keys They will be built according to the indicated order 1 2 3 VI To Define a Constant ah wD In the list Output Storage select the constant type In the field Output Value enter the constant value Value Warnings checked the constant appears in the output unchecked the constant does not appear in the output 1 The name cannot be the same as the name of an existing variable of the reference data set 2 Ifthe name is the same as an already existing user defined constant the new constant will replace the previous one number string integer date datetime date format yyyy MM DD dateti
33. kxen RobustRegression x E Category Significance Variable Reason for Exclusion E Continuous Variables MyConstant E Nominal Targets Real_estate a Data Set Size Sport Variables Correlations car Cross Statistics with the Target s pool Grouped Cross Statistics with the T Model Performance Control for Deviations amp Expert Debriefing E Groups Id E Other Variables Performance Indic E Continuous Encoding Variables Exdusion Cause EEO veral Exc s Model Settings 5 SS SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 134 Target Specific Exclusions showing the variables excluded towards a particular target KXEN InfiniteInsight class_census_apply2 ally Statistical Reports lt 7 Statistical Reports AlE gt f B Descriptive Statistics aB EJE saa B variables K2R Engine afd an Target capitaHoss b E Continuous Variaties E Continuous Targets Number c_capital gain Small KI On Estimation E Data Set Size c_fniwgt Small KI On Estimation B Yariables Correlations capital gain Small KI On Estimation H Cross Statistics with the Target s Small KI On Estimation H Grouped Cross Statistics with the T i Small KI On Validation H Model Performance i Small KI On Val
34. n QA 13 To Create a New Category In the field right of the button New Category enter the name of the category to add Click the button New Category The category is created in the list Category Edition To Add Categories to a Group In the list Category Edition select the category or categories to add to a group In the list Group Structure select the group in which you want to add the selected categories Click the button Add Category To Delete a Group In the list Group Structure select the group to delete Click the button Remove Group All the categories belonging to this group are re added to the list Category Edition To Remove a Category from a Group In the list Group Structure select the category or categories you want to remove from the group Click the button Remove Category The selected categories are removed from the group and re added to the list Category Edition 5 2 1 2 7 Working Without any Defined Structure If you let the structure as undefined nfinite nsight using consistent coder automatically determines the categories grouping depending on their interaction with the target variable You can configure two parameters in this case the band count for continuous variables nfinitelnsight Modeler Data Encoding optimal grouping for all variables SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights
35. your business issue. Consider the database that contains information about your customers. An analysis that grouped your customers into homogeneous groups independently of your input would be of little interest. On the other hand, an analysis that grouped them as a function of a variable, such as the mean business revenue earned from each customer each year, would offer significant interest. You would learn the characteristic profiles of the customers that bring you the most money. You could then develop strategies to better influence your customers according to their characteristic profiles. To recap, the prerequisite step before using InfiniteInsight consists of identifying and formulating your business issue. 3.3.2 Is your Data Usable? Once your business issue has been identified and formulated, you need to have data on hand that will permit an answer to be found. We will not expound at length on the information value associated with data. This depends on your data collection and extraction processes and tools, and not on InfiniteInsight features. On the other hand, in order for your data to be usable by InfiniteInsight, the following five conditions must be met: You must have a sufficiently large volume of data to be able to build a valid model, that is, for the model to be both relevant and robust. An ana
36. 2013 SAP AG or an SAP affiliate company All rights reserved 282 www sap com contactsap 2013 SAP AG or an SAP affiliate company All rights reserved No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG The information contained herein may be changed without prior notice Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors National product specifications may vary These materials are provided by SAP AG and its affiliated companies SAP Group for informational purposes only without representation or warranty of any kind and SAP Group shall not be liable for errors or omissions with respect to the materials The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services if any Nothing herein should be construed as constituting an additional warranty SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries Please see for additional trademark information a
37. To Save the Model Graph: 1. Click the Save button. A dialog box appears, allowing you to select the file properties. 2. Type a name for your file. 3. Select the destination folder. 4. Click OK. The plot is saved as a PNG-formatted image. To Print the Model Graph: 1. Click the Print button situated under the title. A dialog box appears, allowing you to select the printer to use. 2. Select the printer to use and set other print properties if need be. 3. Click OK. The report is printed. To Export the Model Graph to Microsoft Excel: Click the Export to Excel Format button situated under the title. An Excel sheet opens containing the model graph you are currently viewing along with its data. [Excel sheet KxReport0: Performance chart with the columns percentage, Random, Wizard and Validation. Recoverable rows (percentage, Random, Wizard, Validation): 0.00, 0.00, 0.00, 0.00; 0.05, 0.05, 0.21, 0.20; 0.10, 0.10, 0.42, 0.36; 0.15, 0.15, 0.63, 0.50; 0.20, 0.20, 0.84, 0.60; 0.25, 0.25, 1.00, 0.70. From 0.30 to 1.00 in steps of 0.05, Random equals the percentage and Wizard is 1.00; the remaining Validation values are not recoverable.]
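The Random and Wizard columns of the exported sheet follow a simple pattern that can be reproduced as a sketch. The interpretation used here is an assumption, not something this extract states: Random is taken as the diagonal, and Wizard as a perfect ordering that reaches all targets once the contacted share equals the target rate. The target rate is also an assumption, taken from the validation counts shown on other screens of this guide (2973 records with target 1 and 9488 with target 0).

```python
def random_curve(fraction: float) -> float:
    """Share of targets detected when records are picked at random."""
    return fraction


def wizard_curve(fraction: float, target_rate: float) -> float:
    """Share of targets detected by a perfect ordering (assumed interpretation)."""
    return min(1.0, fraction / target_rate)


# Assumed target rate: 2973 positives among 2973 + 9488 validation records.
TARGET_RATE = 2973 / (2973 + 9488)

for pct in (0.05, 0.10, 0.15, 0.20, 0.25):
    print(pct, random_curve(pct), round(wizard_curve(pct, TARGET_RATE), 2))
```

Under these assumptions the rounded Wizard values are 0.21, 0.42, 0.63, 0.84 and 1.00, which agree with the first rows of the exported sheet.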
38. 3.5.1 Distance to Clusters. This option allows you to add to the output file the distance of each observation from the clusters. The distances are generated in columns named kc_dist_cluster_<TargetVariable>_<ClusterId>. For example, if the target variable is Age, the distance from cluster 1 appears in the column kc_dist_cluster_Age_1. To Add the Distances from All Clusters: Check the All option. To Select Distances from Specific Clusters: 1. Check the Individual option. 2. Click the >> button to display the cluster selection table. 3. Check the clusters for which you want to add the distance. Note: When the SQL mode is activated, the notion of nearest cluster does not exist. If a case belongs to a cluster, the distance is set to 0. If a case does not belong to a cluster, the distance is set to 1. 6.2.4.1.3.5.2 Probability for Clusters. This option allows you to add to the output file the probability that each observation belongs to each of the clusters. The probabilities are generated in columns named kc_proba_cluster_<TargetVariable>_<ClusterId>. For example, if the target variable is Age, the probability that the observation belongs to cluster 1 is displayed in the column kc_proba_cluster_Age_1. To Add the Probabilities for All Clusters: Check the All option. To Select the Probabilities for Specific Clusters: 1. Check the Individual option. 2. Click the >> button to display the cl
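The column naming patterns described above can be illustrated with a short sketch that builds the expected column names for a list of clusters. The helper names are hypothetical, and the second function only restates the SQL mode note (distance 0 for a case inside the cluster, 1 otherwise).

```python
def cluster_output_columns(target_variable, cluster_ids, kind="dist"):
    """Return output column names following kc_<kind>_cluster_<Target>_<ClusterId>.

    kind is "dist" for distances or "proba" for probabilities.
    """
    if kind not in ("dist", "proba"):
        raise ValueError("kind must be 'dist' or 'proba'")
    return [f"kc_{kind}_cluster_{target_variable}_{cid}" for cid in cluster_ids]


def sql_mode_distance(belongs_to_cluster: bool) -> int:
    """In SQL mode the distance is 0 inside the cluster and 1 outside it."""
    return 0 if belongs_to_cluster else 1


print(cluster_output_columns("Age", [1, 2]))
print(cluster_output_columns("Age", [1], kind="proba"))
```

A script that reads the output file could use such a helper to select the distance or probability columns by name.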
39. 42 to 17 categories compression on validation from 41 to 17 categories ariable native country ariable native country individual variables individual variables individual variables individual variables individual variables individual variables individual variables individual variables individual variables individual variables Learning time 2 seconds AA AAR AAAAA relationship 0 552262 Validation 0 558961 Estimation Marital status 0 530936 Validation 0 542226 Estimation occupation 0 456842 Validation 0 453326 Estimation education 0 435558 Validation 0 431178 Estimation education num 0 435558 Validation 0 431178 Estimation age 0 412714 Validation 0 419249 Estimation hours per week 0 34001 Validation 0 349971 Estimation sex 0 229864 Validation 0 239272 Estimation capital gain 0 173681 Validation 0 186205 Estimation workclass 0 166991 Validation 0 159844 Estimation Consistent Coder engine learn finished Number of input 20 order 1 target key 1 Number of extended variables 21 Computing statistics Computing statistics MI To Stop the Learning Process Sas 1 Click the a Stop Learning Process button 2 Click the Previous button The screen Summary of Modeling Parameters appears 3 Go back to the section Check Modeling Parameters 6 2 2 3 Validating the Model Once the model has been generated you must verify its validity
40. 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 127 5 2 3 5 4 Understanding the Plots of Variables E For this Scenario Select the variable marital status which is the explanatory variable that contributes the most to the target variable Class all Category Significance BAG Ba ASUA Variables martalstaus S Variable marital status Influence on Target a gi eo yw e i X N s Aa S p Categories m Validation ihe This plot presents the effect of the categories of the marital status variable on the target variable SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 128 5 2 3 5 5 Variable Categories and Profit The plot Category Significance illustrates the relative significance of the different categories of a given variable with respect to the target variable On this type of plot The higher on the screen one finds a category the greater the positive effect on the target category or hoped for value of the target variable In other words the higher a category appears on the screen the more representative that category is of the target category of the target variable The width and direction of the bar correspond to the profit contributed by that category In other words they cor
41. 6 18 0 3 7 3 5 5 3 11 3 17 0 4 7 2 3 4 5 6 7 8 9 10 Reset Variable ranges for Cluster 1 AND relationship in Wife Husband accupation in Adm clerical Machine op inspct Protective serv Tech support Farming fishing Tr hours per week in 1 50 50 54 60 99 NOT Gra Gx 2 Inthe table select the cluster for which you want to view the profile Note lf only the variable ranges for a specific cluster are displayed click the black horizontal bar a moving cursor is displayed Drag the cursor down to display the list of clusters 3 Above the table from the drop down list associated with the Variable field select the variable for which you want to see the profile The cross Statistics will appear in the form of a plot in the lower part of the screen SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Segmentation Clustering 2013 SAP AG or an SAP affiliate company All rights reserved 243 6 2 3 6 3 Understanding Clusters Profiles The screen Cluster Profiles can be broken down into three parts n the upper part a drop down list allows you to select the variable for which you want to see the cross statistics Variables are presented in descending order of the significance of their contribution relative to the target category of the target variable When a cluster is selected the
42. 7 Inthe list Data Type select the type of store the model is saved in 8 Use the Browse button located next to the Folder field to select the folder or database containing the model 9 Inthe displayed models list select the model from which you want to extract the variable structure 10 Click the OK button 11 Inthe list Target from Loaded Model select the target of the model The variables you have selected are displayed in a list with the corresponding variables from the loaded model x Target from Loaded Model Variables from Training Census01 csv age X Variables from Loaded Model K2R_Census_331_1 jage X a Add Q View Census01 csv K2R_Census_331 Version 1 marital status marital status You can add or remove variables from this listand view the model variables structure as explained below 12 Once all the variables for which you want to import the structure from the model are displayed in the list click the OK button The selection window closes and the structure state changes M To Add a Variable to the List of Variables 13 Inthe list Variable from Loaded Model select the variable you want to add to the list of variables for which the structure will be imported 14 Click the Add button K2R_Census_331_1 x Target from Loaded Model Variables from Training Census01 csv fege X Variables from Loaded Model K2R_Census_331_1 fage X amp View Census01 csv K2R_Census_331 Version 1 marital
43. Category Significance 6 2 3 4 1 Definition The Significance of Categories plot illustrates the relative significance of the different categories of a given variable with respect to the target variable 6 2 3 4 2 Displaying the Significance of Categories Plot MI To Display the Significance of Categories Plot 1 Onthe screen Using the Model click Category Significance The plot Category Significance will appear ally Category Significance BA0 Bw Asdas Variables fage Variable age Influence on Target a A a D Categories m Validation Tuka 2 Inthe Variables list located above the plot select the variable for which you want to display the categories If your data set contains date or datetime variables automatically generated variables can appear in the Variables list For more information refer to section Date and Datetime Variables Automatically Generated Variables on page 30 SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Segmentation Clustering 2013 SAP AG or an SAP affiliate company All rights reserved 231 Notes You can display the relative significance of the categories of a variable directly from the plot Contributions by Variables On the plot Contributions by Variables double click the bar of the variable which interests you In case no user structure has been defined for a continuous variable the plot category significance displays the categories creat
44. 3.2.2.2 Phase 2: Data Manipulation and Preparation. The InfiniteInsight Explorer Sequence Coding and InfiniteInsight Explorer Event Logging features are data manipulation and preparation features. They are used to encode data in a robust and semi-automatic manner, making the data available for use by all analytical features of InfiniteInsight. The use of these features is transparent for the final user: all data processing is performed in a completely automatic manner. InfiniteInsight Explorer Event Logging (formerly known as KEL) aggregates events into periods of time. It allows transactional data to be integrated with demographic customer data. It is used in cases where the raw data contains static information, such as the age, gender or profession of an individual, and dynamic variables, such as spending patterns or credit card transactions. Data is automatically aggregated within user-defined periods without programming SQL or changing the database schema. InfiniteInsight Explorer Event Logging combines and compresses this data to make it available to other InfiniteInsight features. InfiniteInsight Explorer Sequence Coding (formerly known as KSC) aggregates events into a series of transitions. For example, a customer click stream from a Web site can be transformed into a series of data for each session. Each column represents a speci
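The idea of aggregating events into periods of time can be illustrated outside the product with a small sketch. Everything here is an assumption made for illustration: the monthly period, the (customer, date, amount) event layout and the function name. The feature itself performs this aggregation without any programming.

```python
from collections import defaultdict
from datetime import date


def aggregate_by_month(events):
    """Sum transaction amounts per customer and calendar month.

    events: iterable of (customer_id, day, amount) tuples, where day is a date.
    """
    totals = defaultdict(float)
    for customer_id, day, amount in events:
        totals[(customer_id, day.year, day.month)] += amount
    return dict(totals)


transactions = [
    ("c1", date(2013, 1, 5), 20.0),
    ("c1", date(2013, 1, 28), 30.0),
    ("c1", date(2013, 2, 2), 10.0),
    ("c2", date(2013, 1, 9), 5.0),
]
print(aggregate_by_month(transactions))
```

Each resulting total can then sit next to static customer attributes such as age or profession, one column per period.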
45. Model to a New Data Set 2 In the Generate drop down list select the option Decision 3 Click the Generate button The screen Classification Decision will appear My KXEN InfiniteInsight class_Census01 a Classification Decision Threshold ofPopulation ofDetected Target Score Threshold of Population l 2 5 of Detected Target 68 0 a 23 8 Score Threshold 0 151 Confusion Matrix Predicted 1 2965 True 1 2973 True 0 9488 Classification Rate 84 77 Total Population 12 4614 Cost Matrix Predicted 1 Predicted 0 True 1 0 0 om Maximize Profit True 0 0 0 Gn 4 Use the slide to set the percentage of population to detect 5 Click the Next button The model is applied to the new data set SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 158 5 2 4 2 3 Understanding the Classification Decision Screen The screen Classification Decision allows you to either select a percentage of the population who will respond positively to your campaign of Detected Target or a percentage of the entire population of Population When moving the cursor on the scale the different values are updated accordingly For example if you select the option of Detected Target and set the cursor to 80 the value of the field of Population will be 32 0 which means t
46. Reduce the number of explanatory variables used by the model while maintaining the initial quality KI and robustness KR Generate a model of degree 2 using the most significant variables of the degree 1 model Note If your data set contains date or datetime variables automatically generated variables will appear in this panel For more information refer to section Date and Datetime Variables Automatically Generated Variables on page 30 Vi To Refine a Model 1 On the screen Using the Model click the option Select Variables The screen Selecting Contributory Variables will appear KXEN InfiniteInsight class_Census01 or Selecting Contributory Variables FAA AYA AA HOSS OOO eec Number of Selected Variables 0 SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 175 2 Inthe Targets list select the target variable for which you want to select the contributory variables 3 Click the button Smart Selection The window Smart Variables Selection opens Smart Variables Selection xj Percentage of Information Retained 93 82 Remaining Variables 9 Skipped Variables 5 Remark 0 variable s automatically exduded On the bar Percentage of Information Retained move the cursor to change the amount of information to keep the number of variables
47. 4.6.5.3 Weight Variable. 4.6.5.3.1 Definition. A weight variable allows you to assign a relative weight to each of the observations it describes and to actively orient the training process. Declaring a variable as a weight variable has the effect of creating a number of copies of each observation in the data set proportional to the value it holds for that variable. A weight variable can be used either to assign a higher weight to a single line or to do stratified sampling. The effect of the weight can be understood as follows: a line with a weight of two in the training data set is exactly equivalent to two identical lines with a weight of one. 4.6.5.3.2 Example. Imagine a data set in which the observations correspond to individual Americans. These observations are described by the variable age, among others. Defining the variable age as a weight variable means that, for generation of the model, older individuals will be weighted more heavily than younger individuals. 4.6.5.3.3 Constraints Governing Use. Only positive continuous variables may be used as weight variables. 4.7 Models. 4.7.1 Fundamental Definition. The term model carries many different meanings depending on its field of application. In Data Mining, a model describes and explains the relationships which exist between input data (explanatory variables) and output data (one or more
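The equivalence stated in the definition (a line with a weight of two counts as two identical lines with a weight of one) can be checked on a simple statistic. The sketch below uses a weighted mean as a stand-in; it is not the training algorithm of the product, and the function name is hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts as many times as its weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# One line with weight 2 and one line with weight 1 ...
weighted = weighted_mean([10.0, 20.0], [2, 1])
# ... give the same result as duplicating the first line with weight 1.
duplicated = weighted_mean([10.0, 10.0, 20.0], [1, 1, 1])
print(weighted, duplicated)
```

Both calls return the same value, which is the behavior the definition describes for the training data set.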
48. SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 67 Continue You will need to contact your Administrator who will tell you which action to take and configure the Explain mode If the Administrator validates the execution of the query you may want all queries with the same duration to be executed without validation In that case check the box Do not request validation anymore for similar requests The validation message will then only appear for larger queries This configuration will only be used for the current session when closing nfinite nsight it will be lost For a permanent configuration see your DBMS Administrator who will find the necessary information in the support document Explain Mode available in section Support and Integration Documentation of nfinite nsight documentation 5 2 1 2 Describing the Data Selected B For this scenario Select Text Files as the file type Use the file Desc_CensusOl csv as the description file for the CensusOl cvs data file M To Select a Description File 1 Onthe screen Data Description click the button Open Description The following window opens xi ont Type ext Files x Folder Samples Census z C Browse Description i a oa Browse OK Cancel 2 Inthe window Load a Description select the type of your description file 3 Inthe Folder fiel
49. SQL expressions: some observations cannot be described by the SQL expressions and are left outside the clusters; they are called the unassigned observations. Some observations are described by two different SQL expressions and thus appear in two clusters; this is called the overlap.
Diagram explanation: This graph presents the final result obtained with SQL expressions. An observation cannot appear in two different clusters, so when clusters overlap, the observation concerned is kept in the first cluster created. The second cluster that also contained the observation is redefined to exclude it. In this schema, the numbers correspond to the order of creation of the clusters. You can see that the observations that were in two clusters are kept in only one. The choice of the cluster in which the overlapping observations are kept depends on the order in which the SQL rules are applied. In this case, the rule defining cluster 2 has been applied before the rules defining clusters 1 and 3.
Schema key: Centroid, Observation, Unassigned Observation, Overlapping Observation, Defined Cluster, Overlap Area, Centroid Division.
How to decide which segmentation is better: As a side effect of the supervision InfiniteIns
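The overlap rule described in this excerpt (an observation matched by several SQL expressions stays in the cluster whose rule was applied first, and an observation matched by none is unassigned) can be sketched as a first-match loop. This is a minimal Python illustration; the rules are written as hypothetical Python predicates standing in for SQL expressions, and the cluster numbers and age ranges are invented.

```python
def assign_clusters(observations, ordered_rules):
    """Assign each observation to the first rule that matches it.

    ordered_rules: list of (cluster_id, predicate) in order of application.
    Returns a list with one cluster_id per observation, or None when no
    rule matches (an unassigned observation).
    """
    assignments = []
    for obs in observations:
        assigned = None
        for cluster_id, predicate in ordered_rules:
            if predicate(obs):
                assigned = cluster_id
                break  # later rules are ignored: no overlap remains
        assignments.append(assigned)
    return assignments


# Hypothetical rules; the rule for cluster 2 is applied before 1 and 3.
rules = [
    (2, lambda obs: obs["age"] >= 40),
    (1, lambda obs: 30 <= obs["age"] < 50),
    (3, lambda obs: obs["age"] < 25),
]
people = [{"age": 45}, {"age": 35}, {"age": 20}, {"age": 27}]
clusters = assign_clusters(people, rules)
print(clusters)
```

The first person satisfies the rules of clusters 2 and 1 but is kept in cluster 2 only; the last person satisfies no rule and stays unassigned.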
50. Select the destination folder.
4. Click OK. The plot is saved as a PNG-formatted image.
To Print the Model Graph:
1. Click the Print button situated under the title. A dialog box appears, allowing you to select the printer to use.
2. Select the printer to use and set other print properties if need be.
3. Click OK. The report is printed.
To Export the Model Graph to Microsoft Excel:
Click the Export to Excel Format button situated under the title. An Excel sheet opens, containing the model graph you are currently viewing along with its data.
[Screenshot: Excel sheet titled Performance, with the columns below and an embedded Performance chart plotting Random, Wizard, and Validation against percentage]
percentage  Random  Wizard  Validation
0.00        0.00    0.00    0.00
0.05        0.05    0.21    0.20
0.10        0.10    0.42    0.36
0.15        0.15    0.63    0.50
0.20        0.20    0.84    0.60
0.25        0.25    1.00    0.70
From 0.30 to 1.00, in steps of 0.05, the Random column equals the percentage and the Wizard column is 1.00; the Validation values for these rows are hidden by the embedded chart.
Note: compatible with Excel 2002, 2003, XP, and 2007.
51. 2. On the left side of the screen, under Explanatory variables, select a variable, such as marital-status. Its values appear in the section Modifying values on the right side of the screen.
3. In the section Modifying values, in the Value field, select or enter a value, such as Married-civ-spouse. The value appears in the table of Explanatory variables, across from the selected variable.
4. If you would like to select other explanatory variables, go back to step 2. Otherwise, go to step 5.
5. Click the Run button to perform a model simulation. The results of the simulation appear in the Results section. You will obtain the Predicted value (score) of the observation described in the table of Explanatory variabl
52. This option allows you to generate in the output file the best score(s) for each observation. For each line in the application data set, InfiniteInsight compares the scores obtained by the current observation for each category of the target variable and displays the best score in the column best_rr_<Target Variable>_1; then, if several scores have been requested, the second best score is displayed in the column best_rr_<Target Variable>_2, the third best in the column best_rr_<Target Variable>_3, and so on. When using this option with the Decision option described below, you can link the best score with the category that obtained it.
Decision: This option allows you to generate in the output file the best decision(s) for each observation. As with the previous option, the scores obtained for each category of the target variable are compared, and the category with the best score for the current record is displayed in the column decision_rr_<Target Variable>; if several decisions have been requested, the category with the second best score is displayed in the column decision_rr_<Target Variable>_2, the one with the third best score in the column decision_rr_<Target Variable>_3, and so on.
Probabilities: This option a
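The way the best score and decision columns relate to one another can be sketched as a ranking of the per-category scores of one observation. The Python below is only an illustration that follows the column naming given in the text; the function name, the target name risk, and the scores are hypothetical, and this is not the product's implementation.

```python
def best_scores_and_decisions(scores_by_category, target, count):
    """Build the best-score and decision columns for one observation.

    scores_by_category: dict mapping each target category to its score.
    count: how many best scores / decisions were requested.
    """
    ranked = sorted(scores_by_category.items(),
                    key=lambda item: item[1], reverse=True)
    row = {}
    for rank, (category, score) in enumerate(ranked[:count], start=1):
        row[f"best_rr_{target}_{rank}"] = score
        # Per the text, the first decision column carries no numeric suffix.
        suffix = "" if rank == 1 else f"_{rank}"
        row[f"decision_rr_{target}{suffix}"] = category
    return row


# Hypothetical scores for a three-category target named "risk".
row = best_scores_and_decisions(
    {"low": 0.2, "medium": 0.7, "high": 0.1}, "risk", 2)
print(row)
```

Reading best_rr_risk_1 next to decision_rr_risk gives the best score together with the category that obtained it.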
53. a risk score of about 30 and the segment 37-43 has a risk score of about 15. Given the parameter PDO, set in this example to 15, it is easy to conclude that the segment 37-43 is two times more risky or, put differently, that the odds of the segment 37-43 are two times lower than those of the segment 24-27.
[Screenshot: Score Card panel]
5.2.3.9 Confusion Matrix
The panel Confusion Matrix allows you to visualize the target values predicted by the model compared with the real values, and to set the score above which the observations will be considered as positive, that is, the observations for which the target value is the one wanted. This panel also allows you to simulate your profit depending on the selected score threshold, or to automatically adapt the threshold to obtain a maximum profit.
[Screenshot: Confusion Matrix panel, with the sliders % of Population, % of Detected Target, and Score Threshold, the Predicted and True counts, and the Profit settings]
5.2.3.9.1 Definitions
A positive observation is an observation that belongs to the target
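The score card reading at the start of this excerpt (a gap of 15 points with PDO set to 15 corresponds to a factor of two in the odds) can be written as a one-line formula. The sketch below assumes that PDO stands for "points to double the odds", the usual scorecard convention; the excerpt does not spell this out, and the function name is hypothetical.

```python
def odds_ratio(score_a, score_b, pdo):
    """Factor between the odds of two segments, given their scores.

    Assumes each `pdo` points of score difference doubles the odds.
    """
    return 2 ** ((score_a - score_b) / pdo)


# Scores of about 30 and 15 with PDO = 15, as in the example.
ratio = odds_ratio(30, 15, 15)
print(ratio)
```

A 30-point gap under the same assumption would give a factor of four, since each additional PDO points doubles the odds again.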
54. 5.2.1.7.3 Learning Mode Tab
This tab allows you to select a specific learning mode for your model.
To Enable a Specific Learning Mode:
1. Select the tab Learning Mode.
[Screenshot: Advanced Model Parameters, Learning Mode tab, with the check box Enable Specific Learning Mode, the list Learning Mode set to Rule Mode, and the fields Low Probability, Score for Low Probability (200), High Probability, and Score for High Probability (800)]
2. Check the box Enable Specific Learning Mode. The tab activates.
3. In the list Learning Mode, select the one you want to use for your model. The available learning modes are described below.
Rule mode allows advanced users to ask an InfiniteInsight Modeler - Regression/Classification model to translate its internal equation, obtained with no constraints, into a specified range of scores associated with specific probabilities. When this mode is activated, the different encodings that are used internally for continuous and ordinal variables are merged into a single representation, allowing a simpler view of the model's internal equations. This is particularly useful when the use of predictive models is subject to legal restrictions: the model equations are now simple enough to be
55. b. If the entire 4/5 of the initial data set has been distributed, go to step 4.
4. The final 1/5 of the initial data set is sent as a block of data to the test sub-set.
4.4.3.2.7 Periodic Without Test
The Periodic without test strategy distributes the whole initial data set in a periodic manner to the two sub-sets of estimation and validation: 3/4 of the initial data set is distributed to the estimation sub-set, and 1/4 of the initial data set is distributed to the validation sub-set. In other words, this cutting strategy is implemented by following this distribution cycle:
1. Three lines of the initial data set are distributed to the estimation sub-set.
2. One line is distributed to the validation sub-set.
3. Distribution begins again at step 1.
As no test sub-set is used, all the data from your training data set can be used for the estimation and validation sub-sets. This can lead to a model with better quality and robustness.
4.4.3.2.8 Sequential
The Sequential strategy cuts the initial data set into three blocks, corresponding to the usual cutting proportions: The lines corresponding to the first 3/5 of the initial data set are distributed as a block to the estimation data set. The lines corresponding to the next 1/5 of the initial data set are distributed as a block to the validation data set. The lines corresponding to the final 1/5 of the initial data set are distributed as a block to the test data
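The two cutting strategies described in this excerpt can be sketched on row indices. The Python below is an illustration only: the function names are hypothetical, rows are identified by their position, and the rounding of the 3/5 and 4/5 cut points by integer division is an assumption, since the text does not say how partial lines are handled.

```python
def periodic_without_test(n_rows):
    """Cycle of four lines: three to estimation, one to validation."""
    estimation, validation = [], []
    for i in range(n_rows):
        if i % 4 < 3:
            estimation.append(i)
        else:
            validation.append(i)
    return estimation, validation


def sequential(n_rows):
    """First 3/5 to estimation, next 1/5 to validation, last 1/5 to test."""
    first_cut = n_rows * 3 // 5   # assumed rounding: integer division
    second_cut = n_rows * 4 // 5
    rows = list(range(n_rows))
    return rows[:first_cut], rows[first_cut:second_cut], rows[second_cut:]


est_p, val_p = periodic_without_test(8)
est_s, val_s, test_s = sequential(10)
print(est_p, val_p)
print(est_s, val_s, test_s)
```

With the periodic strategy the sub-sets interleave along the file, while the sequential strategy keeps each sub-set as one contiguous block, which matters when the file is sorted.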
56. been calculated.
The Variable Profile indicates the distribution of the observations belonging to a cluster, or to the global data set, within the categories of each variable. In other words, the profile indicates the proportion of observations contained in each of the categories of that variable. The variable gender of a data set can be distributed as follows: 53% of observations belong to the category male, and 47% of observations belong to the category female. This distribution corresponds to the profile of the variable gender over this data set. Given a cluster A taken from this data set, the same variable gender may be distributed as follows: 80% of observations belong to the category male, and 20% of observations belong to the category female. This distribution corresponds to the profile of the variable gender over cluster A. The cluster profiles allow you to view and compare the profiles of the variable gender over the data set and over the clusters taken from this data set.
6.2.3.6.2 Displaying Clusters Profiles
To Display Cluster Profiles:
1. On the screen Using the Model, click Clusters Profiles. The screen Clusters Profiles appears.
[Screenshot: Cluster Profiles screen, showing the frequencies by cluster index]
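A variable profile as defined in this excerpt is simply the share of observations in each category, computed once over the whole data set and once over a cluster. The Python sketch below uses a small invented data set (ten observations, five of them in a hypothetical cluster A); the function name is also hypothetical.

```python
from collections import Counter


def profile(values):
    """Proportion of observations falling in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


# Hypothetical data: gender of each observation and its cluster label.
gender = ["male", "male", "male", "male", "female",
          "male", "male", "female", "female", "female"]
cluster = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

overall_profile = profile(gender)
cluster_a_profile = profile(
    [g for g, c in zip(gender, cluster) if c == "A"])
print(overall_profile, cluster_a_profile)
```

Comparing the two dictionaries category by category is the numerical counterpart of comparing the cluster curve with the data set curve on the profile plots.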
57. been calculated, the SQL expression defining the cluster. The figure below presents the screen Cluster Profiles, which appears as the default plot for this scenario. The plot presents the SQL expression for cluster 1.
[Screenshot: Cluster Profiles screen, showing the frequencies by cluster index and the variable ranges for Cluster 1, an AND of conditions on relationship, occupation, and hours-per-week]
6.2.3.6.3.1 Cross Statistics Plots
Cross statistics plots contain two curves: The blue area corresponds to the profile of the selected variable over the selected cluster. The red area corresponds to the profile of the selected variable over the entire data set. The figure below presents the Cross Statistics obtained in this scenario for cluster 7.
[Screenshot: Cluster Profiles screen, showing the frequencies by cluster index and the cluster custom names]
58. case of a binary classification task, people are interested in the difference between the Lorenz curve (page 47) for the good cases (1 - α) and the Lorenz curve for the bad cases (β) when selecting an increasing ratio of the population. These curves evolve from 0 to 1 together, and the K-S statistic is the maximum deviation between these two curves. For a perfect system, the K-S statistic is 1; for a random system, because the two curves are equal, the K-S statistic is 0.
TIP: The K-S statistic is used to measure the difference between two distributions in order to get an idea of the quality of a data set.
4.8.2.3 AUC
The AUC statistic is a rank-based measure of model performance, or predictive power, calculated as the area under the Receiver Operating Characteristic curve (see ROC on page 46). For a simple scoring model with a binary target, this represents the observed probability of a signal (responder) observation having a higher score than a non-signal (non-responder) observation. For individual variables, ordering based on score is replaced by ordering based on the response probability for the variable's categories (for example, cluster ID or age range response rates). The corresponding equation is 5 1 AUC fadil 8 ly dy 0 00 So we have AUC 1 L zT 1 AUC a ir dr
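The two verbal definitions in this excerpt can be turned into a small computation: the K-S statistic as the largest gap between the cumulative score distributions of the two classes, and the AUC as the observed probability that a responder scores higher than a non-responder. The Python below is a sketch under stated assumptions, not the product's formula: the function names are hypothetical, empirical cumulative distributions of the score stand in for the Lorenz curves, and ties in the AUC are counted as one half.

```python
def split_by_label(scores, labels):
    """Separate scores of responders (1) from non-responders (0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def ks_statistic(scores, labels):
    """Maximum deviation between the two cumulative distributions."""
    pos, neg = split_by_label(scores, labels)
    best = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = sum(s <= threshold for s in pos) / len(pos)
        cdf_neg = sum(s <= threshold for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def auc(scores, labels):
    """Share of (responder, non-responder) pairs ranked correctly."""
    pos, neg = split_by_label(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: a tie counts as half a win
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
ks_value = ks_statistic(scores, labels)
auc_value = auc(scores, labels)
print(ks_value, auc_value)
```

On a perfectly separated sample, such as scores 0.1, 0.2, 0.8, 0.9 with labels 0, 0, 1, 1, both functions return 1, in line with the statement that the K-S statistic of a perfect system is 1.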
59. contains two variables and a correlation rate. When you modify the number of correlations to display, the engine excludes the ones with the lowest correlation rate, thus keeping only the more significant ones.
[Screenshot: Correlations Settings, with the options Higher than, Keep all Correlations, and Keep the First]
This section allows you to set some regression parameters according to three strategies. This option can be activated only when the model contains at least one continuous target variable. The description of these strategies, and an example of a performance curve for each strategy, are provided in the table below.
Regression strategies: Without post-processing; With original target encoding; With uniform target encoding.
Description: The first strategy consists in disabling the regression post-processing during the model learning phase in order to create a regression similar to the one used in versions prior to 3.3.2. In this case, a standard regression is performed. No special improvement is made to the final scores. Original target values are used, and raw score values are produced as outputs. The second strategy, which applies to regressions using post-processing, consists in using the original target value during the model learning phase to compute regression coefficie
60. correspond to the profit contributed by that category. In other words, they correspond to the relationship of that category to the target variable, and to whether that category has more or fewer observations belonging to the target category of the target variable. For a given category, a positive bar (on the right of 0.0) indicates that the category contains more observations belonging to the target category of the target variable than the mean calculated on the entire data set. A negative bar (on the left of 0.0) indicates that the category contains a lower concentration of the target category of the target variable than the mean.
Note: You can display the profit curve for the selected variable by clicking the button Display Profit Curve, located in the tool bar under the title.
The importance of a category depends on both its difference from the target category mean and the number of cases it represents. High importance can result from a high discrepancy between the category and the mean of the target category of the target variable, from a minor discrepancy combined with a large number of records in the category, or from a combination of both. The width of the bar shows the profit from that category. The positive bars correspond to categories that have more than the mean number of observations from the target category (that is, responders), and the negative bars correspond to categories that have fewer than the mean number of observations from the target category (that is, responders).
61. cut into three sub-sets: an Estimation sub-set, a Validation sub-set, and a Test sub-set. A cutting strategy determines the way in which the data of the training data set are distributed across the sub-sets. The Estimation and Validation sets are used for actual training, and the Test set, sometimes referred to as the hold-out sample, is used to ensure that the predicted performance is correct.
Note: When using InfiniteInsight, the data sub-sets are virtual; they are not stored in memory at any time. The file corresponding to the initial data set remains intact at all times.
The figure below illustrates the model generation process, known as the training phase.
[Figure: the training data set split into the Estimation, Validation, and Test sub-sets]
4.7.5 Representation of a Model
A model may be represented in many different ways, including a decision tree, a neural network, or a mathematical function. In InfiniteInsight, models are represented in the form of mathematical functions, specifically polynomials.
4.7.5.1 Description of the Polynomial
A polynomial may be of degree 1, 2, 3, or greater. By defining the polynomial degree, you are defining the degree of complexity of the model.
4.7.5.2 Examples of Polynomials
A polynomial of degree 1 is of the form f(X1, X2, ..., Xn) = w0 + w1 X1
62. data set (Data Set), the source file (Source), the number of records contained in the data set (Number of Records), and the number of variables for which InfiniteInsight has found deviations in comparison to the data set originally used to train the model (Number of variables showing deviation).
The second and third sections of the debriefing report allow you to compare the performance of your model on the original data set with its performance on the control data set: the section Performance Indicators displays, for each target, the KI and KR indicators obtained by the model on the original data set; the section Performance on Control Data Set displays, for each target, the KI and KR indicators obtained by the model on the control data set.
If the KI and/or KR of the model on the control data set are significantly lower, the relation between the variables and the target variable has changed; as a consequence, the model should be rebuilt on the new data. If the KI and KR are not much different, the relation between the input variables and the target behavior has not changed, but this does not rule out differences of distributions.
The panel Control for Deviations provides you with six options that can be separated into three groups: the first one, made of the options Probability of Deviation, Probability of Category Deviation, and Probability of Grouped Category Deviation, enumerates the pr
63. following definition applies to continuous targets; some wording may be simplified for binary targets. The formulas presented below can also be applied to the binary target case (use categories instead of segments in this case).
We consider the case where an InfiniteInsight Modeler - Regression/Classification regression model is trained on a continuous target signal S with the help of an input variable X. InfiniteInsight Modeler - Regression/Classification starts by binning the continuous target S into B segments. We will suppose that the input X is a nominal (categorical) variable, though the whole process can be extended easily to the case of ordinal and continuous inputs. We will suppose that X has N categories X1, ..., XN. We are interested in assessing the importance of a category Xi with respect to the target S.
The importance of a category depends on two factors: the fact that the distribution of the target for this category is significantly skewed towards high values or low values when compared with the distribution of the target on the entire population, and the frequency of this category.
High importance can result from any of the following: a high discrepancy between the target distribution for cases associated with this category and the distribution of the target variable for the entire population; a minor discrepancy combined with a large number of records in the category; a combination of both.
KXEN uses a non-pa
64. [Screenshot: save dialog with the fields Data Type, Folder, and File/Table]
2. Complete the following fields:
Model Name: This field allows you to associate a name with the model. This name will then appear in the list of models offered when you open an existing model.
Description: This field allows you to enter the information of your choosing, such as the name of the training data set used, the polynomial degree, or the KI and KR performance indicators obtained. This information could be useful to you later for identifying your model.
Data Type: This list allows you to select the type of storage in which you want to save your model. The following options are available: Text files, to save the model in a text file; Database, to save the model in a database; Flat Memory, to save the model in the active memory; SAS Files, to save the model in a SAS-compatible file for a specified version of SAS and a specified platform (SAS v6 or 7/8, for Windows or UNIX); SAS Transport, to save the model in a generic SAS-compatible file.
Folder: Depending upon which option you selected, this field allows you to specify the ODBC source, the memory store, or the folder in which you want to save the model.
File/Table: This field allows you to enter the name of the file or table that is to contain the model. The name of the file must contain one of the following format extensions: .txt (text file in which the data is separated by tab
65. [Chart: contributions by variables, with one bar per variable, including capital-loss, age, education, hours-per-week, native-country, fnlwgt, relationship, and workclass]
If your data set contains date or datetime variables, automatically generated variables can appear in this panel. For more information, refer to the section Date and Datetime Variables: Automatically Generated Variables on page 30.
2. You can drill down on a variable, that is, display the plot of details of this variable, where the categories of the variable can be seen. To zoom in on a variable, double-click the corresponding bar. Go to the section Significance of Categories.
5.2.3.4.3 Understanding Contributions by Variables
Only the plot Maximum Smart Contributions by Variables (the default selection) is presented in this guide. The Contributions by Variables option allows the user to examine the relative significance of each of the explanatory variables in relation to the target variable. This significance is relative, as the weight of each variable is pro-rated as a function of the significance of the other explanatory variables.
[Chart: Maximum Smart Variable Contributions, horizontal bars on a scale from 0.000 to 0.250, starting with marital-status]
66. must be the key to its success and they must be measurable.
7.1.1.1.80 periodic cutting strategy: The periodic cutting strategy is implemented by following this distribution cycle: 1. Three lines of the initial data set are distributed to the estimation sub-set. 2. One line is distributed to the validation sub-set. 3. One line is distributed to the test sub-set. 4. Distribution begins again at step 1.
7.1.1.1.81 pivot: A pivot is a data summarization tool found in data visualization programs. Among other functions, pivot tables can automatically sort, count, and total the data stored in one table or spreadsheet and create a second table displaying the summarized data. Pivot tables are also useful for quickly creating cross-tabs.
7.1.1.1.82 polynomial: A polynomial may be of degree 1, 2, 3, or greater. By defining the polynomial degree, you are defining the degree of complexity of the model.
7.1.1.1.83 population: A population is a list of entity identifiers. A population may be defined as a list of values. This list can be extracted from a table column (it is then said to be defined in extension) or through a filtering expression from another population (it is then said to be defined in intension).
7.1.1.1.84 prediction range: The extreme values for prediction ranges are TargetMean sqrt TargetVariance TargetMean sqrt Target
67. of an individual within a community (leader vs. follower).
7.1.1.1.105 standard deviation: The standard deviation is a measure of the dispersion of a collection of numbers.
7.1.1.1.106 standardized profit: Standardized profit allows examination of the contribution of the model generated by InfiniteInsight features relative to a model of random type, that is, in comparison with a model that would only allow you to select observations at random from your database. This profit is used for the plots of variable details, which present the significance of each of the categories of a given variable with respect to the target variable.
7.1.1.1.107 statistical report: The Statistical Reports provide you with a set of tables that allow a more detailed debriefing of your model.
7.1.1.1.108 storage: To describe the data, InfiniteInsight uses five types of storage formats: date, datetime, number, integer, string.
7.1.1.1.109 sub-sampling: Sub-sampling means selecting a part of a whole: if an event cannot be processed as a whole, a limited number of measures have to be taken to represent this event.
7.1.1.1.110 support: The support of a rule is a measure that indicates the number of sessions that verify the rule, for instance, the number of sessions that contain the itemset {A, B, C} and the item D.
7.1.1.1.1
68. one textual variable, you will not be able to go to the next panel.
Key: whether this variable is the key variable or identifier for the record. 0: the variable is not an identifier; 1: primary identifier; 2: secondary identifier.
Order: whether this variable represents a natural order. 0: the variable does not represent a natural order; 1: the variable represents a natural order. If the value is set to 1, the variable is used in SQL expressions in an "order by" condition. There must be at least one variable set as Order in the Event data source. Warning: If the data source is a file and the variable declared as a natural order is not actually ordered, an error message is displayed before model checking or model generation.
Missing: the string used in the data description file to represent missing values (e.g. "999" or "Empty", without the quotes).
Group: the name of the group to which the variable belongs. Variables of the same group convey the same information and thus are not crossed when the model has an order of complexity over 1. This parameter will be usable in a future version.
Description: an additional description label for the variable.
Structure: this option allows you to define your own variable structure, that is, to define the grouping of the variable's categories.
69. ordered from records predicted least likely to be signals on the left to records most likely to represent signals on the right; the slower the rise, the more sensitive the model in terms of detecting signals (or responders). The wizard line turns upward from the x-axis at the point corresponding to the proportion of non-signals in the validation data set.
4.10.2.2 Lorenz Bad
Lorenz Bad displays the cumulative proportion of true negatives (specificity) accounted for by the bottom x% of model scores. Here, the faster the rise, the lower the frequency of erroneous detection.
[Chart: Performance against percentage, with the curves Random, Wizard, and Validation]
4.10.3 Density Curves
The density curves display the density function of the variable Score in the set of Events (curve Density Good) and in the set of Non-Events (curve Density Bad). These curves can also be viewed as the derivatives of the Lorenz curves: the density function is by definition the derivative of the cumulative density function. The estimated density function in a bin, or interval, is equal to:
(Number of Events in the Interval / Total Number of Events) / Length of the Interval
The length of an interval is by definition its upper bound minus its lower bound.
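The density estimate described for the density curves (the share of events falling in an interval, divided by the length of that interval) can be computed directly. The Python below is a minimal sketch; the function name and the scores are hypothetical, and treating the interval as closed on the left and open on the right is an assumption the text does not settle.

```python
def bin_density(event_scores, lower, upper):
    """Estimated density of the score over the interval [lower, upper)."""
    in_interval = sum(1 for s in event_scores if lower <= s < upper)
    share = in_interval / len(event_scores)
    return share / (upper - lower)  # length = upper bound - lower bound


# Hypothetical scores of five events; four fall in [0.0, 0.5).
event_scores = [0.10, 0.15, 0.30, 0.45, 0.80]
density_low = bin_density(event_scores, 0.0, 0.5)
density_high = bin_density(event_scores, 0.5, 1.0)
print(density_low, density_high)
```

Multiplying each density by the length of its interval and summing gives back 1, which is what makes the curve the derivative of a cumulative curve that rises from 0 to 1.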
70. population. A negative observation is an observation that does not belong to the target population.
5.2.3.9.2 Understanding the Confusion Matrix
There are three ways to set the threshold using the displayed slide bar: by selecting the percentage of the population to target, the population being sorted by descending order of score (% of Population); by selecting the percentage of positive observations you want to detect (% of Detected Target); by selecting the score used to differentiate positive observations from negative ones (Score Threshold).
Any observation with a score above the threshold is considered positive; conversely, any observation with a score below the threshold is considered negative. The slide is graduated from the lowest score on the left to the highest score on the right. The values corresponding to each option are displayed under the slide. When you move the cursor, the confusion matrix is updated accordingly.
The following table details how to read the confusion matrix.
Columns: Predicted Target Category (Positive Observations Predicted); Predicted Non-target Category (Negative Observations Predicted).
Row True Target Category: under Predicted Target Category, the number of correctly predicted positive observations; under Predicted Non-target Category, the number of actual positive observations that have been predi
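The way the confusion matrix follows the score threshold can be sketched by counting the four combinations of predicted and true categories. The Python below is an illustration with invented scores; the function name is hypothetical, and counting a score exactly equal to the threshold as positive is an assumption, since the text only covers scores above and below it.

```python
def confusion_matrix(scores, actual, threshold):
    """Count the four predicted/true combinations for a score threshold.

    actual: 1 for a positive observation, 0 for a negative one.
    """
    counts = {"true_positive": 0, "false_positive": 0,
              "false_negative": 0, "true_negative": 0}
    for score, label in zip(scores, actual):
        predicted_positive = score >= threshold  # assumed tie handling
        if predicted_positive and label == 1:
            counts["true_positive"] += 1
        elif predicted_positive and label == 0:
            counts["false_positive"] += 1
        elif label == 1:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return counts


scores = [0.9, 0.8, 0.4, 0.3, 0.2]
actual = [1, 0, 1, 0, 0]
matrix = confusion_matrix(scores, actual, 0.5)
print(matrix)
```

Lowering the threshold moves observations from the predicted negative column to the predicted positive column, which is what happens when the cursor is moved on the slide.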
71. relevant section: InfiniteInsight Modeler - Regression/Classification; InfiniteInsight Modeler - Segmentation/Clustering.
Read all sections of the guide through at least once, in the order in which they are presented.
In both cases, ensure that you have a complete grasp of the essential concepts relating to the use of InfiniteInsight by reading the chapter Essential Concepts on page 16. These concepts are essential both for the use of InfiniteInsight features and for the analysis of the results obtained.
You could limit yourself to:
1. Verifying that you are familiar with the terminology used by KXEN, by examining the contents of the chapter Essential Concepts on page 16 in the detailed table of contents.
2. Reading the summary of the scenario of the feature that interests you: Application Scenario: Enhance Efficiency and Master your Budget using Modeling; Application Scenario: Customize your Communications using Data Modeling.
3. Going directly to the relevant section: InfiniteInsight Modeler - Regression/Classification; InfiniteInsight Modeler - Segmentation/Clustering.
You can:
Follow the application scenarios for a review of the features that interest you: Application Scenario: Enhance Efficiency and Master your Budget using Modeling; Application Scenario: Customize your Communications using Data Modeling.
Use this document as a reference text, consulting it as required. In this case, the detailed ta
72. 5.2.1.2.7.1 Band Count for Continuous Variables
When you work with no defined structure, you can set the band count for continuous variables. The allowed values for this parameter are between 1 and 20. The population is thus divided into that many segments of similar size. These segments are used to build descriptive statistics, particularly the distribution of target variables for each segment, which affects the coding of the variable with respect to target variables. The band count has an influence on the calculation of KI: the more segments there are, the more accurate the calculation of KI for the explanatory variable. However, this influence is very small.
To Set the Band Count for Continuous Variables:
1. Right-click the row corresponding to the continuous variable to be edited.
2. Select Define Structure.
3. Select Set Band Count for Continuous Variables.
[Screenshot: Data Description screen for Desc_Census01.csv, with the contextual menu showing Extract Categories from Statistics, Extract Structure from Model, Extract Structure from Model for All Variables, and Extract Structure from Variable]
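Dividing the population into a given number of segments of similar size amounts to equal-frequency binning on the sorted values. The Python below is a sketch of that idea only; the function name is hypothetical, the check on the 1 to 20 range mirrors the allowed values stated in the text, and the way leftover observations are spread across bands is an assumption.

```python
def equal_frequency_bands(values, band_count):
    """Split the sorted values into `band_count` bands of similar size."""
    if not 1 <= band_count <= 20:
        raise ValueError("band count must be between 1 and 20")
    ordered = sorted(values)
    n = len(ordered)
    bands = []
    for b in range(band_count):
        start = b * n // band_count
        end = (b + 1) * n // band_count
        bands.append(ordered[start:end])
    return bands


# Hypothetical continuous variable with ten observations, four bands.
ages = [52, 23, 38, 41, 29, 60, 35, 47, 19, 33]
bands = equal_frequency_bands(ages, 4)
print(bands)
```

Each band can then carry its own descriptive statistics, such as the share of target observations it contains.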
73. 4. Select the folder that holds the model that you want to open. The list of models contained in that folder will appear, providing the following information for each model (column: description; values). Name: name under which the model has been saved; character string. Class: class of the model, that is, the type of the model; Kxen.Classification (Classification/Regression with nominal target), Kxen.Regression (Classification/Regression with continuous target), Kxen.Segmentation (Clustering with SQL Mode), Kxen.Clustering (Clustering without SQL Mode), Kxen.TimeSeries (Time Series), Kxen.AssociationRules (Association Rules), Kxen.SimpleModel (Classification/Regression and Clustering multi-target models, any other model). Version: number of the model version when the model has been saved several times; integer starting at 1. Date: date when the model has been saved; date and time in the format yyyy-mm-dd hh:mm:ss. Comment: optional user-defined comment that can be used to identify the model; character string. 1. Select a model from the list. 2. Click the Open button. The screen Using the Model will appear. [Screenshot: the Using the Model menu for class_Census01]
74. screen Advanced Model Parameters appears. [Screenshot: Advanced Model Parameters, General tab, with the fields Polynomial Degree, Score Bins Count, Correlations Settings and Target Key Settings] 5.2.1.7.1 General Tab. The General tab allows you to define the general settings of the model, that is, the degree of the model, the score bin count, the number of correlations to display and the target key value. 5.2.1.7.1.1 Defining the Degree of the Model (optional). The model generated by InfiniteInsight Modeler - Regression/Classification is represented by a polynomial. This polynomial may be of degree 1, 2, 3 or greater. By defining the polynomial degree, you define the degree of complexity of the model. It is strongly recommended that you always use a degree of 1 (the default value) for the first analysis of a data set. Using a higher polynomial degree does not guarantee that you will obtain a more powerful model in all cases. For more information about the polynomial degree, see Representation of a Model (page 36). For this Scenario: Keep the polynomial degree set to the default value, that is, 1. To Define the Degree of the Model: In th
75. selected changes accordingly. The further this cursor is moved to the left, the more variables are excluded. The variables excluded are selected automatically as a function of their significance with respect to the model. For instance, the figure below shows that to retain only two variables out of the original fourteen, you should keep 43.07% of the information contributed by the model. [Screenshot: Smart Variables Selection window showing Remaining Variables: 2, Skipped Variables: 12, and the remark that 0 variables were automatically excluded] Note: Certain variables in the training data set may contribute no information, such as constant-value variables. These can therefore be automatically excluded from the model during the training phase. The number of variables excluded is displayed as a Remark. In the figure above, this number is equal to 0. 5. Click the OK button. The window Smart Variables Selection closes and the panel Selecting Contributory Variables is updated with the selected variables, allowing you to view the kept variables and the excluded ones. In our example, InfiniteInsight automatically determined that the two explanatory variables that contributed the most information to explain the target variable were the variables marital status and capital gain.
76. set. 4.4.3.2.9 Sequential Without Test. The Sequential without test strategy cuts the initial data set into two blocks: the lines corresponding to the first 3/4 of the initial data set are distributed as a block to the estimation data set; the lines corresponding to the next 1/4 of the initial data set are distributed as a block to the validation data set. As no test sub-set is used, all the data from your training data set can be used for the estimation and validation sub-sets. This can lead to a model with better quality and robustness. 4.5 Table of Data. 4.5.1 Definition. A table of data is a data set presented in the form of a two-dimensional table. In this table: each row represents an observation to be processed, such as an American individual in the sample file Census01.csv; each column represents a variable that describes observations, such as the age or the gender of individual Americans; each cell, the intersection of a column and a row, represents the value of the variable in the column for the observation in that row. The following table is an example of a table of data. Columns: Observations, Variable 1, Variable 2, Variable 3. Observation a: Value a1, Value a2, Value a3. Observation b: Value b1, Value b2, Value b3. Observation n: Value n1, Value n2, Value n3.
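The Sequential without test strategy in item 76 can be sketched in a few lines of Python. This is only an illustration: the function name `sequential_without_test` is hypothetical, and rounding the 3/4 boundary down is an assumption, since the guide does not say how a row count that is not divisible by four is handled.

```python
def sequential_without_test(rows):
    """Return (estimation, validation) blocks in the original row order.

    Assumption: the 3/4 boundary is rounded down.
    """
    cut = (3 * len(rows)) // 4
    estimation = rows[:cut]   # first 3/4, kept as one block
    validation = rows[cut:]   # remaining 1/4, kept as one block
    return estimation, validation


# Eight invented rows, identified by their position in the data set.
rows = list(range(8))
estimation, validation = sequential_without_test(rows)
```

Because the split is positional, the order of the rows in the initial data set determines which observations land in each sub-set.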
77. set are distributed to the validation sub-set. As no test sub-set is used, all the data from your training data set can be used for the estimation and validation sub-sets. This can lead to a model with better quality and robustness. 4.4.3.2.5 Periodic. The Periodic cutting strategy is implemented by following this distribution cycle: 1. Three lines of the initial data set are distributed to the estimation sub-set. 2. One line is distributed to the validation sub-set. 3. One line is distributed to the test sub-set. 4. Distribution begins again at step 1. 4.4.3.2.6 Periodic with Test at the End. The Periodic with test at the end strategy distributes 4/5 of the initial data set in a periodic manner to the two sub-sets of estimation and validation, 3/5 being distributed to the estimation sub-set and 1/5 to the validation sub-set. The final 1/5 of the initial data set is sent as a block of data to the test sub-set. In other words, this strategy follows this distribution cycle: 1. Three lines of the first 4/5 of the initial data set are distributed to the estimation sub-set. 2. One line of the first 4/5 of the initial data set is distributed to the validation sub-set. 3. a. If the entire 4/5 of the initial data set is not yet distributed, distribution operations begin again at step 1.
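The two periodic strategies in item 77 can be sketched as follows. This is a minimal Python illustration of the distribution cycles as described; the function names are hypothetical, and rounding the 4/5 boundary down is an assumption.

```python
def periodic_cut(rows):
    """Periodic: cycle of 3 estimation lines, 1 validation line, 1 test line."""
    estimation, validation, test = [], [], []
    for index, row in enumerate(rows):
        position = index % 5          # position inside the 5-line cycle
        if position < 3:
            estimation.append(row)
        elif position == 3:
            validation.append(row)
        else:
            test.append(row)
    return estimation, validation, test


def periodic_with_test_at_end(rows):
    """Periodic with test at the end: last 1/5 is the test block.

    Assumption: the 4/5 boundary is rounded down.
    """
    cut = (4 * len(rows)) // 5
    head, test = rows[:cut], rows[cut:]
    # Inside the first 4/5: cycle of 3 estimation lines, then 1 validation line.
    estimation = [row for i, row in enumerate(head) if i % 4 < 3]
    validation = [row for i, row in enumerate(head) if i % 4 == 3]
    return estimation, validation, test


rows = list(range(10))                # ten invented rows
periodic = periodic_cut(rows)
periodic_end = periodic_with_test_at_end(rows)
```

In both cases the estimation, validation and test sub-sets receive 3/5, 1/5 and 1/5 of the rows; the strategies differ only in whether the test lines are interleaved or taken as a final block.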
78. solutions. You can use: a shotgun method; an intuitive method; a classical statistical method (for example, neural networks, Bayesian networks, logistic models, decision trees); the KXEN method. 5.1.6.1 Shotgun Method. This method consists of performing no selection on your database and sending out a mass mailing to every person recorded in your database. This solution guarantees that all persons likely to purchase your product are contacted. On the other hand, it runs to exorbitant costs, far exceeding your budget, and is seldom the solution applied. In addition, it runs the risk of saturating your bank's prospects with inappropriate offers (spamming). 5.1.6.2 Intuitive Method. This method consists of performing a selection that relies on your knowledge of your field, that is to say, you send your mailing to individuals selected in an intuitive manner from your database. This solution allows you to significantly reduce the cost of your marketing campaign and make it fit your budget. This method is not optimal because it does not allow you to: control the real costs and return on investment of your marketing operation; select which prospects to contact on the basis of real returns. It is true that you probably have a relatively good understanding of which in
79. specifically for the list of supported ODBC-compatible sources, see the document Data Modeling Specification. 4.3 Data Set. To use InfiniteInsight features, you must have a training data set available that contains the target variable with all its values defined. Then you can apply the model generated using the training data set to one or more application data sets. 4.3.1 Training Data Set. A training data set is a data set used for generating a model. In this set, the values of the target variable (on page 31), or variable corresponding to your business issue, are known. By analyzing the training data set, InfiniteInsight features will generate a model that allows explanation of the target variable based on the explanatory variables. To allow validation of the model generated, the training data set is cut into three sub-sets using a cutting strategy (on page 19). The training data set may correspond to either a complete population section of your database or a sample extracted from this population. The choice depends on the type of study to be performed, the tools used, and the budget allocated to the study. 4.3.2 Application Data Set. An application data set is a data set to which you apply a model. This data set contains an unknown target variable for which you want to know the value. The model applied to
80. status marital status. The variable appears in the list below. To Remove a Variable from the List of Variables: 1. In the list located in the lower part of the panel, select the variable for which you do not want to import the structure. 2. Click the Delete button. [Screenshot: structure import panel for the loaded model K2R_Census_331_1, with the lists Variables from Training and Variables from Loaded Model] The variable is removed from this list and added to the list Variable from Loaded Model. To View a Variable Structure Defined in the Loaded Model: If the variable has not yet been added to the list of variables located in the lower part of the panel: 1. In the list Variable from Loaded Model, select the variable for which you want to see the structure defined in the model. 2. Click the View button. The variable structure opens in a new window. [Screenshot: variable structure window for the loaded model K2R_Census_331_1]
81. the displayed report. The data can then be pasted into a text editor, a spreadsheet or a word processor. If the current report contains more than one view (for various variables, data sets and so on), this option allows you to copy all the views of this report. If the current report is displayed as a graph, this option allows you to copy it as an image and paste it into a word processor or a graphics application. This option allows you to print the current view of the selected report, depending on the chosen display mode (HTML, table, graph). This option allows you to save the data from the current view of the selected report in different formats (text, HTML, PDF, RTF). This option allows you to save the data from all the views of the selected report in different formats (text, HTML, PDF, RTF). This option is available for all display modes and allows you to export the current view to Excel (compatible with Excel 2002, 2003, XP and 2007). 6.2.4 Step 4: Using the Model. In this chapter: Applying the Model to a New Data Set. Once generated, a clustering model may be saved for later use. A clustering model may be applied to additional data sets. The model th
82. the model on new data sets. 6. Before exporting the script, you can view it by clicking the button Script Preview. [Screenshot: Script Preview window showing the generated KXEN shell script, which sets default values such as TRAINING_STORE_TYPE, TRAINING_STORE_NAME and TRAINING_STORE_ALIAS for the training store] 7. Click the Next button to start the generation process. Once the script has been generated, the menu Using the Model is displayed. 5.2.4.7 Saving the Model. Once a model has been generated, you can save it. Saving it preserves all the information that pertains to that model, that is, the modeling parameters, its profit curves, and so on. To Save the Model: 1. On the screen Using the Model, click the option Save the Current Model. The screen Saving the Model will appear. [Screenshot: Saving the Model screen for class_Census01, with the fields Model Name, Description and Data Type]
83. understood by legal departments and can be exposed not only in a programming language, as was already the case, but even in simple words. The underlying technology is also used to display so-called score cards. To use this mode, you need to choose a range of scores associated with probabilities. You cannot specify a range such as 0 to 1000 for the scores, but you can specify ranges associated with probabilities of detection. For example, you can specify that you would like the score 200 to be associated with a probability of detection of the least frequent category of 20%, and the score 800 to be associated with a probability of detection of 80%. In this case, InfiniteInsight Modeler - Regression/Classification will automatically re-scale the scores in order to align the probabilities of detection with the specified scores. In this Scenario: Do not activate the rule mode. To Activate the Rule Mode: 1. In the list Learning Mode, select the option Rule Mode. The probabilities and scores fields are displayed. [Screenshot: Learning Mode set to Rule Mode, with the fields Low Probability, Score for Low Probability (200), High Probability and Score for High Probability (800)] 2. Use the fields Low Probability and High Probability to set the probabilities range. 3. Indicate in the fields Score for Low Probability and Score for High Probability what score is expected in each case. InfiniteInsight R
84. variables. [Screenshot: Variables Selection Parameters, with settings for the number of variables removed at each step, the share of information kept at each step (95.0%) and the drop in KI and KR (5.0%) at which the search process stops] The section Auto-selection allows you to automatically reduce the number of variables in the model in relation to quality criteria. This selection is done by successive iterations. There are two selection modes: one based on the number of variables to keep, and the other on the amount of information that should be kept. In this instance, the information is the sum of the variables' contributions. To Use the Auto-selection: Check the box Enable Auto-selection. The corresponding options are activated. By default, the parameters are set to: Select the best model keeping between 1 and all variables. Any parameter that can be changed is marked as a hyperlink (blue, underlined). 5.2.1.7.2.1.1 Choosing the Selection Mode. To Select the Selection Mode: 1. Click the link indicating the type of model to keep, for example, the best model in the sentence Select the best model keeping between 1 and all variables. A drop-down menu is displayed offering the following options: the best model, the last model. 2. Select the desire
85. want to give the output file. In the Generate field, select the type of output values that you want for the target variable. 5. You may also opt to select Save only outlier observations. If you select this option, only the outlier observations will be presented in the results file obtained after applying a model. 6. Click the Apply button. The screen Applying the Model will appear. Once application of the model has been completed, the results file of the application is automatically saved in the location that you defined on the screen Applying the Model. [Screenshot: Applying the Model screen for class_Census01, showing the progress message] For this Scenario: Open, in Microsoft Excel, the results file in text format that you obtained when you applied the model to the Census01.csv file. To Open the Model Application Results File: 1. Depending upon the format of the results file generated, use Microsoft Excel or another application to open the file. The figure below presents the headings and columns of the results file obtained for this scenario.
86. which to save the results file (Model Generated Output). Do not select the option Save only outlier observations. To Apply the Model to a New Data Set: 1. On the screen Using the Model, click the option Applying the model to a new data set. The screen Applying the Model will appear. [Screenshot: Applying the Model screen for class_Census01, with the sections Application Data Set, Generation Options and Results Generated by the Model] 2. In the section Application data set, select the format of the data source (Text files or ODBC). 3. Click the Browse button to select: in the Folder field, the folder which contains your data set; in the Data field, the name of the file corresponding to your data set. 4. In the section Results generated by the model, select the file format for the output file (Text files or ODBC). Note: The current version of InfiniteInsight does not allow you to save the file in an ODBC database. Click the Browse button to select: in the Folder field, the folder in which you want to save the output file; in the Data field, the name that you
87. 4.10.4.2 Probability of Risk. The X axis represents the risk score and the Y axis represents the odds ratio value. The probability of risk p is computed for each risk score bin this way: number of Bad divided by the number of records in the risk score bin. [Chart: Probability Performance, probability against score] 4.10.4.3 Population Density. The density is computed according to the number of records in each risk score bin (20 by default). [Chart: Density Performance, density against score, Validation data set] 4.10.4.4 Risk All. All three curves are displayed in the same graph. Note that the y axis of the probability curve is on the right-hand side. The y axes of the population density and the good/bad odds are on the left. [Chart: Risk Performance Chart]
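The per-bin quantities in item 87 can be illustrated with a short Python sketch: the density is the number of records in each risk score bin, and the probability of risk is the number of Bad records divided by that count. The function name `risk_by_bin`, the equal-width binning and the sample data are assumptions made for the illustration; the guide does not say how the score bins are delimited.

```python
def risk_by_bin(scores, is_bad, bin_count=20):
    """Return (density, probability) lists, one entry per risk score bin.

    Assumption: bins are equal-width intervals between the lowest and
    highest score. An empty bin gets a probability of None.
    """
    lowest, highest = min(scores), max(scores)
    width = (highest - lowest) / bin_count or 1.0   # avoid a zero width
    density = [0] * bin_count
    bad_count = [0] * bin_count
    for score, bad in zip(scores, is_bad):
        # The highest score falls into the last bin.
        index = min(int((score - lowest) / width), bin_count - 1)
        density[index] += 1
        if bad:
            bad_count[index] += 1
    probability = [
        bad / total if total else None
        for bad, total in zip(bad_count, density)
    ]
    return density, probability


# Invented scores and Bad flags, binned into two bins for readability.
scores = [100, 120, 180, 190, 300, 310, 390, 400]
is_bad = [True, True, True, False, False, True, False, False]
density, probability = risk_by_bin(scores, is_bad, bin_count=2)
```

In this invented sample, the low-score bin holds four records of which three are Bad, and the high-score bin holds four records of which one is Bad.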
88. [Chart legend: Density Validation, Odds Validation, Probability Validation, plotted against score] 5 InfiniteInsight Modeler - Regression/Classification. In this chapter: Application Scenario: Enhance Efficiency and Master your Budget using Modeling; Creating a Classification Model Using InfiniteInsight Modeler. 5.1 Application Scenario: Enhance Efficiency and Master your Budget using Modeling. 5.1.1 Presentation. In this scenario, you are the Marketing Director of a large retail bank. The bank wants to offer a new financial product to its customers. Your project consists of launching a direct marketing campaign aimed at promoting this product. You have a large database of prospects at your disposal and a limited and closely monitored budget, and you are also subject to significant time constraints. In order to maximize the benefits of your campaign, your business issue consists of: contacting those prospects most likely to be interested in the new financial product; identifying the ideal number of prospects to contact out of the entire database. Using the InfiniteInsight Modeler - Regression/Classif
89. 1,000,000 customers that will be used in this scenario cannot be provided to you. You will apply the model to the file Census01.csv, which you used to generate the model. In this manner, you will be able to compare the predictions provided by the model with the real values of the target variable Class for each of the observations. In the procedure To Apply the Model to a New Data Set: select the format Text files; in the Generate field, select the option Individual Contributions; select the folder of your choice in which to save the results file (Model Generated Output); do not select the option Keep only outliers. To Apply the Model to a New Data Set: 1. On the screen Using the Model, click the option Applying the model to a new data set. The screen Applying the Model will appear. [Screenshot: Applying the Model screen for class_Census01, with the sections Application Data Set, Generation Options and Results Generated by the Model]
90. 11 table of data: A table of data is a data set presented in the form of a two-dimensional table. 7.1.1.1.112 target key: The target key is the expected value of the target. 7.1.1.1.113 target variable: A target variable is the variable that you seek to explain, or for which you want to predict the values in an application data set. It corresponds to your domain-specific business issue. 7.1.1.1.114 temporal analytical data set: A temporal analytical data set is a special case of analytical data set. It is the product of a time-stamped population by an analytical record; the result of this operation can be seen as a virtual table containing attribute values associated with identifiers in relation to the time stamp. In other words, a temporal analytical data set contains photos, or snapshots, of a given list of entities taken at a given time; this time can be different for each entity, and an entity can be associated with several photos. Note: Analytical data sets are used to train predictive/descriptive models and to apply these models. 7.1.1.1.115 time series: A time series is a sequence of data points, measured typically at successive times, spaced at (often uniform) time intervals. 7.1.1.1.116 timeout: A specified period of time that will be allowed to elapse before a specified event is to take place, unless
91. 2.1.2.3 Viewing the Data. To help you validate the description when using the Analyze option, you can display the first hundred lines of your data set. To View the Data: 1. Click the button View Data. A new window opens, displaying the top lines of the data set. [Screenshot: Sample Data View window for the data set Census01.csv, showing the first rows of the data set]
92. 30 14:08:17, 1999-04-28 07:21:58. The variable salary in US dollars: 1000.00, 1593 and 2000.54. The variable age in years: 21, 34 and 99. The variable family name: Lake, Martin and Miller. The variable occupation: professor, engineer and translator. The variable telephone: 800 555 1234 and 800 555 4321. Note: A variable that has numbers for values does not have to be described using the number storage format. For instance, the variables telephone and zip code may instead be described using the string storage format, because no meaningful arithmetic operations can be performed on these values. Similarly, a variable that will be used as an observation identification code in a table and does not comply with supported number formats may be described using the string storage format. Warning: For number storage formats, the decimal separator used must be a decimal point and not a comma. So the value 6.5 may be processed, while 6,5 will not be processed. 4.6.4.1 Date and Datetime Variables: Automatically Generated Variables. When your data set contains date or datetime variables, the feature KXEN Date Coder (KDC) automatically extracts date information. KDC is able to extract the following temporal information
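The warning in item 92 about the decimal separator suggests a simple check before loading a data set. The following Python sketch is a hypothetical pre-check, not part of InfiniteInsight: it lists the raw values of a column that use a comma where a decimal point is required. The function name `find_comma_decimals` and the sample values are invented for the illustration.

```python
import re

# Matches values such as "6,5" or "-12,75": digits, a comma, digits.
COMMA_DECIMAL = re.compile(r"[+-]?\d+,\d+")


def find_comma_decimals(values):
    """Return the raw values written with a comma as decimal separator."""
    return [value for value in values if COMMA_DECIMAL.fullmatch(value.strip())]


# Invented raw values for a column meant to use the number storage format.
salary_column = ["1000.00", "1593", "2000.54", "6,5"]
rejected = find_comma_decimals(salary_column)
```

Values flagged this way would need to be rewritten with a decimal point, or the variable described with the string storage format, before the data set is used.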
93. [Screenshot: learning progress screen listing Estimation and Validation values for variables such as occupation, education, education num, age, hours per week, sex, capital gain and workclass] To Stop the Learning Process: 1. Click the Stop Learning Process button. 2. Click the Previous button. The screen Summary of Modeling Parameters appears. 3. Go back to the section Check Modeling Parameters. 5.2.2.3 Validating the Model. Once the model has been generated, you must verify its validity by examining the performance indicators. The quality indicator KI allows you to evaluate the explanatory power of the model, that is, its capacity to explain the target variable when applied to the training data set. A perfect model would possess a KI equal to 1 and a completely random model would possess a KI equal to 0. The robustness indicator KR defines the degree of robustness of the model, that is, its capacity to achieve the same explanatory power when applied to a
94. [Screenshot: decision tree nodes showing Population and Positive Target values] 5.2.3.10.2.2 Node Details. When you select a node, the node information is displayed in the tab Node Details located in the lower part of the panel. [Screenshot: Node Details tab for the target class, whole population: Population Count 36381 (Estimation) and 12461 (Validation); Positive Target Count 8714 and 2973; Positive Target Ratio 23.95% and 23.86%] This tab indicates the target for which the current decision tree is displayed and provides you with the following information for each data set in the model: Population Count, that is, the number of records found in the current node. For continuous targets: Target Mean, that is, the mean of the target for the current node. For nominal targets: Positive Target Count, that is, the number of records for which the target is positive; Positive Target Ratio, that is, the percentage of the node population for which the target is positive; Negative Target Count, that is, the number of records for which the target is negative; Negative Target Ratio, that is, the percentage of the node population for which the target is negative. Variance, that is, the variance for the current node. W
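The Positive Target Ratio in item 94 is the positive target count expressed as a percentage of the node population. As a quick check, the short Python sketch below recomputes the ratios from the counts shown for the whole population in the Node Details example; the function name `positive_target_ratio` is hypothetical.

```python
def positive_target_ratio(positive_count, population_count):
    """Percentage of the node population for which the target is positive."""
    return 100.0 * positive_count / population_count


# Counts taken from the Node Details example (whole population).
estimation_ratio = positive_target_ratio(8714, 36381)
validation_ratio = positive_target_ratio(2973, 12461)
print(f"Estimation: {estimation_ratio:.2f}%  Validation: {validation_ratio:.2f}%")
```

Rounded to two decimals, the results match the 23.95 and 23.86 displayed in the tab.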
95. 7.1.1 Customizing Style Sheets. InfiniteInsight offers the possibility to customize the generated reports. The default style sheet, called KXEN Report Style Sheet (default), cannot be modified. You have to create your own style sheets to modify the settings. Note: To create, load or save a style sheet, you have to indicate a data source in the panel Edit Options before opening the window KXEN Report Style Sheet Editor. To Create a New Style Sheet: 1. In the field Folder, click the button Browse. 2. Select a folder. This folder is your style sheets repository. 3. Click the button Add. A new style sheet has been created. 4. Click the button [icon]. The panel Report Style Sheet Editor opens. 5. In the field Style Sheet Name, enter a name for the new style sheet. The extension KRS is automatically added. Note: You can duplicate a style sheet by changing the name of your style sheet. The previous one is not deleted. To Delete a Style Sheet: 1. Select one of the displayed style sheets. 2. Click the button Remove. Note: The style sheet is deleted not only from the list but also from the data source. To Edit the General Settings (columns: Settings; Options; Note): Reports Background Color; choose a color; Only the PDF and HTML formats can d
96. For a Nominal Target. On the model curve plot, different options allow you to visualize: the exact profit values at a point, for all the displayed curves; the curves for the different profit types (Detected, Lift, Normalized and Customized). For more information on profit types, see Available Profit Types (on page 45). To Display the Exact Profit Values for a Given Point: On the screen Model Curves, on the plot, click a point on one of the curves presented. For instance, by clicking a point on any one of the curves whose value on the abscissa is 25%, the exact profit values will appear. [Screenshot: Profit Detail pop-up at 25%: Random 0.25, Wizard 1, Validation 0.69797] To Select a Profit Type: 1. On the screen Model Curves, beneath the plot, click the drop-down list associated with the Profit field. The list of profit types will appear. 2. Select a profit type. The corresponding profit curves will appear. For a Continuous Target. To Display the Exact Profit Values for a Given Point: On the screen Model Graphs, on the plot, click a point on one of the curves presented. [Screenshot: profit detail pop-up showing the Wizard and Validation values] To Select the Debriefing Type: 1. On the screen Model Graphs, above the plot, click the drop-down list associated with the Debriefing Typ
97. [Figure: Variable "education", Influence on Target by category (Doctorate, Prof-school, Bachelors, Assoc-acdm, Assoc-voc, Some-college, 10th, 11th, 12th, 1st-4th, 5th-6th, ...), Validation data set]
When categories do not contain sufficient numbers to provide robust information, they are grouped in the KxOther category, which is created automatically. When a variable is associated with too many missing values, the missing values are grouped in the KxMissing category, which is also created automatically.
To understand the value of the categories KxOther and KxMissing, consider the following example. The database of corporate customers of a business contains the variable web address. This variable contains the Web site address of each corporate customer in the database. Some companies have a Web site, others do not. In addition, each Web site address is unique. In this case, InfiniteInsight automatically transforms the web address variable into a binary variable with two possible values: KxOther (the firm has a Web site) and KxMissing (the firm does not have a Web site).
6.2.3.5 Clusters Summary
The three cluster plots allow you to examine: The proportion of observations of the data set contained in each clu
98. Guide
2.2.2 Contact Us
3 SAP InfiniteInsight
3.1 Introduction
3.2 Architecture and Operations
3.2.1 User Interface
3.2.2 Operations
3.3 Methodological Prerequisites
3.3.1 What is Your Business Issue?
3.3.2 Is Your Data Usable?
4 Essential Concepts
4.1 Operation of InfiniteInsight: Overview
4.2 Data Sources Supported
4.3 Data Sets
4.3.1 Training Data Set
99. Like any other API, it may be used to integrate InfiniteInsight with other applications or program packages.
3.2.1.4 Control API
The Control API (Application Programming Interface) is aimed primarily at developers or users with programming experience. It is used to access the complete range of functionalities and the most fine-grained parameterization of InfiniteInsight features. In addition, it allows customized integration of InfiniteInsight features with other applications or program packages.
Three APIs are provided with InfiniteInsight:
  - A COM/DCOM API, usable over Microsoft platforms
  - A CORBA API, usable over all client-server platforms
  - A C API, usable over all standalone platforms
3.2.2 Operations
The operation of InfiniteInsight may be subdivided into four phases:
  - Phase 1: Data access (on page 12)
  - Phase 2: Data manipulation and preparation (on page 13)
  - Phase 3: Data modeling (on page 13)
  - Phase 4: Model presentation and deployment (on page 14)
3.2.2.1 Phase 1: Data Access
InfiniteInsight accepts many types of data sources:
  - Flat files, such as .csv files, files of text tables, and other files of type text
  - ODBC-compatible sources, such as Oracle, SQL Server, or IBM DB2 databases
In addition the C Data Access Application Programmi
100. Indicator. The KR indicator is the robustness indicator of the models generated using InfiniteInsight. It indicates the capacity of the model to achieve the same performance when it is applied to a new data set exhibiting the same characteristics as the training data set.
7.1.1.1.92 ROC
The ROC (Receiver Operating Characteristic) graph is derived from signal detection theory. It portrays how well a model discriminates in terms of the tradeoff between sensitivity and specificity, or, in effect, between correct and mistaken detection as the detection threshold is varied.
7.1.1.1.93 role
In data modeling, variables (page 281) may have three roles. They may be:
  - Target variables (page 279)
  - Explanatory variables (page 267)
  - Weight variables (page 282)
7.1.1.1.94 root
Terminological morpheme that is used alone as a word (root word) or as the basic element in a derived word.
7.1.1.1.95 Rule mode
A simplified Classification/Regression mode that is used to express the model with rules.
S
7.1.1.1.96 score
The numeric evaluation mark in view of a given problem.
7.1.1.1.97 scorecard
This screen provides you with the coefficients associated with each category for all variables in the model, only in the case of a regression model (Classification/Regression).
101. Note: compatible with Excel 2002, 2003, XP, and 2007.
6.2.3.3.4 Understanding the Model Graphs
The following figure represents the model graph produced using the default parameters.
[Figure: Model Graphs, Profit Type: Detected. X axis: percentage of observations selected; Y axis: performance. Curves: Random, Wizard, Validation]
On the plot, the curves for each type of model represent the profit that may be realized (Y axis), that is, the percentage of observations that belong to the target variable, in relation to the number of observations selected from the entire initial data set (X axis). On the X axis, the observations are sorted in terms of decreasing score, that is, the decreasing probability that they belong to the target category of the target variable.
In the application scenario, the model curves represent the ratio of prospects likely to respond in a positive manner to your marketing campaign, relative to the entire set of prospects contained in your database.
Detected profit is the default setting for
102. 4. A screen is displayed as below.
[Screenshot: dialog Set Band Count, with the field Band Count, the button Set the Same Band Count for All Variables, and the buttons OK and Cancel]
If you want to modify the band count for all the continuous variables of the model:
  1. Type the desired band count in the field at the bottom of the panel.
  2. Click Set the Same Band Count for All Variables.
  3. Click OK.
If you want to modify the band count for the variable being edited:
  1. Type the desired band count in the column Band Count at the top of the panel.
  2. Click OK.
5.2.1.2.7.2 Optimal Grouping for All Variables
When working with a defined structure, if you want to keep your categories as they are defined for the model building, you must disable this option. If not, or if you work with no defined structure, the option Enable InfiniteInsight Modeler - Data Encoding Optimal Grouping for All Variables makes it possible, in a large number of cases, to increase the robustness of the model (KR) with a minimal loss of information (KI). Where possible, similar adjacent segments are gathered to reduce artifacts between the estimation and validation data sets.
To Enable InfiniteInsight Modeler Data Encoding Optimal Grouping
103. 4.7.8 How to Obtain a Better Model
Obtaining a better model is achieved by:
  - Improving the robustness indicator (KR) of the model, or
  - Improving the quality indicator (KI) of the model, or
  - Improving both the KI and KR indicators of the model.
Several techniques allow you to improve these indicators. You can increase the degree of complexity of the model (polynomial degree). The following table presents other techniques.
To improve... | You can...
The KI indicator of a model | Add variables to the training data set. Use combinations of explanatory variables that seem relevant to you.
The KR indicator of a model | Add observations to the training data set.
Note: For more information about improving the KI and KR indicators, see the sections dedicated to each indicator.
4.8 Performance Indicators
4.8.1 Indicators Specific to InfiniteInsight
Two indicators allow you to evaluate the performance of an InfiniteInsight model:
  - The quality indicator (KXEN Information Indicator), known as KI
  - The robustness indicator (KXEN Robustness Indicator), known as KR
4.8.1.1 Quality Indicator KI
4.8.1.1.1 Definition
KI is the abbreviation for KXEN Information Indicator. The KI indicator is the quality indicator of the models generated using InfiniteInsight. This indicator corresponds to the proportion of informa
104. 4.10.3.1 Density Good
This curve displays the distribution of model scores for responders (signals).
[Figure: Performance plot, density versus score. Curves: Random, Validation]
4.10.3.2 Density Bad
This curve displays the distribution of model scores for non-responders (non-signals).
[Figure: Performance plot, density versus score. Curves: Random, Validation]
4.10.3.3 Density All
This curve displays both the Density Good and Density Bad curves, allowing the user to compare the two distributions.
[Figure: Performance plot, density versus score. Curves: Random, Validation Density Bad, Validation Density Good]
4.10.4 Risk Curves
4.10.4.1 Good Bad Odds
The X axis represents the risk score and the Y axis represents the odds ratio value. The odds ratio is equal to (1 - p) / p, where p is the probability of risk.
[Figure: Performance plot, odds ratio versus risk score]
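The Good Bad Odds formula above can be illustrated with a few lines of Python. This is a minimal sketch, not InfiniteInsight code: the function name good_bad_odds is hypothetical, and the formula is read as (1 - p) / p, with p the probability of risk.

```python
def good_bad_odds(p: float) -> float:
    """Return the good/bad odds (1 - p) / p for a probability of risk p.

    Hypothetical helper for illustration; p must lie in (0, 1].
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return (1.0 - p) / p


if __name__ == "__main__":
    # A lower probability of risk gives higher odds of a "good" outcome.
    for p in (0.05, 0.2, 0.5):
        print(p, good_bad_odds(p))
```

Under this reading, a probability of risk of 0.5 gives odds of 1, and the odds grow quickly as the probability of risk falls.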
105. Refresh button to see the selected rows.
6.2.1.2.4 A Comment about Database Keys
For data and performance management purposes, the data set to be analyzed must contain a variable that serves as a key variable. Two cases should be considered:
  - If the initial data set does not contain a key variable, a variable index, KxIndex, is automatically generated by InfiniteInsight features. It corresponds to the row number of the processed data.
  - If the file contains one or more key variables, they are not recognized automatically. You must specify them manually in the data description. See the procedure To Specify that a Variable is a Key (page 72).
6.2.1.2.5 To Specify that a Variable is a Key
  1. In the Key column, click the box corresponding to the row of the key variable.
  2. Type the value 1 to define this as a key variable.
[Screenshot: data description panel with the Key column]
6.2.1.2.6 Defining a Variable Structure
There are three ways to define a variable structure:
  - by first extracting the categories from the variable statistics, then editing or validating the suggested structure
  - by importing the structure from an existing model
106. 5.2.1.2.6.1 Structure for a Continuous Variable
The structure for a continuous variable is defined by several intervals, each made of:
  - a lower bound, which can be either open or closed
  - a minimum value (Minimum)
  - a maximum value (Maximum)
  - a higher bound, which can be either open or closed
All intervals must be adjoining: there can be no gap or overlap between two intervals.
The option Add Missing allows you to indicate with which interval the missing values should be grouped.
The option Include Smaller Data allows you to include in the first interval any value smaller than its lower bound. In the same way, the option Include Higher Data allows you to include in the last interval any value higher than its higher bound.
To Create a New Interval
  1. Click the Add button to create a new interval. The edit window opens.
  2. Select the lower bound type by clicking the bracket button on the left.
  3. Enter the minimum value for the interval in the left text field.
  4. Enter the maximum value for the interval in the right text field.
  5. Select the higher bound type by clicking the bracket button on the right.
  6. Check the option Add Missing if the missing values must be grouped with this interval.
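The rule that intervals must be adjoining, with no gap or overlap, can be checked mechanically. The sketch below is only an illustration of that rule; the Interval class and the are_adjoining function are hypothetical names, not part of InfiniteInsight.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Interval:
    """One interval of a continuous variable structure (hypothetical model)."""
    minimum: float
    maximum: float
    lower_closed: bool   # True when the lower bound is closed
    upper_closed: bool   # True when the higher bound is closed


def are_adjoining(intervals: Iterable[Interval]) -> bool:
    """Return True when consecutive intervals leave no gap and no overlap."""
    ordered = sorted(intervals, key=lambda iv: iv.minimum)
    for left, right in zip(ordered, ordered[1:]):
        # The shared boundary value must be the same on both sides.
        if left.maximum != right.minimum:
            return False
        # Exactly one side may own the boundary value: both closed is an
        # overlap, both open is a gap of a single point.
        if left.upper_closed == right.lower_closed:
            return False
    return True


if __name__ == "__main__":
    structure = [Interval(0, 10, True, False), Interval(10, 20, True, True)]
    print(are_adjoining(structure))
```

The check treats a boundary that is closed on both sides as an overlap and a boundary that is open on both sides as a gap, which is one reasonable interpretation of the rule stated above.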
107. For instance, by selecting:
  - 25% of the observations from your entire data set with the help of a perfect model, 100% of the observations belonging to the target category of the target variable are selected. Thus, maximum profit is achieved.
    Note: These 25% correspond to the proportion of prospects who responded in a positive manner to your marketing campaign during your test phase. For these prospects, the value of the target variable, or profit, is equal to 1.
  - 25% of the observations from your initial data set with the help of the model generated, 66.9% of the observations belonging to the target category of the target variable are selected.
  - 25% of the initial data set using a random model, 25% of the observations belonging to the target category of the target variable are selected.
5.2.3.5.4.2 For a Model with a Continuous Target
The following graph represents the model curve plot produced using a continuous target.
[Figure: Model Graphs, Debriefing Type: Predicted vs Actual. X axis: Predicted; curves: Wizard, Validation]
The default graphic displays the actual target values as a function of the predicted target values. Two curves are displayed: one for the Validation sub-set (blue line) and another for the hypothetical perfect model (Wizard, green line). The Validation curve gives Actual Target value as a function of Predicted
108. 
  - SQL Code ANSI
  - SQL Code for MySQL
  - SQL Code for NEOVIEW
  - SQL Code for Oracle
  - SQL Code for SQLServer (wraps variable names with …)
  - SQL Code for SYBASE ASE
  - SQL Code for Sybase IQ
  - SQL Code for Teradata
  - SQL Code for WX2
  - SQLServer 2000 UDF SQL
  - SQLNetezza (Netezza databases)
  - SQLTeradata (Teradata databases)
  - SQLVertica (Vertica databases)
  - ScoreCard (only available for InfiniteInsight Modeler - Regression/Classification models)
  - Teradata V2R5.1 UDF
  - UDF Code for MySQL
  - UDF Code for Sybase IQ
  - VB Code
Note: When generating SQL, SAS, or SQL code for MySQL, you will be asked to provide the names of the key column and of the data set used.
5.2.4.5.2 Advanced Settings
The option Activate UNICODE Mode allows you to generate the selected code in Unicode so that it supports non-Latin languages such as Japanese, Russian, and so on.
Note: This option is particularly useful for SQL codes.
The option Do not generate code for non contributive variables allows you to exclude from the code all variables with a contribution of 0, since they do not influence the result. In some cases this can significantly reduce the size of the generated code.
You can either:
  - Use the default separator (GO), or
  - Use a custom separator
109. Target value. For example, when the model predicts 35, the average actual value is 37. The Wizard curve is just X = Y, meaning that all the predicted values are equal to the actual values.
The graph is an easy way to quickly see model error. When the Validation curve strays far from the Wizard curve, the predicted values in that range are suspect.
The graph is computed as follows:
  - About 20 segments (or bins) of predicted values are built. Each of these segments represents roughly 5% of the population.
  - For each of these segments, some basic statistics are computed on the actual value, such as the mean of the segment (SegmentMean), the mean of the associated target (TargetMean), and the variance of this target within that segment (TargetVariance).
For example, for predicted values in [17; 19], the mean would be 18.5, the actual target mean would be 20.5, and the actual target variance would be 9. In this case, we could say that if the predicted value is between 17 and 19, the model is slightly underestimating the actual value.
For each curve, a dot on the graph corresponds to the segment mean on the X axis and the target mean on the Y axis.
The blue area represents the expected deviation of the current model. It shows where about 70% of the actual values are expected to be. In other words, in the case of a Gaussian distribution, about 70% of the actual points should be in the blue area (keep in mind that this is a theoretical percentage that may not be ob
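The segment statistics described above (SegmentMean, TargetMean, TargetVariance over about 20 bins of predicted values) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the product's implementation: the function name predicted_vs_actual is hypothetical, bins are built with equal population after sorting by predicted value, and the population variance is used because the guide does not say which variance estimator applies.

```python
from statistics import mean, pvariance


def predicted_vs_actual(predicted, actual, n_segments=20):
    """Bin observations by predicted value and summarize each segment.

    Returns one dict per non-empty segment with the keys SegmentMean,
    TargetMean and TargetVariance (names taken from the text above).
    """
    pairs = sorted(zip(predicted, actual))  # sort by predicted value
    n = len(pairs)
    rows = []
    for k in range(n_segments):
        # Equal-population slices: each holds roughly 1/n_segments of the data.
        chunk = pairs[k * n // n_segments:(k + 1) * n // n_segments]
        if not chunk:
            continue
        preds = [p for p, _ in chunk]
        acts = [a for _, a in chunk]
        rows.append({
            "SegmentMean": mean(preds),       # X coordinate of the dot
            "TargetMean": mean(acts),         # Y coordinate of the dot
            "TargetVariance": pvariance(acts),
        })
    return rows


if __name__ == "__main__":
    predicted = list(range(100))
    actual = [p + 2 for p in predicted]   # the model underestimates by 2
    for row in predicted_vs_actual(predicted, actual)[:3]:
        print(row)
```

Plotting TargetMean against SegmentMean for each row gives the Validation curve; the Wizard curve is simply the line where both values are equal.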
110. Variance
7.1.1.1.85 predictive model
A model that allows predicting phenomena.
7.1.1.1.86 profit type
A profit type allows calculation of the profit that may be realized using the model. In general, a benefit is associated with the positive or expected values of the target variable, and a cost is associated with the negative or unexpected values.
7.1.1.1.87 quality indicator (KI)
KI is the abbreviation for KXEN Information Indicator. The KI indicator is the quality indicator of the models generated using InfiniteInsight. This indicator corresponds to the proportion of information contained in the target variable that the explanatory variables are able to explain.
R
7.1.1.1.88 random cutting strategy
The random cutting strategy distributes the data of the initial data set in a random manner between the three sub-sets: estimation, validation, and test.
7.1.1.1.89 record
The fundamental data structure used for performing data analysis. Also called a table row or example. A typical record would be the structure that contains all relevant information pertinent to one particular customer or account.
7.1.1.1.90 robustness
The degree of robustness corresponds to the predictive power of the model applied to an application data set.
7.1.1.1.91 robustness indicator (KR)
KR is the abbreviation for KXEN Robustness
111. 5.2.1.1.1 Special Case: Data Stored in Databases, the Explain Mode
Before requesting data stored in a Teradata (all versions), Oracle (version 10 and above), or SQLServer 2005 database, InfiniteInsight uses a feature called the Explain mode, which categorizes the performance of SQL queries into several classes defined by the user. To be as fast and as light as possible, this categorization is done without actually executing the full SQL query.
The objective is to estimate the workload of the SQL query before executing it, and then to decide, possibly according to an IT corporate policy, whether the SQL query can actually be used.
For example, an IT corporate policy may favor interactivity and define three classes of SQL queries, each with its maximum time:
  - Immediate: duration < 1s. The query is accepted and executed immediately.
  - Batched: 1s < duration < 2s. The query is accepted but will be executed at the next idle time.
  - Rejected: 2s < duration. The query will never be executed.
The number, names, and limits of the classes are defined by the user so that these values match the current DBMS configuration and DBMS usage policy.
If the Explain mode has been configured by your DBMS adminis
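The three-class policy in the example above can be expressed as a small lookup. The sketch below only illustrates the idea of mapping an estimated duration to a user-defined class; classify_query and its parameters are hypothetical names, not an InfiniteInsight API, and the handling of durations that fall exactly on a limit (assigned here to the slower class) is an assumption, since the guide does not specify it.

```python
def classify_query(estimated_seconds, classes=(("Immediate", 1.0), ("Batched", 2.0)),
                   fallback="Rejected"):
    """Return the class name for an estimated query duration.

    classes is an ordered sequence of (name, upper limit in seconds);
    a duration at or above every limit falls into the fallback class.
    """
    for name, upper_limit in classes:
        if estimated_seconds < upper_limit:
            return name
    return fallback


if __name__ == "__main__":
    for duration in (0.4, 1.5, 2.5):
        print(duration, classify_query(duration))
```

Because the number, names, and limits of the classes are user-defined, they are passed in as data rather than hard-coded into the function body.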
112. a Target, the user chooses the best number of segments, for instance [5; 10], which means that 5 to 10 clusters are requested by the user. The engine computes the best number of clusters using the KI and KR metrics. For instance, you may have 7 clusters.
For unsupervised segmentation, that is to say without a Target, the InfiniteInsight engine chooses the minimum number of clusters, for instance [10; 10], which means that 10 clusters are requested by the user.
Note: When you activate the option Calculate SQL Expressions, InfiniteInsight generates an additional cluster that contains the unassigned records. For more details on SQL expressions and unassigned records, see Difference Between Standard Cross Statistics and SQL Expressions on page 249.
Choosing to Calculate SQL Expressions allows you to see, in the model debriefing, the SQL expressions used to generate each cluster.
For this Scenario
Keep the default settings.
6.2.1.6.1 Setting Up the Advanced Options
The panel Specific Parameters of the Model provides you with several options.
[Screenshot: panel Specific Parameters of the Model, showing Calculate Cross Statistics, Target Key Settings (Target, Target Key), Extract Variable Categories, Distance (System Determined)]
113. a and which serves to explain a target variable.
7.1.1.1.48 Expression Editor
Panel that allows you to create fields as complex expressions in the Analytical Data Set Editor.
7.1.1.1.49 extra predictable variable
Variable whose values are known for the period that is to be predicted.
F
7.1.1.1.50 false positive
Incorrect assignments to the signal class.
7.1.1.1.51 fluctuation
Evolution of the signal that is neither stable nor cyclic (InfiniteInsight Modeler - Time Series).
G
7.1.1.1.52 GINI index
The GINI statistic is a measure of predictive power based on the Lorenz curve. It is proportional to the area between the random line and the Model curve.
7.1.1.1.53 horizon wide MAPE
This quality indicator for the forecasting model is the mean of the MAPE values observed over the whole training horizon. A value of zero indicates a perfect model, while values above 1 indicate bad quality models. A value of 0.09 means that the model takes into account 91% of the signal or, in other words, that the relative forecasting error (model residues) is 9%.
7.1.1.1.54 In database apply
The in-database apply is used to apply a model in a database: the proper SQL code is generated for the model, and the resulting code is then executed as a single SQL request in the database. This avoids extracting the data from the database and
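The horizon wide MAPE entry above describes a mean of MAPE values over the training horizon. The following sketch shows one way to read that definition; it is not the product's formula. The function names are hypothetical, MAPE is taken in its usual form (mean of |actual - forecast| / |actual|), and observations with an actual value of zero are skipped, which is an assumption.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction (0.09 means 9%)."""
    terms = [abs((a - f) / a) for a, f in zip(actual, forecast) if a != 0]
    if not terms:
        raise ValueError("MAPE is undefined without non-zero actual values")
    return sum(terms) / len(terms)


def horizon_wide_mape(actual_by_horizon, forecast_by_horizon):
    """Mean of the MAPE values computed separately for each horizon step."""
    values = [mape(a, f) for a, f in zip(actual_by_horizon, forecast_by_horizon)]
    if not values:
        raise ValueError("at least one horizon is required")
    return sum(values) / len(values)


if __name__ == "__main__":
    actual = [[100, 200], [100, 200]]      # actual values, one list per horizon
    forecast = [[110, 180], [92, 216]]     # matching forecasts
    print(horizon_wide_mape(actual, forecast))
```

In this made-up example the per-horizon MAPE values are 0.10 and 0.08, so the horizon wide MAPE is about 0.09, matching the 9% relative error case discussed in the glossary entry.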
114. a button
  - display the text log detailing the process by clicking the button
  - copy, print, or save the debriefing panel
To Copy the Report
Click the Copy button. The application copies the HTML code of the screen. You can paste it into a word processing or spreadsheet program, or into a text editor.
To Print the Report
  1. Click the Print button situated under the title. A dialog box appears, allowing you to select the printer to use.
  2. Select the printer to use and set other print properties if need be.
  3. Click OK. The report is printed.
To Save the Report
Click the Save button situated under the title. The file is saved in HTML format.
5.2.4.1.3 Understanding the Deviations Analysis
The first step in finding out whether there are any deviations in your data is to look at the debriefing report (on page 153) and compare the performances (KI and KR) obtained on the original data with those obtained on the control data set. Then, to visualize which variables have changed, look into the Control for Deviations Reports (on page 153).
The section Control for Deviation Overview provides you with basic statistics on the Data Set used for Deviation Control, also called the control data set, such as the name of the
115. ability for the variable's categories (for example, cluster ID or age range response rates).
7.1.1.1.9 authenticated server
Users can communicate with the InfiniteInsight authenticated server only when providing a correct password. The InfiniteInsight authenticated server delegates the authentication to custom-built services or to operating system services through PAM (Pluggable Authentication Modules).
7.1.1.1.10 autoselection
KXEN autoselection is automated attribute selection.
7.1.1.1.11 bin
A bin is a range of values defined by its bounds (upper bound and lower bound). Bins result from a data manipulation activity known as binning. Synonym: range.
7.1.1.1.12 bipartite graph display / non-bipartite graph display
The bipartite graph display shows two distinct populations of nodes, or node sets, with the links between the two node sets. For example, the first node set could represent clients and the second, products. From this global view, a non-bipartite graph display can be derived to focus on the links between the nodes of a given node set.
7.1.1.1.13 bubble chart
A bubble chart is a specific graphical representation in InfiniteInsight Modeler - Segmentation/Clustering that displays clusters as bubbles. The coordinates of a given bubble are the cluster centroid values according to two selectable continuous variables.
116. able becomes the primary variable: the plot will display all its information, including what it has in common with variable B.
Variable B, with a smaller contribution than A with respect to the target variable, becomes the secondary variable: only its marginal contribution is displayed on the plot, meaning that only the supplementary contribution to target variable information, or the values that B does not share with A, are displayed. This difference of information is noted VARIABLE_B - VARIABLE_A.
5.2.3.4.5 Encoded Variables
Creating an InfiniteInsight model uses not only the original variables but also, in the case of continuous or ordinal variables, their value as encoded by InfiniteInsight Modeler - Data Encoding. This is called dual encoding and allows InfiniteInsight to find all the information contained in each variable.
The encoded variables appear on the variable contributions plots with the prefix c_. For example, the encoded version of a continuous variable named AGE is noted c_AGE.
Note: In InfiniteInsight Modeler, on the Data Description panel, if you enable the Natural Encoding for a given variable, its K2C encoded value (c_variableName) will not be generated.
5.2.3.5 Category Significance
5.2.3.5.1 Definition
The Significance of Categories plot illustrates the relative significance of the different categories of a given variable with respect to the target variable.
117. 3.1 Introduction
KXEN (Knowledge eXtraction ENgines) has developed the InfiniteInsight platform in order to provide the ideal Data Mining solution for modeling your data as easily and rapidly as possible, while maintaining relevant and readily interpretable results. Thanks to InfiniteInsight, you will transform your data into knowledge in order to make timely strategic and operational decisions.
InfiniteInsight places the latest Data Mining techniques within reach of any non-expert user. InfiniteInsight allows you to access many data source formats and to generate explanatory and predictive models, as well as descriptive models, in a semi-automated manner, extremely rapidly. With InfiniteInsight, you can concentrate on high value-added activities such as analysis of the results of data modeling and decision making.
3.2 Architecture and Operations
The figure below illustrates the general architecture of InfiniteInsight. This section provides an introduction to the elements of this architecture, like the various types of interfaces that allow you to use InfiniteInsight.
[Figure: general architecture of InfiniteInsight]
118. 
4.3.2 Application Data Set
4.4 Cutting Strategies
4.4.1 Definition
4.4.2 Roles of the Three Sub-Sets
4.4.3 Nine Types of Cutting Strategies
4.5 [title illegible in source]
4.5.1 Definition
4.5.2 Synonyms of Observation and Variable
4.5.3 Data Formats
4.6 Variables
4.6.1 Generic Definition
4.6.2 Example
4.6.3 Types of Variables
4.6.4 [title illegible in source]
4.6.5 Roles of Variables
4.7 Models
4.7.1 Fundamental Definition
4.7.2 Performance of a Model
4.7.3 Types of
119. allows you to identify outlier observations. An observation is considered an outlier if the difference between its predicted value and its real value exceeds the value of the prediction range. In other words, the prediction range is a deviation measure of the values around the predicted score.
The individual contributions by variables contained in the data set with respect to the target variable. The sum of all those individual contributions corresponds to the predicted value (score), to the nearest whole number.
The decision option can only be used for classification models, that is, when the target variable is nominal. It generates a classification decision based on the scores, or predicted values, generated by the model. The result file obtained contains a column in which a category of the target variable is assigned to every observation. The decision is taken on the basis of a threshold that is applied to the scores generated by the model. The target category of the target variable is assigned to observations whose scores are higher than the threshold. The default threshold, computed during the generation (or training) of the model, is chosen so that the way the categories of the target variable are assigned to observations is representative of their distribution in the training data set.
Depending on the level of information desired, you can choose among several results files, described in the table below. Selec
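The outlier rule and the threshold-based decision described above can be sketched in a few lines. These helpers are hypothetical illustrations, not InfiniteInsight functions. In particular, default_threshold is only one plausible reading of "representative of their distribution in the training data set": it picks a threshold so that the share of scores above it approximates the share of the target category, and it assumes there are no tied scores at the cut.

```python
def is_outlier(predicted, actual, prediction_range):
    """An observation is an outlier when |actual - predicted| exceeds the range."""
    return abs(actual - predicted) > prediction_range


def decide(score, threshold, target_category=1, other_category=0):
    """Assign the target category to scores strictly above the threshold."""
    return target_category if score > threshold else other_category


def default_threshold(training_scores, target_rate):
    """Pick a threshold so that about target_rate of the scores lie above it.

    Assumption for illustration only; the product's computation is not
    described in the guide.
    """
    if not training_scores:
        raise ValueError("training_scores must not be empty")
    ordered = sorted(training_scores, reverse=True)
    k = min(round(target_rate * len(ordered)), len(ordered) - 1)
    return ordered[k]


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.2]
    threshold = default_threshold(scores, target_rate=0.5)
    print(threshold, [decide(s, threshold) for s in scores])
    print(is_outlier(predicted=18.5, actual=25.0, prediction_range=3.0))
```

With these sample scores and a target rate of 50%, the two highest scores receive the target category and the other two do not.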
120. ally save the model once it has been generated. When the autosave option is activated, a green check mark is displayed on the Autosave button.

5.2.1.6.1 Activating the Autosave Option
The panel Model Autosave allows you to activate the option that automatically saves the model at the end of the generation process, and to set the parameters needed when saving the model.

To Activate the Autosave Option
1. In the panel Summary of Modeling Parameters, click the Autosave button. The panel Model Autosave is displayed.
2. Check the option Enable Model Autosave.
[Screenshot: the Model Autosave panel]
3. Set the parameters listed in the following table.

Parameter | Description
Model Name | This field allows you to associate a name with the model. This name will then appear in the list of models offered when you open an existing model.
Description | This field allows you to enter the information you want, such as the name of the training data set used, the polynomial degree or the KI and KR performance indicato
121. an 0.95 must be considered with caution. Applying it to a new data set will incur the risk of generating unreliable results.

4.8.1.2.3 Improving the KR of a Model
To improve the KR of a model, additional observation rows may be added to the training data set.

4.8.1.3 KI, KR and Model Curves
On the model curve plot:
- Of the estimation data set (default plot), the KI indicator corresponds to the area found between the curve of the model generated and that of the random model, divided by the area found between the curve of the perfect model and that of the random model. As the curve of the generated model approaches the curve of the perfect model, the value of KI approaches 1.
- Of the estimation, validation and test data sets (select the corresponding option from the list Data set located below the plot), the KR indicator corresponds to one minus the area found between the curve of the estimation data set and that of the validation data set, divided by the area found between the curve of the perfect model and that of the random model.

4.8.1.4 Advanced Users: KI for Continuous Targets
1. Working with the Validation data set, use a uniform encoding based on the distribution curve to map the target values into the range [-1, 1]. The curve is different for each sub data set. You can access this curve through the
122. an 50,000; 0 if the individual has a salary of less than 50,000.

Note: In order to avoid complicating the InfiniteInsight Modeler - Regression/Classification and InfiniteInsight Modeler - Segmentation/Clustering application scenarios, the variable fnlwgt is used as a regular explanatory variable in these scenarios, and not as a weight variable.

6.1.7 InfiniteInsight
To accomplish the scenario, you will use the Java-based graphical interface of InfiniteInsight. This interface allows you to select the InfiniteInsight feature with which you will work, and helps you at all stages of the modeling process.

To Start InfiniteInsight
1. Select Start > Programs > KXEN InfiniteInsight > InfiniteInsight. The InfiniteInsight screen will appear.
[Screenshot: the InfiniteInsight main screen (Version 6.1.0), listing the Explorer, Modeler, Social and Toolkit options]
123. and automatically the option Add Score Deviation will be selected as well.
[Screenshot: the Applying the Model screen, showing the Application Data Set, Generation Options and Results Generated by the Model sections]

5.2.4.2.5 Advanced Apply Settings

5.2.4.2.5.1.1 Copy the Weight Variable
This option allows you to add the weight variable to the output file, if one was set during the variable selection of the model.

5.2.4.2.5.1.2 Copy Data Set Id
This option allows you to add to the output file the name of the sub data set the record comes from (Estimation, Validation or Test).
Warning: This option cannot be used with the in-database apply feature.

5.2.4.2.5.1.3 Copy the Variables
This option allows you to add to the output file one or more variables from the data set.

To Add All the Variables
Check the All option.

To Select only Specific Variables
Check the Individual option.
Click the >> button to display the variable selecti
124. aning that its values are character strings. They are therefore ordered according to alphabetic conventions.

4.6.3.2.2 Example
The variable school grade is an ordinal variable. Its values belong to definite categories and can be sorted. This variable can be:
- numerical, if its values range between 0 and 20;
- textual, if its values are A, B, C, D, E and F.

Important: A variable assessment whose values are good, average and bad cannot be directly treated as an ordinal variable by InfiniteInsight features. The values would be sorted in alphabetical order (average, bad, good) and not according to their meaning. When the order of the categories of such a variable is important, the variable must be encoded in letters or numbers before it can be used by InfiniteInsight.

4.6.3.3 Nominal Variables

4.6.3.3.1 Definition
Nominal variables are variables whose values are discrete, that is, belong to categories, and are not sortable. Nominal variables may be:
- numerical, meaning that their values are numbers;
- textual, meaning that their values are character strings.
Important: Binary variables are considered nominal variables.

4.6.3.3.2 Example
The variable zip code is a nominal variable. The set of values that this variable may assume (10111, 20500, 90210, for example) are clearly distinct non ran
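The encoding step recommended above for the assessment variable can be prepared before the data set is loaded. The following Python sketch is only an illustration: the mapping ASSESSMENT_CODES and the function encode_assessment are hypothetical names, and the numeric codes 1 to 3 are an arbitrary choice that preserves the intended order.

```python
# Hypothetical mapping: any codes work as long as they sort in the intended order.
ASSESSMENT_CODES = {"bad": 1, "average": 2, "good": 3}


def encode_assessment(values):
    """Replace each textual assessment with a number that preserves its meaning."""
    return [ASSESSMENT_CODES[value] for value in values]


raw = ["good", "bad", "average"]

# Alphabetical order, as a textual ordinal variable would be treated.
alphabetical = sorted(raw)

# Order by meaning, obtained by sorting on the numeric codes.
by_meaning = sorted(raw, key=ASSESSMENT_CODES.get)

encoded = encode_assessment(raw)
```

Sorting the raw strings gives average, bad, good, whereas sorting on the codes gives bad, average, good, which is the order the variable is meant to express.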
125. another specified event occurs first.

7.1.1.1.117 time-stamped population
A time-stamped population is a list of pairs <identifier, time stamp>. The semantic meaning of such a construct can be associated with snapshots of the entities at a given time. In general terms, a given entity may be represented at different time stamps in a single time-stamped population.

7.1.1.1.118 training
Another term for estimating a model's parameters based on the data set at hand.

7.1.1.1.119 training data set
A training data set is a data set used for generating a model. By analyzing the training data set, InfiniteInsight features generate a model that explains the target variable based on the explanatory variables.

7.1.1.1.120 transaction
A transaction is defined by:
- a unique key
- the key of the related session
- an attribute called an item

7.1.1.1.121 true negative
Correct assignments to the class of non-signals.

7.1.1.1.122 true positive
Correctly identified signal.

U

7.1.1.1.123 unassigned record
When creating clusters with SQL expressions, unassigned records are the observations that cannot be described by the SQL expressions and are left outside the clusters.

7.1.1.1.124 upper bound
An upper bound of a subset S of some partially ordered set (P, ≤) is an element of P which is
126. ariables
1. Click the link indicating the number of variables in the sentence "Each step removes 1 variable". A slider is displayed, ranging from 1 to the total number of variables in the model.
2. Move the cursor on the slider to select the number of your choice.
3. Click the OK button.

To Select the Information Amount
1. Click the link indicating the amount of information to keep in the sentence "Each step keeps 95.0% of information". A slider is displayed.
2. Move the cursor on the slider to select the quantity of your choice.
3. Click the OK button.

To Set the Authorized Quality Loss
The quality loss can be set in the sentence "Search process stops with a drop of 1.0% of KI and KR".
1. Click the link indicating the percentage of loss (for example 5.0%). A slider is displayed.
2. Select the maximum percentage of authorized quality loss with the cursor.
3. Click OK.
4. Click the quality criterion. A drop-down list is displayed, offering the following options:
- Based on KI + 2 KR: the quality loss is based on both indicators, KI (information contained in the model) and two KR (robustness).
- KI and KR: the quality loss is limited for both indicators, KI and KR. It is the default value.
- KI: the quality loss is limited for the KI only.
- KR: the quality loss is limited for the KR only.
5. Select the option of your choice.
6. Click the OK button.
127. ariables in the model. The tool bar located under the title allows the user to copy the coordinates to the clipboard, print the plot or save it in PNG format. The values are normalized and their sum always equals 0. Depending on the chosen profit strategy, or on the value type of the continuous target variable, you can obtain importances that are all positive, or both negative and positive.

The X axis shows the influence of the variable categories on the target. The significance of the different numbers on the X axis is detailed in the following table.

Number on the X axis | Indicates that the category has
positive number | a positive influence on the target
0 | no influence on the target (the behavior is the same as the average behavior of the whole population)
negative number | a negative influence on the target

The Y axis displays the variable categories. Categories sharing the same effect on the target variable are grouped. They appear as follows: CATEGORY_A CATEGORY_B CATEGORY_C. Categories not containing sufficient numbers to provide robust information are grouped in the KXOTHER category. When a variable is associated with too many missing values, the missing values are grouped in the KXMISSING category. Both categories are created automatically by InfiniteInsight.

The
128. as variable encodings, missing value replacement, compressions, model parameters.

To Generate the Code Corresponding to the Model
1. In the list Target to be used, choose the target of the model.
2. Use the list Information to be Generated to select the type of results.

Selected Option | Results of the Generated Model
Score / Estimates | score value (classification) or estimates (regression)
Probability | score value and probability value, except for HTML and all SQL codes, for which only the probability value is provided
Bar | score value and error bar value, except for HTML and all SQL codes, for which only the error bar value is provided

Warning: Both options Probability and Bar are only available for InfiniteInsight Modeler - Regression/Classification models with nominal targets.

Note: In the case of a continuous variable, the generated code (SQL for example) always includes a number of categories that is higher than in the user-defined structure, or than the number given by the parameter band count if no user structure has been set. Indeed, the encoding of variables adds target curve points to increase the accuracy of the coding according to the training data set. These points split some existing categories and thus increase the number of categories in the generated code.

3. In the section Code Settings, select the code type to be generated (see List of Generated Codes on page 181).
4. Click the Browse button associated with the Folder field and selec
129. ase, completed in 1994 by Barry Becker.

Note: For more information about the American Census Bureau, see http://www.census.gov.

This file presents the data on 48,842 individual Americans of at least 17 years of age. Each individual is characterized by 15 data items. These data, or variables, are described in the following table.

Variable | Description | Example of Values
age | Age of individuals | Any numerical value greater than 17
workclass | Employer category of individuals | Private, Self-employed-not-inc
fnlwgt | Weight variable allowing each individual to represent a certain percentage of the population | Any numerical value, such as 0, 2341 or 205019
education | Level of study, represented by a schooling level or by the title of the degree earned | 11th, Bachelors
education-num | Number of years of study, represented by a numerical value | A numerical value between 1 and 16
marital-status | Marital status | Divorced, Never-married
occupation | Job classification | Sales, Handlers-cleaners
relationship | Position in family | Husband, Wife
race | Ethnicity | White, Black
sex | Gender | Male, Female
capital-gain | Annual capital gains | Any numerical value
capital-loss | Annual capital losses | Any numerical value
native-country | Country of origin | United States, France
class | Variable indicating whether or not the salary of the individual is greater or less th | 1 if the individual has a salary of greater than 50,000
130. asets in a curve chart, or the Validation Data Set in a bar chart.

6.2.3.4.4 Understanding the Plots of Variables

For this Scenario
Select the variable marital-status, which is the explanatory variable that contributes the most to the target variable Class.
[Plot: Category Significance for the variable marital-status on the Validation data set; axes: Influence on Target, Categories]
This plot presents the effect of the categories of the marital-status variable on the target variable.

6.2.3.4.5 Variable Categories and Profit
The plot Category Significance illustrates the relative significance of the different categories of a given variable with respect to the target variable. On this type of plot:
- The higher on the screen one finds a category, the greater its positive effect on the target category (or hoped-for value) of the target variable. In other words, the higher a category appears on the screen, the more representative that category is of the target category of the target variable.
- The width and direction of the bar
131. asts and indicates how much the forecasts differ from the real signal value.

7.1.1.1.68 mean square error (L2)
Square root of the mean of the quadratic errors (Euclidean distance, or root mean squared error, RMSE).

7.1.1.1.69 metadata
The information about the data itself.

7.1.1.1.70 meta-operator
Operators that are used upon other operators.

7.1.1.1.71 missing value
Data values can be missing because they were not measured, not answered, were unknown or were lost.

7.1.1.1.72 monotonicity
The direction of variation of a monotonic function does not change.

7.1.1.1.73 multiple-instance installation
A KXEN installation mode that consists in running several instances on one server in order to divide up the load.

N

7.1.1.1.74 nominal variable
Nominal variables are variables whose values are discrete, that is, belong to categories, and are not sortable. Nominal variables may be:
- numerical, meaning that their values are numbers;
- textual, meaning that their values are character strings.
Important: Binary variables are considered nominal variables.

7.1.1.1.75 normalize
To transform numerical data and make them fit into a defined interval.

7.1.1.1.76 numeric filter
A digital filter i
132. aving another bar orientation as the default one for a specific report item.
Sort by / Sort Order: you can select a column to sort by and choose between an ascending or a descending order.
Visibility: you can hide columns of a report item or even menu items. At least one column of a ri

To Apply the New Style Sheet to the Generated Reports
4. In the panel Report, select the new style sheet.
5. Click OK. A window opens, indicating that you have to restart the modeling assistant to take the edited options into account.
6. Click OK.
When training a model, all the generated reports (the learn, excel and statistical reports) are now customized.

5.1.8.1.2 Defining a metadata repository
The metadata repository allows you to specify the location where the metadata should be stored.

To define a metadata repository
1. Choose between storing the metadata in the same place as the data or in a single place, by checking the option of your choice.
2. In the list Data Type, select the type of data you want to access. For some types of data, you will need a specific license.
3. Use the Browse button corresponding to the Folder field to select the folder or database containing the data. In the case of a protected database, you will need to enter the user name and the password in the fields User and Password.
4. Cl
133. been sorted in descending order.
[Screenshot: Clusters Summary, Frequencies plot for the Estimation data set, bars sorted in descending order; axes: Clusters, Frequencies]
Among the five clusters, Cluster 4 is the one which contains the greatest number of observations, or 18% of the total number of customers contained in the entire data set.

6.2.3.5.2.3 The Plot Relative Target Means
Similar to the Target Means plot, the Relative Target Means plot presents, for each cluster, the proportion of observations belonging to the target category of the target variable. The only difference between the two plots is the scale used on the Y axis. On the Relative Target Means plot, the proportion of observations belonging to the target category of the target variable is re-expressed relative to the entire data set. In other words, the 0 value of the Y axis corresponds to the true percentage of observations belonging to the target category of the target variable in relation to the entire data set.

The figure below presents the Relative Target Means plot obtained during this scenario. The bars have been sorted in descending order.
[Screenshot: Clusters Summary, Relative Target Means plot]
134. below:
- The key variable defined during data description, at the setting model parameters step.
- Possibly the target variable, given as known values, if the latter appeared in the application data set, as is the case in this scenario.
- The predicted value (score) provided by the model for the target variable of each observation. The name of this column corresponds to the name of the target variable prefixed by rr_, or in this case rr_class.
- The decision, which is based on the score. For example, its value can be 1 if the observation is considered interesting, or 0 if it is considered uninteresting for the model. The name of this column corresponds to the name of the target variable prefixed by decision_rr_, or in this case decision_rr_class.
- The probability decision, which is also based on the score and provides the probability of the decision. The higher it is, the more it confirms the decision value. The name of this column corresponds to the name of the target variable prefixed by proba_decision_rr_, or in this case proba_decision_rr_class.
- The probability, for each observation, that it does or does not belong to the target category of the target variable. The name of this column corresponds to the name of the target variable prefixed by proba_rr_, or in this case proba_rr_class.
- The prediction range, or maximum error. The name of this column corresponds to the name of the target variable prefixed by bar_rr_, or in this case ba
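When post-processing a results file outside InfiniteInsight, the column naming rules listed above can be turned into a small lookup. The Python sketch below is an illustration only: the function describe_column and the table COLUMN_PREFIXES are hypothetical helpers, and they cover only the prefixes named in this section.

```python
# Prefixes taken from the column descriptions above; the wording of each
# meaning is a paraphrase, not a product label.
COLUMN_PREFIXES = {
    "rr_": "predicted value (score)",
    "decision_rr_": "decision",
    "proba_decision_rr_": "probability of the decision",
    "proba_rr_": "probability of belonging to the target category",
    "bar_rr_": "prediction range (maximum error)",
}


def describe_column(column_name, target_name):
    """Return the meaning of a results-file column, or None if it is not recognized."""
    for prefix, meaning in COLUMN_PREFIXES.items():
        # Compare the whole name so that "rr_class" never matches "proba_rr_class".
        if column_name == prefix + target_name:
            return meaning
    return None


score_meaning = describe_column("rr_class", "class")
decision_meaning = describe_column("decision_rr_class", "class")
unknown = describe_column("age", "class")
```

Matching the full column name, rather than testing only how it starts, avoids confusing the shorter prefix rr_ with the longer prefixes that end with it.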
135. ble of contents and the index will be valuable tools helping you find the information that you seek.

1.3 Conventions Used in this Document
To facilitate reading, certain publishing conventions are applied throughout this guide. These are presented in the following table.

The following information items | Are presented using | For example
Graphical interface features and file names | Arial bold | Click Next.
The titles of particularly useful sections | Garamond italicized bold | See Operations
The titles of procedures | [icon not reproduced] | To Select the Target Variable
The titles of sections specific to the scenario presented in this guide | [icon not reproduced] | For this Scenario

2 Welcome to this Guide

IN THIS CHAPTER
About this Document
Before Beginning

2.1 About this Document

2.1.1 Who Should Read this Document
This document is addressed to people who want to evaluate or use InfiniteInsight.

2.1.2 Prerequisites for Use of this Document
Use of this guide does not require any prior expertise in statistics or databases. InfiniteInsight features are developed using cutting-edge technologies and w
136. button.

6.2.2 Step 2 - Generating and Validating the Model
Once the modeling parameters are defined, you can generate the model. Then you must validate its performance using the quality indicator KI and the robustness indicator KR.
- If the model is sufficiently powerful, you can analyze the responses that it provides in relation to your business issue (see Step 3 - Analyzing and Understanding the Model Generated, page 109), and then apply it to new data sets (see Step 4 - Using the Model, page 150).
- Otherwise, you can modify the modeling parameters in such a way that they are better suited to your data set and your business issue, and then generate new, more powerful models.

6.2.2.1 Generating the Model

To Generate the Model
1. On the screen Specific Parameters of the Model, click the Generate button. The screen Training the Model will appear. The model is being generated. A progress bar will allow you to follow the process.
[Screenshot: the Training the Model screen, with a progress bar and the Stop Current Task button]
2. If the Autosave option has been activated in the panel Summary of Modeling Parameters, a warni
137. by examining the performance indicators:
- The quality indicator KI allows you to evaluate the explanatory power of the model, that is, its capacity to explain the target variable when applied to the training data set. A perfect model would possess a KI equal to 1 and a completely random model would possess a KI equal to 0.
- The robustness indicator KR defines the degree of robustness of the model, that is, its capacity to achieve the same explanatory power when applied to a new data set. In other words, the degree of robustness corresponds to the predictive power of the model applied to an application data set.
To see how the KI and KR indicators are calculated, see KI, KR and Model Curves on page 40.

Besides KI and KR, InfiniteInsight also provides you with two commonly known indicators:
- the Classification Rate, in the case of a classification model;
- the Pearson squared correlation coefficient, named R2 in InfiniteInsight, in the case of a regression model.
Both indicators can be used to compare InfiniteInsight results with results obtained through other data mining tools.

Note: Validation of the model is a critically important phase in the overall process of data mining. Always be sure to assign significant importance to the values obtained for the KI and KR of a model.
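To make the KI values of 1 and 0 more concrete, the following Python sketch computes a KI-like ratio from the area-based definition given in the section KI, KR and Model Curves: the area between the model curve and the random curve, divided by the area between the perfect curve and the random curve. It is not the InfiniteInsight implementation. It assumes a binary target coded 0 or 1 and a cumulative gains curve as the model curve, and the function names are hypothetical. Tied scores are not treated specially.

```python
def area_under(xs, ys):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


def gain_curve(scores, targets):
    """Share of positives captured when observations are taken by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positives = sum(targets)
    xs, ys, captured = [0.0], [0.0], 0
    for rank, index in enumerate(order, start=1):
        captured += targets[index]
        xs.append(rank / len(scores))
        ys.append(captured / total_positives)
    return xs, ys


def ki_like(scores, targets):
    """Area(model - random) / area(perfect - random), under the assumptions above."""
    random_area = 0.5  # the random model is the diagonal of the unit square
    model_area = area_under(*gain_curve(scores, targets))
    # A perfect model ranks observations by the true target itself.
    perfect_area = area_under(*gain_curve(targets, targets))
    return (model_area - random_area) / (perfect_area - random_area)


targets = [1, 0, 1, 0]
perfect = ki_like([0.9, 0.1, 0.8, 0.2], targets)
inverted = ki_like([0.1, 0.9, 0.2, 0.8], targets)
```

Scores that rank every positive observation first give a ratio of 1, and scores that rank them last give -1. A model that ranks at random falls close to 0.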
138. c_census.csv

These files allow you to evaluate InfiniteInsight features and take your first steps in using it. Census01.csv is the sample data file that you will use to follow the scenarios of InfiniteInsight Modeler - Regression/Classification and InfiniteInsight Modeler - Segmentation/Clustering. This file is an excerpt from the American Census Bureau database, completed in 1994 by Barry Becker.

Note: For more information about the American Census Bureau, see http://www.census.gov.

This file presents the data on 48,842 individual Americans of at least 17 years of age. Each individual is characterized by 15 data items. These data, or variables, are described in the following table.

Variable | Description | Example of Values
age | Age of individuals | Any numerical value greater than 17
workclass | Employer category of individuals | Private, Self-employed-not-inc
fnlwgt | Weight variable allowing each individual to represent a certain percentage of the population | Any numerical value, such as 0, 2341 or 205019
education | Level of study, represented by a schooling level or by the title of the degree earned | 11th, Bachelors
education-num | Number of years of study, represented by a numerical value | A numerical value between 1 and 16
marital-status | Marital status | Divorced, Never-married
occupation | Job classification | Sales, Handlers-cleaners
relationship | Position in family | Husband, Wife
race | Ethnicity | White
139. cally from scenario 1. In scenario 1, using the InfiniteInsight Modeler - Regression/Classification (formerly known as K2R) feature, you managed to accomplish all the objectives of your first marketing campaign, meeting the deadlines and within the budget you were allowed. In order to customize the marketing messages from the bank and improve communication with the various customers and prospects for this new product, the senior management of the bank now asks you to build a segmentation model of the customers of this product. Using the InfiniteInsight Modeler - Segmentation/Clustering (formerly known as K2S) feature, you can rapidly develop a descriptive model at the least possible cost. This model shows the characteristic profiles of the customers interested in your new product, and thus responds to your business issue and fulfills your objectives.

6.1.2 Your Objective
Consider the following case. Using InfiniteInsight Modeler - Regression/Classification, you have contacted the prospects most likely to be interested in your new financial product and identified the ideal number of prospects to contact out of the entire database, meeting the deadlines and within the budget you were allowed (see the InfiniteInsight Modeler - Regression/Classification User Guide). To improve the rat
140. clusters with an excellent response rate and those with a poor response rate. In addition, if your customer database contains customer expenditures on your other products, you will also obtain information on product sale synergies by cluster.

Using InfiniteInsight Modeler - Segmentation/Clustering, you have access to all the analytical features needed to define the type of message to be sent to the cluster for each customer. You have homogeneous clusters that will allow you to respond to your business issue. Of particular importance, this segmentation is:
- systematic: the results obtained do not represent a particular point of view of your data;
- robust, or consistent: two people performing this segmentation using the KXEN method would obtain the same results.

6.1.6 Introduction to Sample Files
This guide is accompanied by the following sample data files:
- a data file, Census01.csv;
- the corresponding description file, desc_census.csv.
These files allow you to evaluate InfiniteInsight features and take your first steps in using it. Census01.csv is the sample data file that you will use to follow the scenarios of InfiniteInsight Modeler - Regression/Classification and InfiniteInsight Modeler - Segmentation/Clustering. This file is an excerpt from the American Census Bureau datab
141. cted negative [row label: Actual Positive Observations]
Row True Non-target Category (Actual Negative Observations): number of actual negative observations that have been predicted positive; number of correctly predicted negative observations.

By default, the Total Population is the number of records in the Validation data set. You can modify this number to see the confusion matrix for the population on which you want to apply your model.

The Metrics
- The Classification Rate is the percentage of data accurately classified by the model when applied on the training data set.
- The Sensitivity is the percentage of actual positives which are correctly identified as such.
- The Specificity is the percentage of negatives which are correctly identified as such.
- The Precision is the percentage to which repeated measurements under unchanged conditions show the same results.
- The Score indicates how sensitively a likelihood function depends on its parameter. The likelihood of a set of parameter values, given some observed outcomes, is equal to the probability of those observed outcomes given those parameter values.

5.2.3.9.3 Understanding the Cost Matrix
This section allows you to visualize your profit depending on the selected score, or to automatically select the score depending on your profit parameters. For each observation category, enter a profit or a cost per observation. The total profit is automatically displayed on the right of the table. To know
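The metrics and the total profit described above can be reproduced by hand from the four cells of a confusion matrix. The Python sketch below is an illustration under stated assumptions, not the product's computation: the function names are hypothetical, the counts and per-observation profits are invented, and precision uses the usual confusion-matrix definition (correctly predicted positives divided by all predicted positives), which differs from the wording used in this guide.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard ratios computed from the four cells of a confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "classification_rate": (tp + tn) / total,  # share of correct predictions
        "sensitivity": tp / (tp + fn),             # actual positives found
        "specificity": tn / (tn + fp),             # actual negatives found
        "precision": tp / (tp + fp),               # assumption: usual definition
    }


def total_profit(counts, profit_per_observation):
    """Sum of count times profit (or cost, if negative) over the four categories."""
    return sum(counts[cell] * profit_per_observation[cell] for cell in counts)


# Invented example: 100 observations, a gain of 10 per correctly predicted
# positive and a cost of 2 per observation wrongly predicted positive.
counts = {"tp": 40, "fn": 10, "fp": 20, "tn": 30}
profits = {"tp": 10, "fn": 0, "fp": -2, "tn": 0}

metrics = confusion_metrics(**counts)
profit = total_profit(counts, profits)
```

With these invented figures the classification rate is 70%, the sensitivity 80%, the specificity 60%, and the total profit 360.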
142. d quickly and easily.

1.2 Which Sections should you Read
Depending on your job profile and your needs, you may choose to read the entire guide or only certain sections. In either case, it is essential that you read the section concerning the KXEN performance indicators on page 38. These indicators embody one of the most important concepts of InfiniteInsight: they allow evaluation of the quality and robustness of the models generated.

The following table provides some points of reference to facilitate your use of this guide.

What is your Profile?
- You want to evaluate InfiniteInsight and your time is tightly budgeted.
- You want to be guided step by step through InfiniteInsight.
- You have had only limited hands-on experience in data modeling.
- You have had significant experience in data modeling.
- You have previously taken an InfiniteInsight training seminar.
- You are already an InfiniteInsight user.

Best Use of this Guide
You could restrict yourself to:
1. Reading the scenario of the feature that interests you, or at least the summary of that scenario:
- Application Scenario: Enhance Efficiency and Master your Budget using Modeling
- Application Scenario: Customize your Communications using Data Modeling
2. Going directly to the
143. d select the folder where the description file is located with the Browse button. Note: The folder selected by default is the same as the one you selected on the screen Data to be Modeled. 4. In the Description field, select the file containing the data set description with the Browse button. Warning: When the space used for model training contains a physical variable named KxIndex, it is not possible to use a description file without any key for the described space. When the space used for model training does not contain a physical variable named KxIndex, it is not possible to use a description file that describes a KxIndex variable, since that variable does not exist in the current space. 5. Click the OK button. The window Load a Description closes and the description is displayed on the screen Data Description. [Screenshot: Data Description screen showing the description loaded from Desc_Census01.csv]
144. d option. 3. Click the OK button. To Select the Number of Variables: This criterion is mandatory and sets the minimum and the maximum number of variables in the final model. 1. In the sentence defining the number of variables, Select the best model keeping between 1 and all variables, select the minimum number of variables (for example, 1 variable) and the maximum number of variables (for example, all variables). 2. For the minimum number of variables, a slider is displayed, ranging from 1 to the total number of variables in the model. Move the cursor on the slider to select the quantity of your choice. For the maximum number of variables, you can either confirm the minimum number selected previously by choosing Keep all variables, or choose a maximum number of variables. 3. Click the OK button. 5.2.1.7.2.1.2 Choosing the Stopping Criteria. You can choose between two variable selection parameters. Each step removes 1 variable: this option allows you to set the number of variables that should be excluded at each iteration. Each step keeps 95.0% of information: this option allows you to set the amount of information that should be kept at each iteration, thus limiting the loss of information. Select the desired option. To Select the Number of V
145. database, a flat file database or even a text file, and the connection information necessary for accessing the data. 7.1.1.1.33 database: A database is a structured collection of records or data that is stored in a computer system. 7.1.1.1.34 descriptive model: A model which allows describing data sets. 7.1.1.1.35 detected profit: Detected profit is the profit type shown as the default. It allows examination of the percentage of observations belonging to the target category of the target variable (that is, the least frequent category) as a function of the proportion of observations selected from the entire data set. 7.1.1.1.36 determination coefficient (R2): ratio between the variability (sum of squares) of the prediction and the variability (sum of squares) of the data. 7.1.1.1.37 deviation: Deviation is a measure of difference, for interval and ratio variables, between the observed value and the mean. 7.1.1.1.38 domain: See Analytical Record. The behavioral domain is usually obtained through aggregates per entity on transactional tables. 7.1.1.1.39 encoding: Encoding is the process of putting a sequence of characters (letters, numbers, punctuation and certain symbols) into a specialized format for efficient transmission or storage. 7.1.1.1.40 engine: The UI- and view-independent portion of an application c
146. database or on new data, which is not usually trivial with other techniques. The best way to understand the difference between centroid-based clusters and rule-based clusters is to use graphs. Diagram explanations: This diagram represents a set of observations from a data set. To create clusters, the InfiniteInsight Modeler Segmentation Clustering engine uses the centroid approach. Centroids are the results of a clustering algorithm, meaning they are the barycenter of the points closest to them. When applying InfiniteInsight Modeler Segmentation Clustering on this data set, the observations are grouped depending on their distance to each centroid. This graph represents the previous data set observations grouped into four clusters. This diagram is known as a Voronoi diagram. To create the SQL expressions that define the clusters, the InfiniteInsight Modeler Segmentation Clustering engine uses what is called Minimum Description Length (MDL). It means that, after the initial clusters have been created with the centroid approach, they are reshaped (cut) to fit into the smallest possible expression, thus trying to find the best compromise between the length of the expression and the loss of information. This graph represents the SQL expressions of the clusters, in red, compared with the centroids. You can see on this graph that some observations that were in a cluster when using the centroid approach end up in another when using the
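The centroid approach described in this snippet, where each observation joins the cluster of its closest centroid, can be sketched as follows. This is only an illustration: the function name and sample points are hypothetical, and Euclidean distance is an assumption, not a statement about the distance the engine actually uses.

```python
import math


def assign_to_centroids(points, centroids):
    """Return, for each point, the index of the closest centroid.

    Euclidean distance is assumed here; grouping points this way
    produces the regions of a Voronoi diagram around the centroids.
    """
    return [
        min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))
        for point in points
    ]


# Hypothetical two-dimensional observations and two centroids.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 2.0), (9.0, 8.0), (4.0, 4.0)]
labels = assign_to_centroids(points, centroids)
print(labels)
```

A rule-based (SQL expression) description of the same clusters would replace the distance test with explicit conditions on the variables, which is why some observations can change cluster between the two approaches.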
147. default. It allows you to generate in the output file the value predicted by the model for the target variable. It appears in the output file as RR_<TARGET VARIABLE>. 5.2.4.2.5.3.2.2 Outlier Indicator. This option allows you to show in the output file which observations are outliers. An observation is considered an outlier if the difference between its predicted value and its real value exceeds the value of the error bar. In other words, the error bar is a deviation measure of the values around the predicted score. It appears in the output file as OUTLIER_RR_<TARGET VARIABLE>. Possible values are 1 if the observation is an outlier with respect to the current target, else 0. 5.2.4.2.5.3.2.3 Predicted Value Quantile. This option allows you to cut the output file in quantiles and to assign to each observation the number of the quantile containing it. Approximate quantiles are constructed based on the sorted distribution and the boundaries of predicted scores from the validation sample. The score boundaries are used to determine approximate quantiles on the apply data set. Notes: Exact quantile computation would require a full sort of the scores obtained on the apply data set, which can be time-consuming. From version V6.0, Infin
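The outlier rule and the approximate quantile assignment described in this snippet can be sketched in a few lines. The function names, the sample boundaries and the convention that quantile 1 holds the lowest scores are all assumptions made for illustration; the snippet does not state how quantiles are numbered.

```python
from bisect import bisect_right


def outlier_flag(predicted, actual, error_bar):
    """Return 1 if the prediction misses the real value by more than the error bar, else 0."""
    return 1 if abs(predicted - actual) > error_bar else 0


def approximate_quantile(score, boundaries):
    """Return the 1-based quantile number for a score.

    boundaries is an ascending list of score cut points, assumed to have
    been computed beforehand on the validation sample, so no sort of the
    apply data set is needed.
    """
    return bisect_right(boundaries, score) + 1


# Hypothetical cut points splitting scores into four quantiles.
boundaries = [0.25, 0.50, 0.75]
print(outlier_flag(predicted=10.0, actual=12.5, error_bar=2.0))
print([approximate_quantile(s, boundaries) for s in (0.10, 0.60, 0.90)])
```

Looking up precomputed boundaries costs a binary search per observation, which is the reason an approximate assignment avoids the full sort mentioned in the notes.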
148. [Screenshot residue: Data Description table header and toolbar, with the buttons Add Filter in Data Set, Analyze, Open Description, Save Description and View Data] 6. Go to section Describing the Data Selected. 6.2.1.2 Describing the Data Selected. For this scenario: Select Text Files as the file type. Use the file Desc_Census01.csv as the description file for the Census01.csv data file. To Select a Description File: 1. On the screen Data Description, click the button Open Description. The following window opens: [Screenshot: window Load a Description for Census01.csv, with the fields Data Type, Folder and Description] In the window Load a Description, select the type of your description file. In the Folder field, select the folder where the description file is located with the Browse button. Note: The folder selected by default is the same as the one you selected on the screen Data to be Modeled. In the Description field, select the file containing the data set description with the Browse button. Warning: When the space used for model training contains a physical variable named KxIndex, it is not possible to use a description file wi
149. displayed categories is lower than that defined in the user structure, by the parameter band count if no user structure has been defined. For more information about using this parameter, refer to the section Optimal Grouping for all Variables; the Model Performance, in which you will find the model performance indicators KI and KR, the variables contributions and the score detailed statistics; the Control for Deviations, which allows you to check the deviations for each variable and each variable category between the validation and test data sets; the Expert Debriefing, in which you will find more specialized performance indicators, as well as the variables encoding, the variables excluded during model generation and the reason for exclusion, and so on. 5.2.3.7.1 Variables Exclusion Cause. Statistical Reports include the section Variables Exclusion Causes. For regression and classification models, this section presents the reason why a variable was excluded from the model, if any variable was. It is divided into two parts: Overall Exclusions, showing the variables excluded from the whole model. [Screenshot: Statistical Reports panel]
150. dividuals stand a good chance of becoming your customers some day. But optimizing your campaign means being able to identify those prospects that have every chance of becoming customers today, as a result of the current marketing campaign. Discover new niche prospects that all your knowledge of the market had not previously allowed you to identify. Select a predefined number of individuals: imagine that one of the constraints of your campaign consisted of contacting exactly 5,000 prospects. Your intuition may help you to select 2,400 of these. But how are you going to identify the remaining 2,600 prospects to be contacted? A purely random selection, thus completely non-optimized, might be your only solution. 5.1.6.3 Classical Statistical Method. You may decide to use a classical statistical method to better manage the effectiveness of your campaign, and thus your budget. On the basis of the information that you have, a Data Mining expert could create predictive models. In other words, you could ask a statistical expert to create a mathematical model that would allow you to predict the probability that a given individual responds to your marketing campaign, as a function of his or her profile. To implement this method, the statistician must: Perform a detailed analysis of your test campaign. Prepare your database down to the smallest detail, specifically encoding the different types of data in such a way that they can be used by the ana
151. e field. The list of debriefing types will appear (Predicted vs Actual, Actual vs Predicted). 2. Select a debriefing type. The corresponding plot will appear. 5.2.3.3.4 Understanding Model Graphs. 5.2.3.3.4.1 For a Model with a Nominal Target. The following figure represents the model graph produced using the default parameters. [Plot: Model Graphs, Performance; x-axis percentage; curves Random, Wizard and Validation] On the plot, the curves for each type of model represent the profit that may be realized (Y axis), that is, the percentage of observations that belong to the target variable, in relation to the number of observations selected from the entire initial data set (X axis). On the X axis, the observations are sorted in terms of decreasing score, that is, the decreasing probability that they belong to the target category of the target variable. In the application scenario, the model curves represent the ratio of prospects li
152. e Model. 5.2.3 Step 3: Analyzing and Understanding the Model Generated. 5.2.4 Step 4: Using the Model. 6 InfiniteInsight Modeler Segmentation Clustering. 6.1 Application Scenario: Customize your Communications using Data Modeling. 6.1.1 Presentation. 6.1.2 Your Objective. 6.1.3 Your Approach. 6.1.4 Your Business Issue. 6.1.5 Your Solutions. 6.1.6 Introduction to Sample Files. 6.1.7 InfiniteInsight. 6.2 Creating a Clustering Model Using InfiniteInsight Modeler. 6.2.1 Step 1: Defining the Modeling Parameters. 6.2.2 Step 2: Generating and Validati
153. e Polynomial degree field, enter the value corresponding to the degree of complexity of the model that you want to obtain. Score Bins Count: This option allows you to define the number of bins to create for the score. This value must be set between 20 and 100, since a lower or higher number of bins would lead to poor model quality. Exclusion of Low KR Variables: This option allows you to enable the exclusion of variables based on the value of their KR. KXEN uses an internally computed threshold to decide whether a variable has a low KR. This threshold depends mainly on the data set size and target distribution. In versions prior to 6.1.0, InfiniteInsight automatically excluded variables with low KR; since version 6.1.0, this behavior has been disabled by default. If you do not enable this feature, no variable will be excluded based on its KR value. To Automatically Exclude Variables with Low KR: 1. Check the option Exclusion of Low KR Variables. [Screenshot: advanced parameters showing Score Bins Count, Exclusion of Low KR Variables (Disabled, Enabled) and Correlations Settings] 2. Check the radio button Enabled. Selecting a Weight Variable enables to set
154. [Screenshot residue: Node Details tab for the Whole Population node; population count 36381 (Estimation) and 12461 (Validation)] 5.2.3.10.2 Understanding the Decision Tree. The panel Decision Tree is split into three parts: 1. the decision tree itself, which is displayed in the upper section of the panel; 2. two tabs, located in the bottom left part of the panel, which provide you with information on the nodes and with the profit curve corresponding to the current decision tree; 3. a navigator, displayed in the bottom right part of the panel, which allows you to visualize what part of the tree you are studying. [Screenshot: Decision Tree panel. Root node Whole Population (population 48842), split on marital status into three nodes with populations 17647, 8779 and 22416; tabs Node Details and Profit Curve]
155. e estimation sub-set, which represents the best compromise between perfect quality and perfect robustness. Test: Verify the performance of the selected model on a new data set. To understand the role of cutting strategies in the model generation process, see the figure Generating a Model (page 35). 4.4.3 Nine Types of Cutting Strategies. To generate your models, there are nine cutting strategies that you may use: a customized cutting strategy; eight automatic cutting strategies. 4.4.3.1 Customized Cutting Strategy. 4.4.3.1.1 Definition. The customized cutting strategy allows you to define your own data sub-sets. To use this strategy, you must have prepared, before opening the InfiniteInsight features, three sub-sets: the estimation, validation and test sub-sets. 4.4.3.1.2 How to Use this. Before opening InfiniteInsight, cut your initial data file into three files of the size of your choice. For example: the first file may contain the first 1,500 observations, or lines, of your initial data file; the second file, observations 1,501 to 3,000; the third file, observations 3,001 to 5,000. Warning: The customized cutting strategy is risky in the instance of an initial data file in which the data have been sorted. In this case, the first lines will not be representative of the overall set of data c
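The customized cutting example in this snippet (1,500, 1,500 and 2,000 observations) can be sketched as a simple slice of the data rows. The function name is hypothetical, the mapping of the first, second and third file to estimation, validation and test is an assumption, and the optional shuffle is only one possible way to address the warning about sorted files; it is not described in the guide.

```python
import random


def cut_into_three(rows, first_end, second_end, shuffle_seed=None):
    """Cut rows into (estimation, validation, test) lists by position.

    If shuffle_seed is given, a shuffled copy is cut instead, so that a
    sorted input file does not produce unrepresentative sub-sets.
    """
    rows = list(rows)
    if shuffle_seed is not None:
        random.Random(shuffle_seed).shuffle(rows)
    return rows[:first_end], rows[first_end:second_end], rows[second_end:]


# Observation numbers 1..5000 stand in for the lines of the initial data file.
observations = range(1, 5001)
estimation, validation, test = cut_into_three(observations, 1500, 3000)
print(len(estimation), len(validation), len(test))
```

Each of the three lists would then be written to its own file before opening the application.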
156. e of observations belonging to the target category of the target variable contained in each cluster. Depending upon the level of information desired, you can choose to generate: only the cluster index to which each observation belongs (Predicted Value Only option); the cluster index and the disjunctive encoding of the cluster indexes (Cluster Id Disjunctive Coding option), in which case you can also decide to include in the results file all input variables of the application data set (Cluster Id Disjunctive Coding copy dataset option); the cluster index and the target mean for each cluster (Cluster Id Target Mean option). For this Scenario: You will apply the model to the file Census01.csv that you used previously to generate the model. In the procedure To Apply the Model to a New Data Set: Select the format Text files. In the Generate field, select the option Cluster Id Target Mean. Select the folder of your choice in which to save the results file (results generated by the model). 6.2.4.1.5 Analyzing the Results of the Application. For this Scenario: Open, in Microsoft Excel, the results text file generated when you applied the model to the Census01.csv file. To Open the Model Application Results File: 1. Depending upon the format of the results file generated, use Microsoft Excel or another application to open the file. The figure below presents the headings and columns of the results file obtained for this scenario
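The disjunctive encoding of cluster indexes mentioned in this snippet amounts to one indicator column per cluster. The sketch below is a generic illustration with a hypothetical function name; the actual column names and layout written by the product are not shown in the source and are not reproduced here.

```python
def disjunctive_coding(cluster_ids, n_clusters):
    """Turn 1-based cluster indexes into rows of 0/1 indicators, one column per cluster."""
    return [
        [1 if cluster_id == k else 0 for k in range(1, n_clusters + 1)]
        for cluster_id in cluster_ids
    ]


# Three observations assigned to clusters 2, 1 and 3 of a three-cluster model.
encoded = disjunctive_coding([2, 1, 3], n_clusters=3)
print(encoded)
```

Exactly one indicator is set per row, so the original cluster index can always be recovered from the encoded columns.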
157. e of return of your campaign, senior management asks you to: Build a segmentation model of your customers. Analyze the characteristics of the identified clusters. Define customized communications for each cluster. The segmentation model, in particular, should allow you to distinguish customer clusters by virtue of their propensity to purchase the new high-end savings product proposed by your firm. You will optimize your understanding of your customers. 6.1.3 Your Approach. For organizational reasons, you want to define five groups of customers, or clusters, and describe the customer profiles for each of these groups. To accomplish this project, you will use the sample of 50,000 people who responded to your first test during the previous campaign. This file corresponds to the sample file Census01.csv, provided with InfiniteInsight and described in the section Introduction to Sample Files (page 59). 6.1.4 Your Business Issue. In your marketing database, you have: a list of 1,000,000 prospects; a list of 50,000 prospects, people selected during the test phase of your campaign, whose responses to the campaign are known. This sample thus constitutes a training data set. This sample, taken from the complete database, also exhibits some missing values. Your business issue thus consists of: Rapidly building a segmentation model using the training data set, or sample. The clusters obtained will allow you to better understand the p
158. e that describes your data and which serves to explain a target variable. 4.6.5.2.2 Synonyms. Depending upon your profile and your area of expertise, you may be more familiar with one of the following terms to refer to explanatory variables: causal variables, independent variables, input variables. These terms are synonyms. 4.6.5.2.3 Example. Your company is marketing two products, A and B. You have a database which contains references to: 1,500 of your customers (you know which product, A or B, each customer has purchased); 10,000 prospects (you want to know which product each prospect is likely to purchase). The variables name, age, address and socio-occupational class are your explanatory variables: they allow you to generate a model capable of explaining and predicting the value of the target variable, product purchased. The following table represents your database:
Name | Age | Residence | Socio-Occupational Category | Product Purchased
Charles | 34 | New Orleans | Manager Administrator | Product A
John | 37 | Washington | Manager Administrator | Product A
Marlene | 31 | Boston | Civil servant | Product B
Prospect 1 | 34 | Oakland | Manager Administrator |
Prospect 2 | 24 | Washington | Civil servant |
Prospect n | 35 | Sacramento | Skilled tradesman |
159. [Screenshot residue: InfiniteInsight main menu, listing: eate a Data Manipulation, Create a Clustering Model, Load an Existing Data Manipulation, Create a Time Series Analysis, Perform an Event Log Aggregation, Create Association Rules, Perform a Sequence Analysis, Load a Model, Perform a Text Analysis, Create a Social Network Analysis, Open the Data Viewer, Load a Social Network Analysis Model, Perform a Data Transfer, List Distinct Values in a Data Set, Get Descriptive Statistics for a Data Set] 2. Click the feature you want to use. 5.1.8.1 Editing the Options. To Edit the Options of InfiniteInsight: 1. Click the button Help in InfiniteInsight. 2. Click the button in the help panel. The following options can be modified (Category: Options). General: Country, Language, Message Level, Log Maximum Size, Message Level for Strange Values, Display the Parameter Tree, Number of Store in the History, Always Exit without Prompt, Include Test in Default Cutting Strategy. Stores: Default Store for Apply in Data Set, Default Store for Apply out Data Set, Default Store to Save Models. Metadata Repository: Enable Single Metadata Repository, Edit Variable Pool Content. Graphic: Profit Curve Points, Bar Count Displayed, No KXEN Look and Feel, Displa
160. ecedent compared with the chances of randomly finding the consequent. A value greater than 1 indicates that using the antecedent increases your chances to find the consequent. 7.1.1.1.62 Lift profit: Lift profit allows examination of the difference between a perfect model and a random model, and between the model generated by InfiniteInsight and a random model. It represents the ratio between a model and the random model, that is, the performance of a model compared to a model that would only allow you to select observations at random from your database. You can thus visualize how much better your model is compared with the random model. 7.1.1.1.63 lower bound: The term lower bound is defined as an element of P which is less than or equal to every element of S. 7.1.1.1.64 maximum error (LInf): maximum absolute difference between predicted and actual values (upper bound); Chebyshev distance. 7.1.1.1.65 mean: The arithmetic average value of a collection of numeric data. 7.1.1.1.66 mean absolute error (L1): mean of the absolute values of the differences between predictions and actual results; City block distance or Manhattan distance. 7.1.1.1.67 mean absolute percentage error (MAPE): The MAPE value is the average of the absolute values of the percentage errors. It measures the accuracy of the model s forec
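The three error measures defined in these glossary entries can be written out directly from their definitions. This is a minimal sketch with hypothetical function names and sample values; MAPE is returned as a fraction rather than a percentage, and it assumes no actual value is zero.

```python
def maximum_error(actual, predicted):
    """LInf: largest absolute difference between predicted and actual values."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


def mean_absolute_error(actual, predicted):
    """L1: mean of the absolute differences between predicted and actual values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """MAPE: mean of the absolute errors relative to each actual value (actual != 0)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted values for three observations.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(maximum_error(actual, predicted))
print(mean_absolute_error(actual, predicted))
print(mean_absolute_percentage_error(actual, predicted))
```

LInf is driven by the single worst prediction, while L1 and MAPE average over all observations, so the three can rank two models differently.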
161. ected values. For instance, in the context of a promotional mailing campaign, a person is associated with: a benefit for responding to the promotional mailing; a cost for not responding to the promotional mailing. 4.9.2 Available Profit Types. To visualize the profit that may be realized using a model generated by InfiniteInsight, you may use the following profit types: Detected profit, Lift profit, Standardized profit, Customized profit. 4.9.2.1 Detected Profit. Detected profit is the profit type shown as the default. It allows examination of the percentage of observations belonging to the target category of the target variable (that is, the least frequent category) as a function of the proportion of observations selected from the entire data set. Using this profit: the value 0 is assigned to observations that do not belong to the target category of the target variable; the value 1 / (frequency of the target category of the target variable in the data set) is assigned to observations that do belong to the target. 4.9.2.2 Lift Profit. Lift profit allows examination of the difference between a perfect model and a random model, and between the model generated by InfiniteInsight and a random model. It represents the ratio between a model and the random model, that is, the performance of a model compared to a model that would only allow you to select observations at random from your database. You can thus visual
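The detected profit and lift profit described in this snippet can be illustrated by sorting observations by decreasing score and accumulating the targets found. The sketch below is an interpretation with hypothetical names and data: it treats a random model as detecting targets in proportion to the share of observations selected, which is an assumption about how the comparison is drawn.

```python
def detected_profit_curve(scores, targets):
    """Return (share selected, share of targets detected) after each observation.

    Observations are taken in order of decreasing score; targets holds 1
    for the target category and 0 otherwise.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_targets = sum(targets)
    curve, found = [], 0
    for rank, i in enumerate(order, start=1):
        found += targets[i]
        curve.append((rank / len(scores), found / total_targets))
    return curve


def lift_curve(curve):
    """Lift: detected share divided by the share a random selection would detect."""
    return [detected / selected for selected, detected in curve]


# Hypothetical scores and target flags for four observations.
scores = [0.9, 0.8, 0.3, 0.1]
targets = [1, 0, 1, 0]
curve = detected_profit_curve(scores, targets)
print(curve)
print(lift_curve(curve))
```

A lift above 1 at a given selection share means the model finds more targets than a random draw of the same size would.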
162. ed automatically using the band count parameter. The number of categories displayed corresponds to the value of the band count parameter. For more information about configuring this parameter, please refer to the section Band Count for Continuous Variables. 6.2.3.4.3 Plot Options. To Switch Between Validation Data Set and All Data Sets Plots: 1. Click the [icon] button to display all data sets. The plot displaying all data sets will appear. [Plot: Category Significance for the variable age; x-axis Categories, y-axis Influence on Target; series Estimation and Validation] 2. Click the [icon] button to go back to the Validation Data Set plot. To Switch between Curve and Bar Charts: 3. Click the [icon] button to display the curve chart. The curve plot will appear. [Plot: Category Significance for the variable age; x-axis Values; curves Random, Wizard, Estimation and Validation] 4. Click the [icon] button to go back to the bar chart. Note: You can combine the different types of plot. For example, you can display All Dat
163. editor. To Print the Model Overview: 1. Click the Print button situated under the title. A dialog box will appear, allowing you to select the printer to use. 2. Select the printer to use and set other print properties if need be. 3. Click OK. The report will be printed. To Save the Model Overview: Click the Save button situated under the title. The file is saved in HTML format. 6.2.3.3 Model Graphs. 6.2.3.3.1 Definition. The model graphs allow you to: view the realizable profit that pertains to your business issue using the model generated; compare the performance of the model generated with that of a random type model and that of a hypothetical perfect model. On the plot, for each type of model, the curves represent the realizable profit (Y axis, or ordinate) as a function of the ratio of the observations correctly selected as targets relative to the entire initial data set (X axis, or abscissa). 6.2.3.3.2 Displaying the Model Graphs. To Display the Model Graphs: 1. On the screen Using the Model, click the Model Graphs option. The model graph will appear. [Plot: Performance; y-axis Detected Profit, from 0.00 to 1.00]
164. eighted Population, that is, the number of records when using a weight variable. 5.2.3.10.2.3 Profit Curve. The profit curve for the current decision tree is displayed in the tab Profit Curve, located in the lower part of the panel. This profit curve changes with every modification made to the decision tree. The profit curve corresponding to the node containing the whole population is equal to the random curve. [Plot: Profit Curve tab; curves Random, Wizard, Decision Tree and K2R Model] When you expand the node with the highest percentage of positive target, the profit curve will improve over the first percentiles, which means that the model will detect the population with the highest scores. On the contrary, if you expand the node with the lowest percentage of positive target, the profit curve will improve over the last percentiles. However, if the node you expand contains a very small population, the profit curve will not be impacted. So you need to find the best compromise between the size of the populati
165. eling parameters: Select a data source to be used as a training data set. Describe the selected data set (on page 202). Select the variables. Select the explanatory variables. Checking the Modeling Parameters. Define the number of clusters. This step is optional. 6.2.1.1 Selecting a Data Source. For this Tutorial: Use the file Census01.csv as a training data set. This file represents the sample that you had extracted from your database and used for the test phase of your direct marketing campaign. As specified in your test plan, this file contains 50,000 prospects whose behavior with respect to the new financial product you know: 25% of the prospects showed themselves to be clearly interested (they chose to accept a meeting with one of your sales agents); 75% of the prospects declined your invitation. In this file, you created a new variable, Class, which corresponds to the reaction of prospects contacted during the test. You assigned: the value 1 for those prospects who responded positively to your invitation; the value 0 for those prospects who responded negatively to your invitation. To Select a Data Source: 1. On the screen Data to be Modeled, select the data source format to be used (Text files, ODBC, ...).
166. Click the Yes button to validate your interval. To Split an Interval: Select the interval to split. Click the Split button. The selected interval is automatically split into two equal intervals. To Merge Two Intervals: Select the intervals to merge. You can only select adjoining intervals. Click the Merge button. [Screenshot: interval editor with the options Include Smaller Data and Include Higher Data, the buttons Split, Add and Remove, and the option Enable the target based optimal grouping performed by K2C] To Delete an Interval: 12. Select one or more intervals. You can only select adjoining intervals. 13. Click the Remove button. The previous and next intervals are extended to include the values previously contained in the deleted intervals, so that no gap is left between intervals. 5.2.1.2.6.2 Structure for an Ordinal Variable. The structure for an ordinal variable is similar to that of a continuous variable, with the exception of the bounds, which are always closed and cannot be modified. Warning: The structure for an ordinal string variable cannot be edited. 5.2.1.2.6.3 Structure for a Textual Variable. The structure for a textual var
167. ent in each cluster. The figure below presents the Target Means plot obtained during this scenario. The bars have been sorted in descending order.

[Figure: Clusters Summary screen, Target Means plot (Data Set: Estimation, Descending Sort checked); axes: Clusters, Target Means; the value 0.887506008 is displayed for cluster 7]

Among the five clusters, Cluster 7 is the one that has the greatest proportion of observations belonging to the target category of the target variable. In fact, 88.7% of the customers contained in cluster 7 belong to target category 1 (target variable Class). In other words, 88.7% of the customers contained in cluster 7 responded in a positive manner to the test phase of your marketing campaign. Cluster 4 is the cluster with the lowest density of observations belonging to the target category. Less than 1% of the customers contained in this cluster responded positively to the test phase of your marketing campaign.

6.2.3.5.2.2 The Plot Frequencies
The Frequencies plot presents the number of observations contained in each cluster relative to the total number of observations contained in the data set. The figure below presents the Frequencies plot obtained during this scenario. The bars have
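The two quantities shown by the Target Means and Frequencies plots can be reproduced from a list of cluster assignments and a binary target. The sketch below is illustrative only; the function name and the sample data are made up and do not come from the Census01 scenario.

```python
from collections import defaultdict


def cluster_summary(cluster_ids, targets):
    """Return (cluster, target mean, relative frequency), sorted by target mean."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for cluster, target in zip(cluster_ids, targets):
        counts[cluster] += 1
        positives[cluster] += target  # target is 1 for the target category, else 0
    total = len(cluster_ids)
    rows = [(c, positives[c] / counts[c], counts[c] / total) for c in counts]
    # Descending sort on the target mean, as in the plot described above.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical data: three observations in cluster 7, five in cluster 4.
clusters = [7, 7, 7, 4, 4, 4, 4, 4]
target = [1, 1, 0, 0, 0, 0, 0, 0]
for cluster, target_mean, frequency in cluster_summary(clusters, target):
    print(cluster, round(target_mean, 3), round(frequency, 3))
```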
168. Note: The current policy is to use LInf either in unsupervised mode or when the clusters SQL expressions have been asked for, and L2 otherwise.

The Encoding Strategy option refers to the kind of encoding the segmentation engine is expecting from the data encoder of InfiniteInsight.

To Choose an Encoding Strategy
Choose among the following options from the drop down list:

Option: System Determined
Description: Lets the system select the best encoding according to the model parameters. The Target Mean encoding is used for supervised models. Otherwise, variables are encoded using the Unsupervised scheme.

Option: Target Mean
Description: Default value for supervised clustering. Each value of a continuous input variable is replaced by the mean of the target for the segment the value belongs to. Each category of a nominal input variable is replaced by the mean of the target for this category. In case of a nominal target variable, the mean of the target corresponds to the percentage of positive cases of the target variable for the input variable category.

Option: Uniform
Description: Each variable segment is encoded in the range [-1 ; 1] so that the distribution of the variables is uniform.

Option: Unsupervised
Description: Default value for unsupervised clustering. A target free strategy. Only segment frequency is used to encode variables.

The following options will only be displayed when all va
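The Target Mean strategy described above, applied to a nominal input variable, can be sketched as follows. This is a minimal illustration of the principle (each category is replaced by the mean of the target for that category), not the encoder shipped with the product; the function name and the sample values are hypothetical.

```python
from collections import defaultdict


def target_mean_encode(categories, targets):
    """Replace each category by the mean of the target observed for it."""
    total = defaultdict(float)
    count = defaultdict(int)
    for category, target in zip(categories, targets):
        total[category] += target
        count[category] += 1
    means = {category: total[category] / count[category] for category in count}
    return [means[category] for category in categories], means


# With a binary target, the mean is the share of positive cases per category.
status = ["Married", "Married", "Widowed", "Widowed", "Widowed", "Married"]
target = [1, 0, 0, 0, 0, 1]
encoded, table = target_mean_encode(status, target)
print(table)
```

For a continuous input variable the same idea would apply after the values have been grouped into segments, with the segment playing the role of the category.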
169. entire data set.

When the target is continuous, the following curve is displayed:

[Figure: Model Graphs screen, debriefing type Predicted vs Actual, with the curves Wizard and Validation]

The default parameters display the curves corresponding to the Validation sub-set (blue line) and the hypothetical perfect model, Wizard (green line). The blue area represents the standard deviation of the current model. For more information on the meaning of model curves, see Understanding Model Graphs on page 118.

2. When there is more than one target, select the target for which you want to see the curves in the Models list.
Note: A model corresponds to each variable. The name of each model is built from the rr_ (Robust Regression) prefix and the model target name.
3. Select the viewing options that interest you. For more information about viewing options

5.2.3.3.2 Definition
Depending on the type of the target, the model graph plot allows you to:
- View the realizable profit that pertains to your business issue using the model generated, when the target is nominal
- Compare the performance of the model generated with that of a random type model and that of a hypothetical perfect model, when the targe
170. ers contained in the data set (or 92%) do not realize any annual capital gains, while none of the customers contained in cluster 7 fail to realize some annual capital gain. Checking the Fix Variable box would allow you to compare the profiles of the variable capital gain for all the segments.

6.2.3.6.4 SQL Expressions
The Cross Statistics screen also allows you to visualize the SQL Expression used to define each cluster.
Note: SQL Expressions are only available if you have selected the Calculate SQL Expressions option in the Modeling Parameters Advanced Screen before generating your model.

To Display SQL Expression for a Cluster
1. Select the cluster in the summary table. The plot for the selected cluster is displayed.
2. Click the SQL button. The SQL expression replaces the cross statistics plot in the lower part of the screen.

[Figure: Cluster Profiles screen showing the variable ranges for Cluster 7, including a condition on capital gain with the bounds 4650 and 99999]

3. Click the small magnifier icon to explore the SQL expression structure.
4. Click the Cross Statistics button to go back to the Cross Statistics plot.
171. es as well as the probability that this observation belongs to the target category of the target variable. In our example, only one variable has been defined. The probability that this observation belongs to the target category of the target variable is 0.1120.

Note that certain variables of the table of Explanatory variables were automatically completed upon execution of the simulation. In fact, the model automatically completed certain missing values that were essential to the simulation. These values are listed in the following table.

Type of variable: continuous variable
Default value: the mean value

Type of variable: nominal variable
Default value: the most frequent category

Type of variable: ordinal variable
Default value: the most frequent category

These changes are reflected in the left part of the screen after clicking the Run button.

6. You can modify the value of an explanatory variable and run the simulation again to measure the effect of that change with respect to the target variable. For instance:
   1. Assign the value Widowed to the variable marital status in place of the value Married civ spouse.
   2. Run the simulation. The probability now obtained is 0.0040.
7. Click the Reset button to run the simulation again.

5.2.4.4 Refining a Model
InfiniteInsight allows you to refine a currently open model. For instance you can
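The default values listed in the table above (mean for a continuous variable, most frequent category for a nominal or ordinal variable) can be illustrated with a small sketch. The function names are hypothetical, and the sketch computes the defaults from the values passed in, whereas the product presumably derives them from the data used to build the model.

```python
from collections import Counter
from statistics import mean


def default_value(values, var_type):
    """Mean for continuous variables, most frequent category otherwise."""
    known = [v for v in values if v is not None]
    if var_type == "continuous":
        return mean(known)
    return Counter(known).most_common(1)[0][0]


def fill_missing(values, var_type):
    """Replace missing entries (None) by the default value for the variable type."""
    default = default_value(values, var_type)
    return [default if v is None else v for v in values]


print(fill_missing([30, None, 50], "continuous"))
print(fill_missing(["Private", None, "Private", "State-gov"], "nominal"))
```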
172. es Contributions
Check the All option.

To Add Specific Variable Contributions
1. Check the Individual option.
2. Click the >> button to display the variable selection table.
3. In the Available list, select the variables you want to add (use the Ctrl key to select more than one variable).
4. Click the > button to add the selected variables to the Selected list.

5.2.4.2.6 Types of Results Available
The application of a model to a data set allows you to obtain four types of results, which are described in the following table.

Type of Results: score or predicted value; probability; prediction range or maximum error; individual contributions; decision

Description:
- score or predicted value: For a continuous variable, the predicted value corresponds to the value predicted by the model for the target variable of each observation. The predicted values correspond to the values read off the X axis of the profit curve plot. The predicted value of each observation is calculated by replacing the parameters of the polynomial representing the model by the values of each of the variables of that observation. For a binary variable, the model outputs a score.
- probability: Corresponds to the probability of each observation belonging or not to the target category of the target variable.
- prediction range or maximum error: The prediction range
173. [Figure: Cross Statistics plot, Cluster 7 vs Whole Population for Variable capital gain (Data Set: Estimation); axes: Categories, Fraction; series: All Population, Cluster 7; KL 2.937, KL Significance 1]

In the figure above, the table allows you to identify cluster 7 as the cluster containing the highest density of observations belonging to the target category of the target variable. 88.8% of customers contained in this cluster belong to target category 1 of the target variable Class. The cross statistics plot allows you to view and compare the profiles of the variable capital gain over the entire data set and over cluster 7. These profiles are repeated in the table below.

Categories of the variable capital gain | Profile over the data set | Profile over cluster 7
0                                       | 92%                       | 0%
]0 ; 4650]                              | 3%                        | 0%
]4650 ; 99999[                          | 5%                        | 90%
99999                                   | 1%                        | 10%

The data distribution of the category ]4650 ; 99999[ makes it clear that the majority of customers contained in cluster 7 realize significant annual capital gains relative to the entire set of customers contained in the data set. In addition, the data distribution over the category 0 indicates that the majority of the custom
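The KL value displayed with the plot suggests that the two profiles are compared with a Kullback-Leibler divergence. The guide does not state the exact formula or logarithm base, so the sketch below uses the textbook definition with natural logarithms as an assumption, applied to the rounded percentages of the table above. Because of the rounding, it should not be expected to reproduce the displayed figure exactly.

```python
import math


def kl_divergence(p, q):
    """Textbook KL(p || q) in nats; categories where p is 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Rounded profiles of the variable capital gain, in the order of the table:
# 0, ]0 ; 4650], ]4650 ; 99999[, 99999
population = [0.92, 0.03, 0.05, 0.01]
cluster_7 = [0.00, 0.00, 0.90, 0.10]

print(round(kl_divergence(cluster_7, population), 3))
```

A large value indicates that the profile of the cluster differs strongly from the profile of the whole population for this variable.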
174. etection threshold is varied.

[Figure: ROC curve; x axis: 1 - Specificity, y axis: Sensitivity; curves: Random, Wizard, Validation]

Sensitivity, which appears on the Y axis, is the proportion of CORRECTLY identified signals (true positives) found out of all actual signals on the validation data set.
1 - Specificity, which appears on the X axis, is the proportion of INCORRECT assignments to the signal class (false positives) incurred out of all actual non-signals on the validation data set. (Specificity, as opposed to 1 - Specificity, is the proportion of CORRECT assignments to the class of NON-SIGNALS, the true negatives.)

4.10.2 Lorenz Curves

4.10.2.1 Lorenz Good
Lorenz Good displays the cumulative proportion of missed signals (false negatives) accounted for by the records corresponding to the bottom x% of model scores.

[Figure: Lorenz Good curve; x axis: percentage, y axis: 1 - Sensitivity; curves: Random, Wizard, Validation]

The Y axis measures 1 - Sensitivity, that is, 1 minus the proportion of true positives, which is equivalent to the proportion of missed signals, or lost opportunity. Because the data are
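One point of the ROC curve described above can be computed directly from scores and actual classes. The sketch below is illustrative; the function name, the convention that a score greater than or equal to the threshold is classified as a signal, and the sample data are assumptions.

```python
def roc_point(scores, labels, threshold):
    """Return (1 - specificity, sensitivity) for one detection threshold."""
    true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)               # actual signals
    negatives = len(labels) - positives   # actual non-signals
    sensitivity = true_pos / positives
    one_minus_specificity = false_pos / negatives
    return one_minus_specificity, sensitivity


scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
for threshold in (0.85, 0.5, 0.1):
    print(threshold, roc_point(scores, labels, threshold))
```

Varying the threshold from high to low moves the point from the lower left to the upper right of the plot, which is how the full curve is traced.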
175. etween predicted and actual values (upper bound, Chebyshev distance)
Formula: $L_\infty = \max_i |\hat{y}_i - y_i|$, where $y_i$ is the actual value and $\hat{y}_i$ the predicted value for record $i$

4.8.3.4 Error Mean
Definition: mean of the difference between predictions and actual values
Formula:
Mean Percent Error (MPE): $\mathrm{MPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\hat{y}_i - y_i}{y_i}$
Mean Absolute Percent Error (MAPE): $\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\hat{y}_i - y_i}{y_i} \right|$

4.8.3.5 Error Standard Deviation
Definition: dispersion of errors around the actual result
Formula: $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i - \mu)^2}$, where $\mu = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)$

4.8.3.6 Classification Rate
Definition: ratio between the number of correctly classified records and the total number of records
Formula: $\mathrm{CR} = \frac{\text{number of correctly classified records}}{\text{total number of records}}$

4.8.3.7 Determination Coefficient R2
Definition: ratio between the variability (sum of squares) of the prediction and the variability (sum of squares) of the data
Formula: $\mathrm{SSR} = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$, $\mathrm{SST} = \sum_{i=1}^{N} (y_i - \bar{y})^2$, $R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}$

4.9 Profit Type

4.9.1 Definition
A profit type allows calculation of the profit that may be realized using the model. In general, a benefit is associated with the positive or expected values of the target variable, and a cost is associated with the negative or unexp
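The error indicators defined above can be computed in a few lines. The sketch below follows the definitions given in this section under stated assumptions: errors are taken as predicted minus actual, no observation weights are used, the standard deviation divides by N, and the mean used in R2 is the mean of the actual values. The function name and the dictionary keys are hypothetical.

```python
import math


def error_metrics(actual, predicted):
    """Compute the regression error indicators for two equal-length lists."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    error_mean = sum(errors) / n
    y_bar = sum(actual) / n
    ssr = sum((p - y_bar) ** 2 for p in predicted)  # variability of the prediction
    sst = sum((a - y_bar) ** 2 for a in actual)     # variability of the data
    return {
        "linf": max(abs(e) for e in errors),
        "error_mean": error_mean,
        "mape": sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "error_std": math.sqrt(sum((e - error_mean) ** 2 for e in errors) / n),
        "r2": ssr / sst,
    }


print(error_metrics([2.0, 4.0, 6.0], [3.0, 4.0, 5.0]))
```

The percentage-based indicators are undefined when an actual value is 0, so such records would need to be handled separately.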
176. [Figure: data description screen with the option Add Filter in Data Set and the buttons Analyze, Open Description and Save Description]

6. Click the Next button.

5.2.1.2.1 Why Describe the Data Selected
In order for the InfiniteInsight features to interpret and analyze your data, the data must be described. To put it another way, the description file must specify the nature of each variable, determining their:
- Storage format: number (number), character string (string), date and time (datetime) or date (date).
  Notes: When a variable is declared as date or datetime, the KXEN Date Coder feature (KDC) automatically extracts date information from this variable, such as the day of the month, the year, the quarter and so on. Additional variables containing this information are created during the model generation and are used as input variables for the model. KDC is disabled for Time Series.
- Type: continuous, nominal, ordinal or textual.
For more information about data description, see Types of Variables on page 26 and Storage Formats on page 29.

5.2.1.2.2 How to Describe Selected Variables
To describe your data, you can:
- Either use an existing description file, that is, taken from your information system or saved from a previous use of InfiniteInsight features,
- Or create a description file usin
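The kind of information that the note above attributes to the date coder (day of the month, year, quarter) can be derived from a date as follows. This is only a sketch of the idea; the function name and the output keys are hypothetical and are not the variable names generated by the product.

```python
from datetime import date


def date_features(value):
    """Derive a few calendar attributes from a date."""
    return {
        "day_of_month": value.day,
        "year": value.year,
        "quarter": (value.month - 1) // 3 + 1,  # months 1-3 -> 1, ..., 10-12 -> 4
    }


print(date_features(date(2013, 11, 5)))
```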
177. fic transition from one page to another. Similar to InfiniteInsight Explorer - Event Logging, these new columns of data can be added to existing customer data and are made available to other InfiniteInsight features for further processing.

InfiniteInsight Modeler - Data Encoding (formerly known as K2C) automatically prepares and transforms data into a format suitable for use in InfiniteInsight. InfiniteInsight Modeler - Data Encoding translates nominal and ordinal variables, automatically fills in missing values and detects out of range data. In addition, this feature contributes significantly to the robustness of the models generated by the InfiniteInsight engine by providing a robust data encoding.

3.2.2.3 Phase 3: Data Modeling
Thanks to the statistical techniques and information technologies upon which the InfiniteInsight Modeler - Regression/Classification, InfiniteInsight Modeler - Segmentation/Clustering and InfiniteInsight Modeler - Time Series features were built, these features require only an extremely short modeling time to generate relevant and robust analytical models of your data.

InfiniteInsight Modeler - Regression/Classification (formerly known as K2R) generates explanatory and predictive models. The models generated by Classification/Regression explain and predict a phenomenon, or business question, by a function of the analyzed data set (the explanatory variables). The models generated are calculated using a regression and classi
178. fication algorithm. This polynomial regression is a proprietary algorithm developed by KXEN, using Vapnik's SRM (Structural Risk Minimization) principle to calculate the parameters.

InfiniteInsight Modeler - Segmentation/Clustering (formerly known as K2S) generates descriptive models, which means a function to regroup cases in a data set into a number of clusters with similar behavior toward a business question.

InfiniteInsight Modeler - Time Series (formerly known as KTS) lets you build predictive models from data representing time series. Thanks to InfiniteInsight Modeler - Time Series models, you can:
- Identify and understand the phenomenon represented by your time series
- Forecast the evolution of time series in the short and medium term, that is, predict their future values

3.2.2.4 Phase 4: Model Presentation and Deployment
Once the models have been generated, model performance indicators, plots and modeling reports in HTML format facilitate viewing and interpretation of the data modeling results. Once the models have been validated, you can apply them to:
- One or more specific observations taken from your database (Simulation mode)
- A new complete data set, or application data set (Batch mode)
To facilitate deployment and integration of the models, the code corresponding to each model can also be
179. 2. In the section Application Data Set, select the format of the data source in the list Data Type.
3. Click the Browse button to select:
   - In the Folder field, the folder which contains your data set
   - In the Data field, the name of the file corresponding to your data set
4. In the section Results generated by the model, select the file format for the output file in the list Data Type.
5. You may also opt to select Keep only outliers. If you select this option, only the outlier observations will be presented in the results file obtained after applying a model.
6. Click the Apply button. The screen Applying the Model will appear. Once application of the model has been completed, the results file of the application is automatically saved in the location that you defined from the screen Applying the Model.

[Figure: Applying the Model screen showing the message "Beginning of applying model. Please wait." and the button Stop Current Task]

5.2.4.2.2 Classification Decision
The screen Classification Decision allows you to select how many observations you want the model to detect after application on the new data set.

To Apply a Classification Decision
1. On the screen Applying the Model, follow all the steps of the procedure To Apply a
180. [Figure: variable selection screen; Number of Selected Variables: 2]

6. Click the Next button. The following message appears: "This will reset the current model. Do you really want to do this?"
7. Click Yes to move to the screen Selecting Variables.

[Figure: Selecting Variables screen with the lists Explanatory Variables Selected, Target Variables, Weight Variable and Excluded Variables, and the option Alphabetic Sort; Number of Variables: 2]

8. Resume the model configuration from the step Selecting Variables.

5.2.4.5 Generating the Source Code of a Model
The feature used to generate the source code of a model is InfiniteInsight Scorer. For more information on the InfiniteInsight Scorer feature, read the Integration Guide for KMX Generated Codes.
Warning: A specific license is needed to use this feature.
The code file generated by InfiniteInsight will contain all information necessary to the model such
181. for All Variables
1. Right click the row corresponding to the variable to be edited.
2. Select Define Structure.

[Figure: data description screen (New Regression Classification Model) with the Define Structure contextual menu: Extract Categories from Statistics, Extract Structure from Model, Extract Structure from Model for All Variables, Extract Structure from Variable, Build New Structure, Remove Structure, Set Band Count for Continuous Variables]

3. Select Enable InfiniteInsight Modeler - Data Encoding Optimal Grouping for All Variables so that the option is checked.

5.2.1.3 Filtering the Data Set
To accelerate the learning process and to optimize the resulting model, you can apply a filter to your data set.
For this scenario: Do not use the filtering option.

To Filter a Data Set
1. Check the option Add a Filter in Data Set.
2. Click Next.

To Add a Condition
3. Click the button Add Condition. The window Define a Condition opens.
182. format of the translation in the list Data Type.
Use the Browse button located on the right of the Folder field to select the folder or the database in which the description is stored.
Use the Browse button located on the right of the field Table or File to select the file or the table containing the description.
Click the OK button.
Click the button Update to refresh the display of the categories. If the list of columns is not named correctly, use the Advanced Settings (see next paragraph) to set a header line, and update again.
Map the language names with those from the loaded translation by clicking the categories and choosing the corresponding language in the contextual menu.
Click the OK button.

To Set a Header Line
Click the tab Header Line.
Check the option Force Header Line.
In the field Line, enter the number of the line you want to use as header line.
Click OK.

6.2.1.5 Selecting Variables
Once the training data set and its description have been entered, you must select different variables:
- one or more target variables
The InfiniteInsight Modeler - Segmentation/Clustering feature is capable of segmenting a data set independently, that is, it does not require that a target variable be selected. However, even though this is not required, we strongly recommend selecting a target variable, because the process of segmenting a data set gains maximum meaning only when it is accomplished in relation to a domain
183. g the Analyze option available to you in InfiniteInsight. In this case, it is important that you validate the description file obtained. You can save this file for later re-use. If you name the description file KxDoc_<SourceFileName>, it will be automatically loaded when clicking the Analyze button.
Important: The description file obtained using the Analyze option results from the analysis of the first 100 lines of the initial data file. In order to avoid all bias, we encourage you to mix up your data set before performing this analysis.

Each variable is described by the fields detailed in the following table.

The Field: Name
Gives information on: the variable name (which cannot be modified)

The Field: Storage
Gives information on: the type of values stored in this variable:
- Number: the variable contains only computable numbers (be careful: a telephone number or an account number should not be considered numbers)
- String: the variable contains character strings
- Datetime: the variable contains date and time stamps
- Date: the variable contains dates

The Field: Value
Gives information on: the value type of the variable:
- Continuous: a numeric variable from which mean, variance, etc. can be computed
- Nominal: categorical variable, which is the only possible value for a string
- Ordinal: discrete numeric variable where the relative order is important
- Textual: textual variable containing phrases, sentences or complete texts
Warning: When creating a text coding model, if there is not at least
184. get variable in a given context. They may also be used as weight variables. For more information about the role of each InfiniteInsight feature, see Operations (page 12).
You can then generate models (on page 34) capable of either explaining and predicting a phenomenon or describing a data set, in both cases as a function of the previously defined target variable. This phase is called the training phase.
Once the models have been generated, you can view and interpret their relevance and robustness using:
- Performance indicators (on page 38): the quality indicator KI and the robustness indicator KR
- A variety of plots, including the profit curve plot

4.2 Data Sources Supported
In the standard version, InfiniteInsight supports the following data sources:
- Flat files (text files) in which the data are separated by a delimiter, such as commas in a csv (Comma Separated Value) format file. For instance, the sample file Census01.csv used for the InfiniteInsight Modeler - Regression/Classification and InfiniteInsight Modeler - Segmentation/Clustering application scenarios is a csv file.
- ODBC compatible data sources
If your license allows it, you can also use SAS files.
An API also allows you to interface InfiniteInsight with any other application (SPSS, Microsoft Excel, for example) and gain access to any other data source. You must develop a specific dll script for each new source.
Note: For information about data formatting and
185. greater than or equal to every element of S.

V

7.1.1.1.125 variable
A variable corresponds to an attribute which describes the observations stored in your database. In KXEN components, a variable is defined by:
- Type
- Storage format
- Role

7.1.1.1.126 variable pool
The variable pool is a repository where the user stores the description of the frequently used variables. It is located in the connector store (needs to be associated before, does nothing if not associated). This call stores all the variable information usually edited by the user: standard description (storage type, value, etc.), mapping information and structure. The information stored in the pool is retrieved on the next guessed description. The user can also choose to save only the description of a specific variable.

7.1.1.1.127 variable type
There are four types of variables:
- continuous variables
- ordinal variables
- nominal variables
- textual variables

W

7.1.1.1.128 weight variable
A weight variable allows one to assign a relative weight to each of the observations it describes and actively orient the training process. Declaring a variable as a weight variable results in creating a number of copies of each of the data set observations proportional to the value they possess for that variable.
186. [Figure: profit curves plot; x axis: percentage; curves: Random, Wizard, Validation]

The default parameters display the profit curves corresponding to the Validation sub-set (blue line), the hypothetical perfect model, Wizard (green line), and a random model, Random (red line). The default setting for the type of profit parameter is Detected profit, and the values of the abscissa are provided in the form of a percentage of the entire data set.

2. When there is more than one target, select the target for which you want to see the curves in the Models list.
Note: A model corresponds to each variable. The name of each model is built from the KC_ prefix and the model target name.
3. Select the viewing options that interest you. For more information about viewing options, see Viewing Options.

6.2.3.3.3 Plot Options

To Display the Graphs for the Estimation, Validation and Test Sub-sets
Two buttons situated above the plot on the screen Model Curves allow you to switch between:
- the graph for the Validation sub-set
- the graphs for all the sub-sets

To Copy the Model Graph
Click the Copy button. The application copies the parameters of the plot. You can paste them into a spreadsheet program (such as Excel) and use them to generate a graph.
187. h test at the end cutting strategy distributes 4/5 of the initial data set in a random manner in the two sub-sets, estimation and validation: 3/5 being distributed in the estimation data sub-set and 1/5 in the validation data sub-set. The final 1/5 of the initial data set is sent directly into the test sub-set.
This is a useful strategy in cases where:
- Your database corresponds to a well defined evolution because of the way it was built, which may mean, for example, that the data is in chronological order.
- You may wish to take this order into account when generating your model.
For example, imagine that:
- New customers are added every month to your database.
- You know that the data sets to which you apply the model, once generated, have a better chance of resembling the most recent section of your database, that is, the section that contains the most recent customers entered.
Using the Random with test at the end cutting strategy, you decide to test the model generated on the section of your database that is most likely to resemble the state of your future application data sets.

4.4.3.2.4 Random Without Test (Default Strategy)
The Random without test strategy is the cutting strategy suggested as the default setting. It distributes the whole initial data set in a random manner to the two sub-sets of estimation and validation: 3/4 of the initial data set are distributed to the estimation sub-set, 1/4 to the initial data
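The two cutting strategies described above can be sketched as follows. The proportions (3/5, 1/5, 1/5 and 3/4, 1/4) come from the text; the function names, the seed parameter and the use of a simple shuffle are assumptions made for the illustration and say nothing about how the product draws its samples.

```python
import random


def random_with_test_at_end(rows, seed=0):
    """First 4/5 shuffled into estimation (3/5) and validation (1/5); last 1/5 is the test set."""
    cut = len(rows) * 4 // 5
    head, test = list(rows[:cut]), list(rows[cut:])  # the test sub-set keeps the original order
    random.Random(seed).shuffle(head)
    n_estimation = len(rows) * 3 // 5
    return head[:n_estimation], head[n_estimation:], test


def random_without_test(rows, seed=0):
    """Whole data set shuffled into estimation (3/4) and validation (1/4); no test set."""
    pool = list(rows)
    random.Random(seed).shuffle(pool)
    n_estimation = len(pool) * 3 // 4
    return pool[:n_estimation], pool[n_estimation:], []


rows = list(range(100))  # stand-in for observations in chronological order
estimation, validation, test = random_with_test_at_end(rows)
print(len(estimation), len(validation), len(test))
```

Taking the test sub-set from the end of the file is what lets a chronologically ordered data set be tested on its most recent observations.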
188. hat if you want 80% of the people who will respond positively to your campaign to receive your mailing, you will have to send it to 32% of the entire population.
On the other hand, if you select the option of Population and set the cursor to 20% on the scale, the value of the field of Detected Target will be 60.4%, which means that if your budget only allows you to send your mailing to 20% of the entire population, you will reach about 60% of the population who will respond positively.
For more details on how to use the Confusion Matrix, see section Analyzing and Understanding the Model > Confusion Matrix on page 140.

5.2.4.2.4 Using the Option Direct Apply in the Database
This optimized scoring mode can be used if all the following conditions are met:
- the apply-in data set (table, view, select statement, data manipulation) and the results data set are tables coming from the same database,
- the model has been computed while at least one physical key variable was defined in InfiniteInsight,
- there is a valid InfiniteInsight Scorer license for the database,
- no error has occurred,
- the in-database apply mode is not deactivated,
- access is granted to read and write (create table).

To Use the In-database Apply Mode
Check the option Use the Direct Apply in the Database
189. he current observation. The closest cluster is the one the observation belongs to; its name is displayed in the column kc_name_<TargetVariable>. The next closest cluster is displayed in the column kc_name_<TargetVariable>_2, and so on until the furthest cluster. You can choose to add all the clusters or only the closest.

To Add All the Clusters
Check the All option.

To Add Only the Closest Clusters
1. Check the Top option.
2. In the text field, enter the number of clusters you want to add, for example the two, three or four closest.
Note: The name of a cluster is by default its number. You can modify this in the column User Name of the panel Clusters Profiles, accessible through the main menu.

6.2.4.1.3.4.3 Top Ranking Distances
This option allows you to add to the output file the distances of each observation from the clusters' centroids. The distance from the closest centroid is displayed in the column kc_best_dist_<TargetVariable>, the distance from the second closest centroid is displayed in the column kc_best_dist_<TargetVariable>_2, and so on until the furthest centroid. You can add the distances from all centroids or only the shortest.

To Add All the Distances
Check the All option.

To Add Only the Shortest Distances
1. Check the Top option.
2. In the text field, enter the number of distances you want to add, for example the two, three or four shortest.
Note: When the SQL mode is activated the
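The ranking of clusters and distances described above can be sketched as follows. The Euclidean (L2) distance is used here as an assumption, since the product may use another distance depending on the model settings; the function name, the centroids and the observation are made up for the illustration.

```python
import math


def rank_clusters(observation, centroids, top=None):
    """Return (distance, cluster name) pairs, closest first; keep only `top` if given."""
    ranked = sorted(
        (math.dist(observation, centroid), name)
        for name, centroid in centroids.items()
    )
    return ranked if top is None else ranked[:top]


# Hypothetical centroids in a two-dimensional encoded space.
centroids = {"1": (0.0, 0.0), "2": (3.0, 4.0), "3": (10.0, 0.0)}
for distance, name in rank_clusters((0.0, 1.0), centroids, top=2):
    print(name, round(distance, 3))
```

The first pair corresponds to the cluster the observation belongs to, the second to the next closest cluster, and so on, in the same order as the numbered output columns.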
190. he current view of the displayed report. The data can then be pasted into a text editor, a spreadsheet or word processing software.
- If the current report contains more than one view (for various variables, data sets, and so on), this option allows you to copy all the views of this report.
- If the current report is displayed as a graph, this option allows you to copy it as an image and paste it into word processing software or a graphic application.
- This option allows you to print the current view of the selected report, depending on the chosen display mode (HTML, table, graph).
- This option allows you to save, under different formats (text, html, pdf, rtf), the data from the current view of the selected report.
- This option allows you to save, under different formats (text, html, pdf, rtf), the data from all the views of the selected report.
- This option is available for all display modes and allows exporting the current view into Excel (compatible with Excel 2002, 2003, XP and 2007).

5.2.3.8 Scorecard
This screen provides you with the coefficients associated with each category for all variables in the model (only in the case of a regression model, InfiniteInsight Modeler - Regression/Classification). To obtain a score, add all the coefficients corresponding to the selected value of each va
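The way a score is read off a scorecard, as described above, amounts to a sum of coefficients. The sketch below assumes that the category of each variable is already known for the observation; the function name, the variable names and the coefficient values are hypothetical and are not taken from an actual model.

```python
def scorecard_score(scorecard, observation):
    """Add the coefficient of the category taken by each variable."""
    return sum(scorecard[variable][category] for variable, category in observation.items())


# Hypothetical coefficients per variable and category.
scorecard = {
    "marital-status": {"Married-civ-spouse": 0.25, "Widowed": -0.5},
    "sex": {"Male": 0.125, "Female": -0.125},
}
observation = {"marital-status": "Married-civ-spouse", "sex": "Female"}
print(scorecard_score(scorecard, observation))
```

For a continuous variable, the category would be the interval that contains the value, so a lookup of the matching interval would come before the sum.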
191. he parameter you want to set. The following table sums up the available parameters:
Parameter | Values | Description
Number of Reason Codes | Integer (default: 3) | Number of reason codes you want to generate.
Threshold | Mean (default), Maximum, Minimum | Threshold used for computing the most important reason codes. For each variable, the contribution corresponding to the customer score is compared to its contribution for the whole population. The variables for which the contribution is the most differential are selected as the most important reason codes. For example, if you select Mean, the customer variable contribution will be compared to the mean of the whole population contribution to determine which variables are the most differential.
Criterion | Below (default), Above | Indicates whether you want to generate the reason codes when the customer variable contribution is above or below the threshold.
Warning: Using Below with the Minimum threshold, or Above with the Maximum threshold, will generate an error.
1. If you want to generate several types of reason codes, repeat steps 3 and 4 for each type.
5.2.4.2.5.3.1.1 Output
The output table contains two columns for each reason code requested. REASON_NAME_<CRITERION>_<THRESHOLD>_<RANK>_RR_<TARGET NAME> contains the name of the variable selected as a reason code. For example, the output column named REASON_NAME_BELOW_MEAN_1_RR_CLASS contains the name of the va
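The selection logic summarized in the table can be sketched as below. This is a hedged illustration, not the product's algorithm: the function reason_codes, its arguments and the sample contributions are hypothetical, and "most differential" is interpreted here as the largest gap between the population threshold and the customer contribution. Only the column name pattern comes from the guide.

```python
def reason_codes(customer, population, target="class",
                 n_codes=3, threshold="mean", criterion="below"):
    """Pick the variables whose contribution differs most from a population threshold.

    customer maps variable -> contribution for this customer.
    population maps variable -> list of contributions over the whole population.
    The gap-based ranking is an assumption made for this sketch.
    """
    # These two combinations can never select anything; the guide says they raise an error.
    if (criterion, threshold) in {("below", "minimum"), ("above", "maximum")}:
        raise ValueError(f"{criterion!r} cannot be combined with {threshold!r}")

    reducers = {
        "mean": lambda values: sum(values) / len(values),
        "minimum": min,
        "maximum": max,
    }
    reduce_fn = reducers[threshold]

    gaps = {}
    for variable, contribution in customer.items():
        reference = reduce_fn(population[variable])
        gap = reference - contribution if criterion == "below" else contribution - reference
        if gap > 0:  # keep only variables on the requested side of the threshold
            gaps[variable] = gap

    ranked = sorted(gaps, key=gaps.get, reverse=True)[:n_codes]
    # Column names follow REASON_NAME_<CRITERION>_<THRESHOLD>_<RANK>_RR_<TARGET NAME>.
    return {
        f"REASON_NAME_{criterion.upper()}_{threshold.upper()}_{rank}_RR_{target.upper()}": variable
        for rank, variable in enumerate(ranked, start=1)
    }


# Hypothetical contributions, for illustration only.
example_codes = reason_codes(
    {"age": -0.02, "education": 0.01, "occupation": -0.05},
    {"age": [0.0, 0.02], "education": [0.0, 0.0], "occupation": [-0.01, 0.01]},
)
```

With the default Mean threshold and Below criterion, the first column produced is REASON_NAME_BELOW_MEAN_1_RR_CLASS, matching the example name given in the guide.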
192. hile they call on complex, innovative statistical techniques, they are still straightforward and quick to use: they put powerful Data Mining techniques within the reach of non-expert users. For more technical details regarding InfiniteInsight, please contact us (on page 9). We will be happy to provide you with more technical information and documentation.
2.1.3 What this Document covers
This document introduces you to the basic concepts underpinning InfiniteInsight and the main functionalities of the InfiniteInsight Modeler Regression Classification and InfiniteInsight Modeler Segmentation Clustering features. Using two application scenarios, you can create your first models with confidence. This document is the primary guide to the two InfiniteInsight features described in the following table:
The feature | Allows you to | Example
InfiniteInsight Modeler Regression Classification | Understand and predict a phenomenon | You work for an automobile manufacturer and wish to send a promotional mailing to your prospects. InfiniteInsight Modeler Regression Classification allows you to: understand why previous prospects responded to such a mailing; predict the response rate to such a mailing sent to new prospects.
InfiniteInsight | Describe a data set by | Your firm is in the process of bringing product
193. 5.2.1.1 Selecting a Data Source
For this Scenario
Use the file Census01.csv as a training data set. This file represents the sample that you had extracted from your database and used for the test phase of your direct marketing campaign. As specified in your test plan, this file contains data concerning 50,000 prospects for whom you now know the behavior with respect to the new financial product:
25% of the prospects showed themselves to be clearly interested. They chose to accept an invitation for a meeting with one of your sales channel agents.
75% of the prospects declined your invitation.
In this file, you created a new variable, Class, which corresponds to the reaction of prospects contacted during the test. You assigned:
The value 1 to those prospects who responded positively to your invitation.
The value 0 to those prospects who responded negatively to your invitation.
To Select a Data Source
1. On the screen Data to be Modeled, select the data source format to be used (Text files, ODBC).
[Screenshot: the screen Data to be Modeled, with Data Type set to Text Files and Folder set to Samples]
194. ht Variable. Also select a variable in the screen section Weight Variable and click the button < to move the variables back to the screen section Explanatory variables selected.
6.2.1.5.3 Explanatory Variables
By default, and with the exception of key variables, all variables contained in your data set are taken into consideration for generation of the model. You may exclude some of these variables. The decision whether to include or exclude a variable for generation of your segmentation model depends upon domain-specific considerations. Your domain-specific knowledge allows you to determine which variables are the most useful for description of the clusters or homogeneous groups. A regression model generated using InfiniteInsight Modeler Regression Classification (formerly known as K2R) would also be used as a tool to determine the variables with the greatest explanatory power for a given phenomenon.
For this Scenario
Exclude the variable KxIndex, as this is a key variable. Since the initial data set does not contain a key variable, InfiniteInsight generated KxIndex automatically. Retain all the other variables.
To Exclude some Variables from Data Analysis
1. On the screen Selecting Variables, in the section Explanatory Variables Selected (left hand side)
195. iable Structure
1. Select the variables for which you want to extract the structure.
[Screenshot: variable description table of the New Regression Classification Model wizard]
2. Right-click the table; a contextual menu is displayed.
[Screenshot: contextual menu listing Extract Categories from Statistics, Extract Structure from Model, Extract Structure from Model for All Variables, Extract Structure from Variable, and Build New Structure]
3. Select
196. iable cannot be edited.
5.2.1.2.6.4 Structure for a Nominal Variable
The structure for a nominal variable is made of groups containing the variable categories.
[Screenshot: structure editor for the variable relationship, with the lists Group Structure and Category Edition and the option Enable the target based optimal grouping performed by K2C]
To Create a New Category Group
1. In the list Category Edition, select the categories you want to add in a new group. Use the Ctrl key to select several categories.
[Screenshot: structure editor showing education categories in the lists Group Structure and Category Edition]
2. Click the button Add New Grou
197. iable categories.
7. Click OK.
Note: You can edit a condition by double-clicking it.
To Add a Logical Conjunction
Click the button Add Logic And or the button Add Logic Or.
Note: You can change a conjunction by double-clicking it.
To Change the Order
You can change the order of the nodes to accelerate the filtering process by setting the conditions with the highest probability to be false at the top of the list.
1. Select the node you want to move up or down.
2. Use the arrow buttons to change its position in the filter.
To Delete a Node
1. Select the node you want to delete.
2. Click the button Remove Selected Node.
To Display the Filtered Data Set
You can visualize the data set that you will obtain after the application of the filter. Click the button View Data. A pop-up window opens.
[Screenshot: Sample Dataview window for the data set Census01.csv]
198. iables, Explanatory variables, Weight variables.
4.6.5.1 Target Variable
4.6.5.1.1 Definition
A target variable is the variable that you seek to explain, or for which you want to predict the values in an application data set. It corresponds to your domain-specific business issue. When the target variable is a binary variable, InfiniteInsight considers the target value, or target category, of this variable (that is, the value that is the object of the analysis) to be the least frequently occurring value in the training data set.
Imagine that a training data set containing the customer information of a company contains the target variable "responded to my mailing". This target variable may take the values Yes or No. If the value Yes is the least frequent value, for instance if 40% of referenced customers responded to the mailing, InfiniteInsight considers that value to be the target category of the target variable.
4.6.5.1.2 Synonyms
Depending upon your profile and your area of expertise, you may be more familiar with one of the following terms to refer to target variables: Variables to be explained, Dependent variables, Output variables. These terms are synonyms.
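The rule for binary targets described in this snippet (the target category is the least frequent value in the training data set) can be illustrated with a short Python sketch. The function name target_category is hypothetical, and the handling of ties is an assumption: the guide does not say what happens when both values are equally frequent.

```python
from collections import Counter


def target_category(values):
    """Return the least frequent value of a binary target variable.

    On a tie, the value seen first is returned; this tie rule is an
    assumption of the sketch, not documented product behavior.
    """
    counts = Counter(values)
    if len(counts) != 2:
        raise ValueError("expected a binary target with exactly two distinct values")
    return min(counts, key=counts.get)


# The mailing example from the text: 40% Yes, 60% No.
example_target = target_category(["Yes"] * 40 + ["No"] * 60)
```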
199. ication (formerly known as K2R) feature, you can rapidly develop an explanatory and predictive model at the least possible cost. This model allows you to respond to your business issue and accomplish your objectives.
5.1.2 Your Objective
Imagine the following case. You are the Marketing Director of a large retail bank. This bank has decided to offer its customers a new high-end savings product. It prepares to launch an extensive direct marketing campaign to promote this new product to its prospects and customers. The bank is experiencing heavy competition, and senior management, sensitive to the stakes involved in launching this new financial product, wants the marketing campaign completed as soon as possible.
5.1.3 Your Means
5.1.3.1 A Limited and Closely Monitored Budget
The enterprise controls of the bank are rigorous, and the budget that has been allocated to you for this marketing campaign:
Does not allow you to contact all of the bank's prospects and customers.
May not be exceeded.
5.1.3.2 The Information at your Disposal
The Marketing Department has a database for this campaign, which contains the records of 1,000,000 prospects identified by their principal characteristics: Age, Occupation, Sex, Education, Employer, Number of hours worked per week
200. ick the button Edit Variable Pool Content to edit the parameters of the variables stored in the variable pool.
5. Click OK to validate.
5.2 Creating a Classification Model Using InfiniteInsight Modeler
Data modeling with InfiniteInsight Modeler Regression Classification is subdivided into four broadly defined stages:
1. Defining the Modeling Parameters
2. Generation and Validation of the Model
3. Analysis and Understanding of the Analytical Results
4. Using a Generated Model
5.2.1 Step 1 Defining the Modeling Parameters
To respond to your business issue, you want to:
Identify and understand the factors that determine whether a prospect reacts positively or negatively to your marketing campaign.
Thereby be able to predict the behavior of new prospects with respect to your campaign.
The InfiniteInsight Modeler Regression Classification feature (formerly known as K2R) allows you to create explanatory and predictive models. The first step in the modeling process consists of defining the modeling parameters:
1. Select a data source (on page 65) to be used as training data set.
2. Describe the data set (on page 68) selected.
3. Select the variables: the target variables, the explanatory variables and possibly a weight variable.
4. Check the modeling parameters.
5. Setting the Advanced Parameters (on page 95): degree, target key, rule mode, variable auto-selection and correlations. This step is optional.
201. [Diagram: InfiniteInsight architecture; only the labels Data Warehouse, Marketing Automation, Rules Engine and Batch are legible]
3.2.1 User Interfaces
3.2.1.1 Three Types of User Interface
Three types of interfaces allow you to use the features of InfiniteInsight: Graphical user interface, Command interpreter, API (Application Programming Interface) controls.
3.2.1.2 Graphical User Interfaces
The graphical user interface is aimed primarily at end users, or non-expert users. It provides access to the InfiniteInsight which allow you to use InfiniteInsight features and model your data very easily. In addition, it provides plotted output to facilitate viewing and interpretation of the results of modeling. The graphical user interface provided with InfiniteInsight is the InfiniteInsight interface, developed in Java on the CORBA API (Application Programming Interface), which operates on any platform (Windows, UNIX, among others). This interface is provided as an example. In addition, with the Application Programming Interface furnished with InfiniteInsight, you can develop your own graphical interfaces.
3.2.1.3 The KxShell Command Interpreter
The KxShell command interpreter allows you to use InfiniteInsight by typing commands or executing scripts containing several commands. The command interpreter is an example of development based on the C AP
202. [Screenshot: statistical reports tree, with entries including Control for Deviations, Expert Debriefing, Performance Indicators, Continuous Encoding, Monotonic Variables, Variables Exclusion Cause and Overall Exclusions]
The table below shows the possible variables exclusion causes:
Name | Explanation
Overall Exclusions: Constant | The variable has only one value (continuous variables) or one category (nominal or ordinal variables) in the data set. The variable is discarded with respect to all targets.
Overall Exclusions: Small Variance | For continuous variables, the variance is small. The variable variation is noise. The variable is discarded with respect to all targets.
Target Specific Exclusions: Fully Compressed | The variable has been fully compressed with respect to the target. It will be excluded from the model with respect to this target.
Target Specific Exclusions: Small KI on Estimation | The variable has a small KI
Target Specific Exclusions: Small KI on Validation |
Target Specific Exclusions: Large KI Difference |
Target Specific Exclusions: Small KR |
5.2.3.7.2 Display Options
203. idence level for the value that has been predicted; this is also known as the error bar. It is computed with 3 standard deviations on the validation data set, bin per bin. The percentage of population corresponding to the 3 standard deviations is about 99%.
Calculation formula: [TargetMean - 3 * sqrt(TargetVariance), TargetMean + 3 * sqrt(TargetVariance)]
sqrt(TargetVariance) is equal to the Standard Deviation, and the interval built around TargetMean from the Standard Deviation is the Confidence Interval.
to show in the output file which observations are outliers. An observation is considered an outlier if the difference between its predicted value and its real value exceeds the value of the error bar. In other words, the error bar is a deviation measure of the values around the predicted score. Possible values are 1 if the observation is an outlier with respect to the current target, else 0.
add the variables contributions for the current variable to the output file. You can add the contributions of all variables or select only the contributions of specific variables (see procedure below).
1. Check the Individual option.
2. Click the >> button to display the variable selection table.
3. In the Available list, select the variables you want to add (use the Ctrl key to select more than one variable).
4. Click the > button to add the selected variables to the Selected list.
5.2.4.2.5.3.2.1 Predicted Value
This option is checked by
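The error bar and the outlier flag described in this snippet can be sketched in Python as follows. The function names and sample values are hypothetical, and the use of the population variance of the target values in one bin is an assumption: the guide only states that 3 standard deviations are computed bin per bin on the validation data set.

```python
import math
import statistics


def error_bar(bin_targets, n_sigma=3):
    """Error bar of a bin: n_sigma times the standard deviation of its target values.

    Population variance is an assumption made for this sketch.
    """
    return n_sigma * math.sqrt(statistics.pvariance(bin_targets))


def confidence_interval(bin_targets, n_sigma=3):
    """Interval [TargetMean - error bar, TargetMean + error bar] for one bin."""
    mean = statistics.fmean(bin_targets)
    bar = error_bar(bin_targets, n_sigma)
    return mean - bar, mean + bar


def outlier_flag(predicted, actual, bar):
    """Return 1 if |predicted - actual| exceeds the error bar, else 0."""
    return 1 if abs(predicted - actual) > bar else 0


# Hypothetical target values of one bin of the validation data set.
example_bin = [2, 4, 4, 4, 5, 5, 7, 9]
example_interval = confidence_interval(example_bin)
```

For the sample bin, the mean is 5 and the standard deviation is 2, so the error bar is 6 and the interval runs from -1 to 11.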
204. ifies the points where behavioral changes occur with respect to the target variable and automatically crops the variable into intervals exhibiting homogeneous behavior with respect to the target. For more information, please see section Optimal Grouping for all Variables.
[Figure: Category Significance plot for the variable education; Influence on Target by category, validation data set]
When categories do not contain sufficient numbers to provide robust information, they are grouped in the KxOther category, which is created automatically. When a variable is associated with too many missing values, the missing values are grouped in the KxMissing category, which is also created automatically.
To understand the value of the categories KxOther and KxMissing, consider the following example. The database of corporate customers of a business contains the variable "web address". This variable contains the Web site address of the corporate customers contained in the database. Some companies have a Web site, others do not. In addition, each Web site address is unique. In this case, InfiniteInsight automatically transform
205. ight Modeler Segmentation Clustering provides you with a KI and KR. It can be used to compare the two segmentations, especially because the number of segments is the same. If KI does not change significantly, then the one with SQL may be preferred because it is easier to understand. If there is a fall of KI, you may want to stick with the basic segmentation. KI may not be the thing you want to optimize for segmentation. The target profile of each segment is available in the GUI. Out of the four clusters, maybe one or two are of real interest. In that case, you have to focus on these interesting segments and see how they evolve with SQL generation.
6.2.3.7 Statistical Reports
The Statistical Reports provide you with a set of tables that allows you a more detailed debriefing of your model. These reports are grouped in different levels of debriefing:
the Descriptive Statistics, which provides the statistics on the variables, their categories and the data sets, as well as the variables cross statistics with the target. Note: If your data set contains date or datetime variables, automatically generated variables will appear in the statistical reports. For more information, refer to section Date and Datetime Variables: Automatically Generated Variables (on page 30).
the Model Performance, in which you will find the model performance indicators (KI and KR), the variables contributions and the score detailed statistics.
the Control for Deviation
206. ile
This option allows you to cut the output file in quantiles and to assign to each observation the number of the quantile containing it. Approximate quantiles are constructed based on the sorted distribution and the boundaries of predicted scores from the validation sample. The score boundaries are used to determine approximate quantiles on the apply data set.
Notes: Exact quantile computation would require a full sort of the scores obtained on the apply data set, which can be costly. InfiniteInsight V6.0 offers a Gain Chart option for this purpose.
It appears in the output file as quantile_rr_<target variable>_<number of quantiles>. For example, for a target variable named class and a number of quantiles equal to 10, the generated column will be named quantile_rr_class_10.
1. Check the option Predicted Value Quantiles.
2. In the field Number of Quantiles, enter the number of quantiles you want to create.
2 4 2 5 3 3 3 3 Contributions
This option allows you to add the variables contributions for the current variable to the output file. You can add the contributions of all variables or select only the contributions of specific variables. It appears in the output file as contrib_<variable>_rr_<target variable>. For example, if marital status is an explanatory variable for the target variable class, the column contrib_marital status_rr_class will be generated in the output file.
To Add All Variabl
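The approximate quantile mechanism described in this snippet (boundaries learned on the sorted validation scores, then reused on the apply data set without sorting it) can be sketched as below. The function names are hypothetical, and two details are assumptions of the sketch: quantile 1 holds the lowest scores, and boundaries are taken at equal-count positions. Only the column name pattern comes from the guide.

```python
from bisect import bisect_right


def quantile_boundaries(validation_scores, n_quantiles):
    """Score boundaries splitting the sorted validation scores into equal-count groups."""
    ordered = sorted(validation_scores)
    n = len(ordered)
    return [ordered[(k * n) // n_quantiles] for k in range(1, n_quantiles)]


def assign_quantile(score, boundaries):
    """Quantile number (1-based) of a score from the apply data set.

    Only a binary search against the boundaries is needed, so the apply
    data set never has to be fully sorted. Numbering from the lowest
    scores upward is an assumption of this sketch.
    """
    return bisect_right(boundaries, score) + 1


def quantile_column(target, n_quantiles):
    """Column name following quantile_rr_<target variable>_<number of quantiles>."""
    return f"quantile_rr_{target}_{n_quantiles}"


# Hypothetical validation scores 0.00, 0.01, ..., 0.99.
example_boundaries = quantile_boundaries([i / 100 for i in range(100)], 10)
example_quantile = assign_quantile(0.95, example_boundaries)
```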
207. imates but also good range estimates.
4.8.2 Other Commonly Used Indicators
Three other indicators commonly used in Data Mining are provided to assess an InfiniteInsight model: the GINI index, the K-S, the AUC.
4.8.2.1 GINI Index
The Gini statistic is a measure of predictive power based on the Lorenz curve. It is proportionate to the area between the random line and the Model curve. The GINI index is defined as the area under the Lorenz curve (page 47). The GINI index is the area between the Trade-off curve and the obtained curve, multiplied by 2. This is often pictured as the following chart:
[Chart: model curve and random line, both axes ranging from 0 to 1]
The horizontal axis grows with the score and can be associated with 1 - f. This is simply expressed using our notations as:
GINI = 2 (1 - p_g) (AUC - 1/2) = (1 - p_g) (2 AUC - 1)
Using these notations, we know that the GINI index of a random model is 0 and for a perfect model is 1 - p_g.
4.8.2.2 K-S
K-S is the Kolmogorov-Smirnov statistic, applied here as a measure of deviation from uniform response rates across categories of a variable. Kolmogorov-Smirnov is a non-parametric, exact goodness-of-fit statistic based on the maximum deviation between the cumulative and empirical distribution functions. In the
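The geometric definition given in this snippet (twice the area between the random line and the model curve) can be checked numerically with a short Python sketch. The function name gini_from_scores and the sample data are hypothetical, and the sketch assumes that the curve is the cumulative gains curve (share of population contacted against share of targets detected) and that all scores are distinct.

```python
def gini_from_scores(scores, targets):
    """Twice the area between the model gains curve and the random diagonal.

    scores: model scores, higher meaning more likely to be a target.
    targets: 1 for the target category, 0 otherwise.
    Assumes distinct scores; ties are not averaged in this sketch.
    """
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    positives = sum(target for _, target in ranked)
    if n == 0 or positives == 0:
        raise ValueError("need at least one observation in the target category")

    area = 0.0
    detected_prev = 0.0
    cumulative = 0
    for _, target in ranked:
        cumulative += target
        detected = cumulative / positives
        # Trapezoid between two consecutive points of the gains curve.
        area += (detected_prev + detected) / 2 / n
        detected_prev = detected
    # The random line has an area of 0.5 under it.
    return 2 * (area - 0.5)


# Hypothetical perfect model: the 2 targets out of 8 get the highest scores.
example_gini = gini_from_scores(
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],
    [1, 1, 0, 0, 0, 0, 0, 0],
)
```

In this example the proportion of targets is 0.25 and the result is 0.75, which agrees with the statement that a perfect model reaches 1 minus the target proportion.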
208. 6.2.3.4.5.2 Formulas
Category Importance = NP * BF / NC
where NP is the Normal Profit, BF is the Bin Frequency and NC is the Normalization Constant.
The calculation of the normalization constant differs by target data type. The calculations for binary and continuous targets are detailed below.
For binary targets, it is calculated as follows:
TARGET FREQUENCY * (1 - TARGET FREQUENCY)
It can be approximated for non-pathological continuous targets (that is, continuous targets without distribution peak, or Dirac) from:
PROPORTION ABOVE MEDIAN * (1 - PROPORTION ABOVE MEDIAN) = 0.5 * (1 - 0.5) = 0.25
6.2.3.4.6 Grouping Categories
On the plot of details of a variable, categories may appear grouped. When the option Enable Data Encoding Optimal Grouping for All Variables is enabled, InfiniteInsight groups those categories sharing the same effect on the target variable. For instance, for the variable education, the categories Doctorate and Prof School are grouped. If the explanatory variable is continuous, InfiniteInsight identifies the points where behavioral changes occur with respect to the target variable and automatically crops the variable into intervals exhibiting homogeneous behavior with respect to the target. For more information, please see section Optimal Grouping for all Variables.
[Figure: Category Significance plot]
209. ing the itemset A B C.
7.1.1.1.21 confusion matrix
The confusion matrix allows visualizing the target values predicted by the model compared with the real values, and setting the score above which the observations will be considered as positive, that is, the observations for which the target value is the one wanted.
7.1.1.1.22 consequent
Y is called the consequent of the rule. The consequent is composed of only one item; for example, Y can be the item D.
7.1.1.1.23 continuous variable
Continuous variables are variables whose values are numerical, continuous and sortable. Arithmetic operations may be performed on these values, such as determination of their sum or their mean.
7.1.1.1.24 contribution
Relative importance of each variable in the built model.
7.1.1.1.25 correlation
Any measure that quantifies the fact that two variables share the same information. This can be measured by looking at the relative variation of the two variables for different entities. Classical statistics defines linear correlation to compute such a metric on continuous variables. InfiniteInsight can compute correlations between variables of different types by looking at the correlation of the codes of both variables in presence of a target.
7.1.1.1.26 cross statistics
A method of estimating the accuracy of a classification or regressio
210. inuous Variables
5.2.3.5.3 Plot Options
To Switch Between Validation Data Set and All Data Sets Plots
1. Click the button to display all data sets. The plot displaying all data sets will appear.
[Figure: Category Significance bar chart for the variable age; Influence on Target by category, Estimation and Validation data sets]
2. Click the button to go back to the Validation Data Set plot.
To Switch between Curve and Bar Charts
3. Click the button to display the curve chart. The curve plot will appear.
[Figure: Category Significance curve chart for the variable age, with the curves Random, Wizard, Estimation and Validation]
4. Click the button to go back to the bar chart.
Note: You can combine the different types of plot. For example, you can display All Datasets in a curve chart or the Validation Data Set in a bar chart.
211. [Table of contents fragment; legible entries: Cutting Strategies, Table of Data, Variables, Performance Indicators, Advanced Model Curve]
4.1 Operation of InfiniteInsight: Overview
InfiniteInsight allows you to perform supervised Data Mining, that is, to transform your data into knowledge, then into action, as a function of a domain-specific business issue. InfiniteInsight supports various formats of source data (flat files, ODBC-compatible sources). In order to be usable by InfiniteInsight feature
212. iption have been entered, you must select the following variables: one or more Targets Variables (on page 31), possibly a Weight Variable (on page 34), the Explanatory Variables (on page 32).
5.2.1.5.1 Target Variables
For this Scenario
Select the variable Class as your target variable, that is, the variable that indicates the probability of an individual responding in a positive or negative manner to your campaign.
To Select Targets Variables
1. On the screen Selecting Variables, in the section Explanatory variables selected (left hand side), select the variables you want to use as Target Variables.
[Screenshot: the screen Selecting Variables, with the sections Explanatory Variables Selected, Target Variables, Weight Variable and Excluded Variables]
Note: On the screen Selecting Variables, variables are presented in the same order as that in which they appear in the table of data. To sort them alphabetically, select the option Alphabetic sort, presented beneath each of the variables lists.
Click the button > located on the left of the screen section Target(s) Variable(s) (upper right hand side). The
213. 4.7.4 Generating a Model
4.7.5 Representation of a Model
4.7.6 Validating the Model
4.7.7 Under what Circumstances is a Model Acceptable
4.7.8 How to Obtain a Better Model
4.8 Performance Indicators
4.8.1 Indicators Specific to InfiniteInsight
4.8.2 Other Commonly Used Indicators
4.8.3 Error Indicators
4.9 Profit Type
4.9.1 Definition
4.9.2 Available Profit Types
4.10 Advanced Model Curve
Lorenz Curves
214. isk Mode allows advanced users to ask a classification model to translate its internal equation, obtained with no constraints, into a specified range of scores associated with a good/bad odds ratio. When this mode is activated, the different encodings that are used internally for continuous and ordinal variables are merged in a single representation, allowing a simpler view of the model internal equations. This is particularly useful when the usage of predictive models is subject to legal restrictions: the model equations are now simple enough to be understood by legal departments and can be exposed not only in a programming language, as was already the case before, but even in simple words. The underlying technology is also used to display so-called score cards.
To use this mode, you need to choose:
a Risk Score associated with a Good/Bad Odds ratio. Definition of odds ratio: the ratio between good and bad, i.e. (1 - p) / p, where p is the probability of risk.
the number of Points to double the odds. Definition of points to double the odds (PDO): the number of risk points needed to double the odds ratio.
For example, consider a Risk Score equal to 615, an odds ratio of 9:1 and 15 points to double the odds. In this case, InfiniteInsight will automatically re-scale the inter
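The relationship between a reference Risk Score, its odds ratio and the points to double the odds can be illustrated with the scaling commonly used for credit scorecards. The Python sketch below uses the example values from the text (615 points at odds of 9:1, 15 points to double the odds); the function names are hypothetical, and the logarithmic formula is the usual scorecard convention, assumed here rather than taken from the guide.

```python
import math


def odds_from_probability(p):
    """Good/bad odds ratio (1 - p) / p, where p is the probability of risk."""
    return (1.0 - p) / p


def risk_score(odds, base_score=615.0, base_odds=9.0, pdo=15.0):
    """Score such that every doubling of the odds adds pdo points.

    Standard scorecard scaling, assumed for this sketch:
    score = base_score + pdo * log2(odds / base_odds)
    """
    return base_score + pdo * math.log2(odds / base_odds)


# Odds of 18:1 are twice the reference odds, so the score gains 15 points.
example_score = risk_score(18.0)
```

Under this convention, odds of 9:1 map to 615 points, 18:1 to 630 points and 4.5:1 to 600 points.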
215. isplay a background, make transparent, color
Edit Configuration options: font size, font style, font color, text background color, table configuration. Note: Check the option Dynamically render changes, or click Apply when editing the settings, so that you can visualize the result.
The selected settings will be applied to both the wizard and the generated reports.
To Edit the Charts Settings
Settings / Options / Note
Chart Colors: modify the charts colors.
Default Chart Bars Orientation: horizontal, vertical. Note: It is possible to set another default orientation for specific report items.
To Edit Report Items
1 Set the properties of your choice.
2 Click Save to validate. A window opens indicating that your style sheet has been successfully saved.
3 Click OK.
Properties / Functions / Note
Displayed as: name of the label.
View Type: choose between Tabular HTML and Graphical. Note: The last one is only available if the report item can be displayed as a graph.
Chart Type: select one of the proposed chart types. Note: This option is only available for report items of the view type Graphical.
Switch Bar Orientation: this option allows having a bar orientation other than the default one for a specific report item.
Sort by, Sort Order: you can select a column to sort by and choose between an ascending or a descending order.
Visibility: you can hide columns of a report item or even menu items. Note: At least one column of a report item must remain visible.
216. displayed
Target Key: wanted target value
<NonTargetCategory> Frequency: frequency in percentage of the non target value in the entire data set
<TargetCategory> Frequency: frequency in percentage of the wanted target value in the entire data set
6.2.3.2.3 Continuous Targets Number
For each continuous target:
<TargetVariableName>: name of the target variable for which the statistics are displayed
Min: minimum value for the target
Max: maximum value for the target
Mean: mean of the target
Standard Deviation: mean of the distance between the target values and the Mean
6.2.3.2.4 Performance Indicators
For each target variable:
KI: KXEN Information Indicator. For more information on KI, see section Performance indicators on page 38.
KR: KXEN Robustness Indicator. For more information on KR, see section Performance indicators on page 38.
6.2.3.2.5 Clusters Counts
For each target variable:
Requested Number of Clusters: number of clusters that have been asked for by the user
Effective Number of Clusters: number of clusters found by the model
6.2.3.2.6 Model Overview Options
To Copy the Model Overview
Click the Copy button. The application copies the HTML code of the screen. You can paste into a word processing or spreadsheet program a text
217. list Data Type to select the format of the filter.
3 Use the Browse button located on the right of the Folder field to select the folder or the database in which the filter is stored.
4 Use the Browse button located on the right of the Description field to select the file or the table containing the filter.
5 Click the OK button.
5.2.1.4 Translating the Variable Categories
You can translate the categories of a nominal variable, save the translation, or load an existing translation. This translation has no influence on the variable structure, which has to be set according to the original values of the variable.
Note: The variable Target Key that is used in the advanced settings, for example, does not take into account the translation when displaying the possible values of this variable.
To Translate the Variable Categories
1 Right-click a nominal variable to translate its categories. A contextual menu is displayed.
2 Select the option Translate Categories for <name_of_the_variable>.
3 Choose into which languages you want to translate. By default, the language of the user interface is displayed as a column.
4 Click the button to extract the variable categories from the data set.
5 Translate the categories. Note: You do not need to fill all fields.
218. InfiniteInsight offers the Gain Chart option for this purpose. It appears in the output file as quantile_rr_<target variable>_<number of quantiles>. For example, for a target variable named class and a number of quantiles equal to 10, the generated column will be named quantile_rr_class_10.
1 Check the option Predicted Value Quantiles.
2 In the field Number of Quantiles, enter the number of quantiles you want to create.
5.2.4.2.5.3.2.4 Contributions
This option allows you to add the variables contributions for the current variable to the output file. You can add the contributions of all variables or select only the contributions of specific variables. It appears in the output file as contrib_<variable>_rr_<target variable>. For example, if marital status is an explanatory variable for the target variable class, the column contrib_marital status_rr_class will be generated in the output file.
To Add All Variables Contributions
Check the All option.
To Add Specific Variable Contributions
1 Check the Individual option.
2 Click the >> button to display the variable selection table.
3 In the Available list, select the variables you want to add (use the Ctrl key to select more than one variable).
4 Click the > button to add the selected variables to the Selected list.
5.2.4.2.5.3.3 Nominal Target
5.2.4.2.5.3.3.1 Outputs by Rank
5.2.4.2.5.3.3.1.1 Scores
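The naming patterns described above for the quantile and contribution columns can be reproduced when post-processing an output file. The following is a minimal sketch; the helper names quantile_column and contribution_column are hypothetical and are not part of InfiniteInsight.

```python
def quantile_column(target, n_quantiles):
    """Column name following the pattern quantile_rr_<target>_<number of quantiles>."""
    return f"quantile_rr_{target}_{n_quantiles}"


def contribution_column(variable, target):
    """Column name following the pattern contrib_<variable>_rr_<target>."""
    return f"contrib_{variable}_rr_{target}"


# Names from the examples in the text.
print(quantile_column("class", 10))
print(contribution_column("marital status", "class"))
```

Building the names in one place avoids typing them by hand when several targets or quantile counts are used.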
219. with Problems on Apply in Data Set
Deviation Detailed Statistics
Probability of Deviation on Apply in Data Set
Probability of Category Deviation on Apply in Data Set
Probability of Target Deviation on Apply in Data Set
Probability of Grouped Category Deviation on Apply in Data Set
Detailed Statistics on Control Group
5.2.4.1.3.2.1 Options
You can select which report sections to save.
1 Click the button Save the reports, located in the bottom left corner. A selector window opens.
[Screenshot: Select the Report Sections to be Saved window, listing the report sections with check boxes and the Report Style list]
2 In the displayed list, check the sections you want to save.
3 In the list Report Style, select the type of output you want. Three styles of output are available:
Automatic saves the default view displayed in the interface.
Graphical saves the re
220. without distribution peak (Dirac) as $Z = P(S > \mathrm{median}(S)) \, (1 - P(S > \mathrm{median}(S)))$. In most cases, a good approximation is 0.25.
5.2.3.5.5.2.3 Normal Profit Properties
There are several interesting things to note about normal profit:
1 The normal profit of a category is independent of the target values themselves: the user can change the target value through monotonic transformations, and the normal profit of the categories with respect to this target will not change. This belongs to non-parametric metrics.
2 A consequence of 1 is that this metric is resistant to outliers: when there are a few occurrences of the target with very high values with respect to the rest of the target value distribution, the notion of normal profit is not impacted.
3 The weighted sum of the normal profit for all categories of a given variable will always be 0.
5.2.3.5.6 Grouping Categories
On the plot of details of a variable, categories may appear grouped. When the option Enable Data Encoding Optimal Grouping for All Variables is enabled, InfiniteInsight groups those categories sharing the same effect on the target variable. For instance, for the variable education, the categories Doctorate and Prof School are grouped. If the explanatory variable is continuous, InfiniteInsight ident
221. ize how much better your model is compared with the random model.
4.9.2.3 Standardized Profit
Standardized profit allows examination of the contribution of the model generated by InfiniteInsight features relative to a model of random type, that is, in comparison with a model that would only allow selecting observations at random from your database. This profit is used for the plots of variable details, which present the significance of each of the categories of a given variable with respect to the target variable.
4.9.2.4 Customized Profit
Customized profit allows you to define your own profit values, that is, to associate both a cost and a benefit to each value of the target variable. For instance, you can define:
the cost of sending out a mailing as a negative value, for example -5
the benefit brought in by the response to that mailing as a positive value, for example 20
4.10 Advanced Model Curves
In addition to the profit curves detailed in the previous section, a series of advanced model curves are available in InfiniteInsight.
4.10.1 ROC
The ROC (Receiver Operating Characteristic) graph is derived from signal detection theory. It portrays how well a model discriminates in terms of the tradeoff between sensitivity and specificity, or in effect between correct and mistaken detection, as the d
222. k other indicators provided in addition to KI and KR during the model generation. For example, you could view the total elapsed time required to generate the model and information on the standard error rate.
To Generate a New Model
You have two options. On the screen Training the Model, you can:
Either click the Previous button to return to the modeling parameters defined initially. Then you can modify the parameters one by one.
Or click the Cancel button to return to the main screen of InfiniteInsight. Then you must redefine all the modeling parameters.
5.2.3 Step 3 Analyzing and Understanding the Model Generated
The suite of plotting tools within InfiniteInsight allows you to analyze and understand the model generated:
The performance of the model with respect to a hypothetical perfect model and a random type of model
The contribution of each of the explanatory variables with respect to the target variable
The significance of the various categories of each variable with respect to the target variable
5.2.3.1 Presentation of the InfiniteInsight User Menu
Once the model has been generated, click the Next button. The screen Using the Model will appear.
[Screenshot: Using the Model screen for the model class_Census01, with the menu sections Display (Model Overview, Model Graphs, Contributions by Variables, Category Significance, Statistical Reports, Scorecard, Confusion Matrix), Run (Analyze Deviations, Simulation) and Save/Export]
223. ked categories, although they happen to be represented by numbers.
The variable eye color is a nominal variable. The set of values that this variable may assume (blue, brown, black, for example) are clearly distinct, non-ordered categories and are represented by character strings.
4.6.3.3.3 Nominal Variables and Modeling
During modeling, the values of the categorical variables are regrouped into homogeneous categories. These categories are then ordered as a function of their relative contribution with respect to the values of the target variable.
4.6.4 Storage Formats
To describe the data, InfiniteInsight uses the following storage formats: date, datetime, number, integer, string. The following table describes these storage formats.
The storage format / Is used to describe variables when their values correspond to / For instance
date: Dates expressed in the following formats: YYYY MM DD, YYYY MM DD
datetime: Dates and times expressed in the following formats: YYYY MM DD HH MM SS, YYYY MM DD HH MM SS
number: Figures, or numerical values, on which operations may be performed
integer: Figures, or numerical integer values, on which operations may be performed
string: Alphanumeric character strings
For instance: 2001 11 30 1999 04 28 2001 11
224. likely to respond in a positive manner to your marketing campaign, relative to the entire set of prospects contained in your database.
Detected profit is the default setting for type of profit. Using this type of profit:
The value 0 is assigned to observations that do not belong to the target category of the target variable.
The value 1 / (frequency of the target variable in the data set) is assigned to observations that do belong to the target category of the target variable.
The following table describes the three curves represented on the plot created using the default parameters.
The curve / Represents
Wizard (green curve, at the top): The profit that may be achieved using the hypothetical perfect model, which allows one to know with absolute confidence the value of the target variable for each observation of the data set.
Validation (blue curve, in the middle): The profit that may be achieved using the model generated by InfiniteInsight Modeler Regression/Classification, which allows one to perform the best possible prediction of the value of the target variable for each observation of the data set.
Random (red curve, at the bottom): The profit that may be achieved using a random model, which does not allow one to know even a single value of the target variable for each observation of the data set.
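The three curves of the detected profit plot can be approximated outside the tool as cumulative gains. The sketch below is illustrative only: it assumes that "frequency" means the number of observations in the target category, so that each such observation contributes 1 / n_pos and every curve ends at 1.0. The function name detected_profit_curve and the sample data are hypothetical.

```python
def detected_profit_curve(scores, targets):
    """Cumulative detected profit when observations are ranked by decreasing score.

    targets holds 1 for the target category and 0 otherwise.
    Assumption: each target-category observation is worth 1 / n_pos.
    """
    n_pos = sum(targets)
    if n_pos == 0:
        raise ValueError("the data set contains no observation of the target category")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve, total = [], 0.0
    for i in order:
        total += targets[i] / n_pos
        curve.append(total)
    return curve


scores = [0.9, 0.8, 0.3, 0.1]   # hypothetical model scores
targets = [1, 0, 1, 0]          # 1 = target category

validation = detected_profit_curve(scores, targets)
# The perfect ("Wizard") ranking is obtained by ranking on the true target itself.
wizard = detected_profit_curve(targets, targets)
# A random ranking detects, on average, the same share of targets as the share contacted.
random_curve = [(k + 1) / len(targets) for k in range(len(targets))]
print(wizard, validation, random_curve)
```

A useful model yields a validation curve lying between the random diagonal and the wizard curve.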
225. key: Wanted target value
Target categories Frequency: Percentage of all the target value in the Estimation data set when dealing with a nominal target
For each continuous target variable:
<TargetName>: Name of the target variable
Min: Minimum value found for the target variable in the Estimation data set
Max: Maximum value found for the target variable in the Estimation data set
Mean: Mean of the target variable values on the Estimation data set
Standard deviation: Mean of the distance between the target values and the Mean
5.2.3.2.4 Performance Indicators
For each target:
rr_<TargetName> or kc_<TargetName>: Target name. Note: rr_ indicates a regression/classification and kc_ indicates a segmentation/clustering.
KI: KXEN Information Indicator. Quality indicator that corresponds to the proportion of information contained in the target variable that the explanatory variables are able to explain.
KR: KXEN Robustness Indicator. Robustness indicator that signifies the capacity of the model to achieve the same performance when it is applied to a new data set exhibiting the same characteristics as the training data set.
5.2.3.2.5 Options
To Copy the Model Summary
Click the Copy button. The application copies the HTML code of the screen. You can paste int
226. weeks, the statistical expert will be able to provide a certain number of clusters, or homogeneous groups, to which each of the individuals of your database are assigned.
This method presents significant constraints. You must:
Ensure that your statistical expert, who is usually from an external department, is available for the scheduled period
Ensure that the modeling costs will fit into your budget
Spend time explaining your domain-specific business issue to the statistician
Spend time understanding the results that are provided
Ask a programmer to write a program to determine the cluster associated with any new individual added to your database
In addition, this method is not systematic. Two statisticians performing this segmentation on the same data set could obtain different results.
6.1.5.3 KXEN Method
InfiniteInsight Modeler Segmentation/Clustering allows you to build a segmentation model of your customers in a few minutes, taking into consideration the interest expressed by your customers in your new product. InfiniteInsight Modeler Segmentation/Clustering automatically detects interactions between the variables to build homogeneous sub-sets, or clusters. Each cluster is homogeneous with respect to the entire set of variables, and in particular with respect to the target variable, that is, for example, "responded positively to my test". You will discover the characteristics of different clusters, such as those
227. explain the target variable when applied to the training data set. A perfect model possesses a KI equal to 1 and a completely random model possesses a KI equal to 0.
The robustness indicator KR defines the degree of robustness of the model, that is, its capacity to achieve the same explanatory power when applied to a new data set. In other words, the degree of robustness corresponds to the predictive power of the model when applied to an application data set.
To discover how the indicators KI and KR are calculated, see KI, KR and Model Curves on page 40.
Note: Validation of the model is a critically important phase in the overall process of Data Mining. Always be sure to assign significant importance to the values obtained for the KI and KR of a model.
4.7.7 Under what Circumstances is a Model Acceptable
4.7.7.1 Quality Indicator KI
No minimum threshold is required for the KI of a model. This depends upon the context of your work, that is, your domain of application, the nature of your data and your business issue. In some cases, a model with a KI as low as 0.1 may allow realization of a profit of several thousand dollars. In all cases, a positive KI indicates that the model generated will perform better than a random model.
4.7.7.2 Robustness Indicator KR
A model with a KR less than 0.95 must be considered with caution. The performance of such a model is very likely to vary between the training data set and the application data sets.
228. relating to the model generated (Display section), referring to the model curve plots, plotting of clusters, contributions by variables and the profiles of variables of each cluster
Apply the model generated to new data (Run section)
Save the model or generate the source code (Save/Export section)
6.2.3.2 Model Overview
The screen Model Overview displays the same information as the training summary.
6.2.3.2.1 Overview
Model: name of the model, created from the target variable name and the data set name
Data Set: name of the data set
Initial Number of Variables: number of variables in the data set
Number of Selected Variables: number of explanatory variables used to build the model
Number of Records: number of records in the data set
Building Date: date and time when the model was built
Learning Time: total learning time
Engine name: name of the KXEN feature used to build the model (Kxen KMeans for a segmentation)
Requested Number of Clusters: number of clusters that have been asked for by the user
SQL Expressions: indicates if the SQL expressions for the clusters definitions have been calculated (Enabled) or not (Disabled)
6.2.3.2.2 Nominal Target Variables
For each nominal target:
<TargetVariableName>: name of the target variable for which the statistics are d
229. allows you to generate in the output file the probability of the best decisions for each observation. As for the previous options, the scores obtained for each category of the target variable are compared, and the probability of the category with the best score for the current record is displayed in the column proba_rr_<Target Variable>. If several decisions have been requested, the probability of the category with the second best score is displayed in the column proba_rr_<Target Variable>_2, the one with the third best score in the column proba_rr_<Target Variable>_3, and so on.
5.2.4.2.5.3.3.2 Outputs by Reference Category
5.2.4.2.5.3.3.2.1 Score
This option allows you to generate in the output file the score corresponding to each data set line for the different categories of the target variable. You can generate the scores for all the target variable categories or select specific categories. It appears in the output file as rr_<Target Variable> for the target variable key category and rr_<Target Variable>_<Category> for its other categories.
To Add the Score of All Target Variable Categories
Check the All option.
To Add Only the Scores of Selected Categories
1 Check the Individual option.
2 In the Selection column, check the boxes corresponding to the categories for which you want to add the score in the output file.
5.2.4.2.5.3.3.2.2 Prediction Probability
This option allows you to generate in the
230. 6.2.3.6.5 Understanding SQL Expressions
The SQL Expressions screen can be broken down into two parts:
In the upper part, a table presents each cluster in a summarized fashion. It allows you to select the cluster for which you want to display a SQL expression.
In the lower part, a tree presents the SQL expression corresponding to the selected cluster.
The following schema presents the SQL expression for Cluster 1.
[Screenshot: Cluster Profiles screen for the model class_Census01, showing the cluster table (Cluster Index, Frequencies) and the variable ranges for Cluster 1, a tree of conditions on capital gain, education, relationship, occupation, hours per week and age]
231. analytical model that is generated from a data set of 50 lines may have low generalization capacity and contain low informative value. KXEN can advise you on the issues of data volume.
Your data set must contain a target variable that will allow expression of your business issue within InfiniteInsight.
The target variable must be known for each observation of the training data set. To express this another way, no target variable values may be missing over the range of the entire training data set.
The data source format must be supported by InfiniteInsight.
Your data must be presented in the form of a single table of data, except in instances where you are using the InfiniteInsight Explorer Event Logging or InfiniteInsight Explorer Sequence Coding features.
4 Essential Concepts
This section introduces the essential concepts relating to use of InfiniteInsight. All concepts are introduced, and appear in boldface, in the section Operation of InfiniteInsight Overview, which provides a general description of the use of InfiniteInsight in the model generation process.
IN THIS CHAPTER
Operation of InfiniteInsight Overview 17
Data Sources Supported
232. analytical tools he will apply
Test different types of algorithms, for example neural networks, Bayesian networks, logistic models, decision trees, and select the one best suited to your business issue
Typically, after a few weeks, the statistician will be able to associate a value with each individual in your database, indicating the probability of being interested or not interested in your marketing campaign.
This method presents significant constraints. You must:
Ensure that your statistical expert, perhaps from a department external to the Marketing Department, is available for the scheduled period
Ensure that the cost for using this scarce resource will fit into your budget
Spend time explaining your domain-specific business issue to him
Spend time understanding the results that are provided
5.1.6.4 KXEN Method
The simplicity and automatic nature of InfiniteInsight will allow you to perform the statistical analysis of your database yourself. In addition, using InfiniteInsight will allow you to obtain results in mere minutes. InfiniteInsight uses the latest innovations of statistical sciences and also liberates the end user from the complexity of the procedures associated with statistical analysis. Using InfiniteInsight, you will be able to create a model that allo
233. me format yyyy MM DD HH MM SS
0: the variable is not an identifier; 1: primary identifier; 2: secondary identifier
Click the Add button. A pop-up window opens, allowing you to set the constant parameters.
In the field Output Name, enter the constant name.
Click the OK button to create the constant. The new constant appears in the list. You can choose whether to generate the defined constants or not by checking the Visibility box.
6.2.4.1.3.4.1 Top Ranking Centroids Indices
This option allows you to add to the output file the number of the clusters whose centroids are the closest to the current observation. The closest cluster is the one the observation belongs to; its number is displayed in the column kc_<Target Variable>. The next closest cluster is displayed in the column kc_<Target Variable>_2, and so on until the furthest cluster. You can choose to add all the clusters or only the closest.
To Add All the Clusters
Check the All option.
To Add Only the Closest Clusters
1 Check the Top option.
2 In the text field, enter the number of clusters you want to add, for example the two, three or four closest.
6.2.4.1.3.4.2 Top Ranking Centroids Names
This option allows you to add to the output file the names of the clusters whose centroids are the closest to t
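The ranking of clusters by centroid proximity, and the resulting kc_<Target Variable>, kc_<Target Variable>_2, ... columns, can be sketched as follows. This is an illustration only: it assumes a plain Euclidean distance on numeric coordinates, which the text does not confirm as the distance used by InfiniteInsight, and the function name rank_centroids and the sample centroids are hypothetical.

```python
import math


def rank_centroids(observation, centroids, target, top=None):
    """Return output columns listing cluster numbers from closest to furthest.

    centroids maps a cluster number to its coordinates.
    top limits the result to the closest clusters (None keeps all of them).
    Assumption: Euclidean distance between the observation and each centroid.
    """
    by_distance = sorted(
        (math.dist(observation, coords), number)
        for number, coords in centroids.items()
    )
    ranked = [number for _, number in by_distance][:top]
    columns = {}
    for rank, number in enumerate(ranked, start=1):
        # The closest cluster has no suffix; the following ones get _2, _3, ...
        name = f"kc_{target}" if rank == 1 else f"kc_{target}_{rank}"
        columns[name] = number
    return columns


centroids = {1: (0.0, 0.0), 2: (5.0, 5.0), 3: (1.0, 1.0)}  # hypothetical centroids
print(rank_centroids((0.9, 1.2), centroids, "class", top=2))
print(rank_centroids((0.9, 1.2), centroids, "class"))
```

Passing top=2 mirrors the Top option with two clusters requested, while omitting it mirrors the All option.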
234. model, including the target variable, which must be filled.
To Select a Data Set
1 On the screen Analyze Deviations, select the data source format to be used (Text file, ODBC).
2 Click the Browse button. The following selection dialog box will appear.
[Screenshot: Data Source Selection dialog box]
3 Select the file you want to use, then click OK. The name of the file will appear in the Data Set field.
4 Click the Next button. The screen Deviation Analysis Debriefing is displayed.
5.2.4.1.2 Following the Deviations Analysis Progress
The panel Deviation Analysis Debriefing allows you to follow the analysis process thanks to a progression bar.
[Screenshot: Analyze Deviations Debriefing panel, with the Stop Current Task button]
At the end of the process, a debriefing panel is displayed. For details on the debriefing panel, see section Understanding the Deviation Analysis on page 153.
You can use the toolbar provided on the upper part of the panel to:
stop the analysis process by clicking the i
235. 5.2.5.10.2.1 The Decision Tree
Each node in the tree displays:
the name of the expanded variable, for example MARITAL STATUS
the categories on which the node population has been filtered, for example MARRIED AF SPOUSE, NEVER MARRIED
the Population of the node
the ratio of Positive Target for nominal targets, or the Target Mean for continuous targets
[Figure: example nodes for a nominal target (marital status, Population 22416, Positive Target) and for a continuous target (workclass, Population 12106, Target Mean)]
When you go over a node, several options are offered:
Select the variable to be used to expand the next level of the decision tree.
Automatically expand the next level using the most contributive variable not yet used in the current decision tree.
Fold the section of the tree displayed below the current node.
The thickness of the arrows depends on the amount of population in the node. In the following example, the arrow leading to the node corresponding to the category 0 4386 of CAPITAL GAIN is thicker, since the node population is significantly higher than the one from the node CAPITAL GAIN 4386 41310.
[Figure: decision tree expanded on marital status and capital gain, showing Population and Positive Target for each node]
236. This translation has no influence on the variable structure, which has to be set according to the original values of the variable.
Note: The variable Target Key that is used in the advanced settings, for example, does not take into account the translation when displaying the possible values of this variable.
To Translate the Variable Categories
1 Right-click a nominal variable to translate its categories. A contextual menu is displayed.
2 Select the option Translate Categories for <name_of_the_variable>.
3 Choose into which languages you want to translate. By default, the language of the user interface is displayed as a column.
4 Click the button to extract the variable categories from the data set.
5 Translate the categories. Note: You do not need to fill all fields.
6 Click the OK button.
To Save the Categories Translation
1 Translate the variable categories as described above.
2 Click the Save button.
3 Choose a Data Type.
4 Select a Folder.
5 Enter a Name for the file or table.
6 Click the OK button.
To Load an Existing Translation File
1 Right-click a nominal variable. A contextual menu is displayed.
2 Select the option Translate Categories for <name_of_the_variable>.
3 Click the Load button.
4 Select the
237. n each cluster
The variables corresponding to each cluster and an indication of the encoding disjunction of the cluster numbers. The names of these variables correspond to cluster numbers prefixed by kc_cluster_, for example kc_cluster_1 for cluster 1.
7 Glossary
7.1.1.1.1 analytical data management
The analytical data management, as defined by KXEN, is made of three elements:
the data manipulation functions offered by KXEN, such as filters, join attributes, new attributes (see attribute on page 261) computation, aggregates (page 265), performance indicators (page 273) definition
the KXEN analytical data set methodology
the meta data management, which allows storing, sharing and easily re-using the data descriptions
7.1.1.1.2 analytical data set
Tabular representation of data made of lines and columns. Each line represents an observation. Roles (page 276) can be assigned to columns, such as input, skip, target or weight.
7.1.1.1.3 analytical record
An analytical record is a logical view of all attributes (see attribute on page 261) corresponding to an entity (page 266). An analytical record may be decomposed into domains (page 266) that group attributes related to each other; for example, in CRM, an analytical record can have a dem
238. n model. The data set is divided into several parts, with each part in turn used to test a model fitted to the remaining parts.
7.1.1.1.27 customized cutting strategy
The customized cutting strategy allows you to define your own data sub-sets. To use this strategy, you must have prepared, before opening InfiniteInsight features, three sub-sets: the estimation, validation and test sub-sets.
7.1.1.1.28 customized profit
Customized profit allows you to define your own profit values, that is, to associate both a cost and a benefit to each value of the target variable.
7.1.1.1.29 cutting strategy
A cutting strategy is a technique that allows decomposition of a training data set into two or three distinct sub-sets:
An estimation sub-set
A validation sub-set
A test sub-set
This cutting allows for cross-validation of the models generated.
D
7.1.1.1.30 data aggregation
The process of consolidating data values into a smaller number of values. For example, sales data could be collected on a daily basis and then be totaled to the week level.
7.1.1.1.31 data set
A collection of data, usually presented in tabular form, where each column represents a particular variable and where each row is an assignment of values.
7.1.1.1.32 data source
A data source includes both the source of data itself, such as relational
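The data aggregation entry above mentions totaling daily sales to the week level. A minimal sketch of that consolidation is given below. It assumes ISO week numbering as the definition of a week, which the glossary does not specify; the function name total_by_week and the sample figures are hypothetical.

```python
from collections import defaultdict
from datetime import date


def total_by_week(daily_sales):
    """Consolidate (day, amount) pairs into totals keyed by (ISO year, ISO week)."""
    weekly = defaultdict(float)
    for day, amount in daily_sales:
        iso = day.isocalendar()          # (ISO year, ISO week number, ISO weekday)
        weekly[(iso[0], iso[1])] += amount
    return dict(weekly)


daily_sales = [                          # hypothetical daily figures
    (date(2013, 1, 7), 100.0),
    (date(2013, 1, 8), 50.0),
    (date(2013, 1, 14), 30.0),
]
print(total_by_week(daily_sales))
```

Three daily values are consolidated into two weekly values, which is the reduction in the number of values that the definition describes.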
239. n the model. On this plot, each bar represents the contribution of an explanatory variable with respect to the target variable. The following four types of plots allow you to visualize contributions by variables:
- Variable Contributions, that is, the relative importance of each variable in the built model.
- Variable Weights, that is, the weights in the final polynomial of the normalized variables.
- Smart Variable Contributions, that is, the variables' internal contributions.
- Maximum Smart Variable Contributions, that is, the maximum smart variable contributions, including only the maximum of similar variables. For example, only the binned encoding of the continuous variable age will be displayed. This is the chart displayed by default.

5.2.3.4.2 Displaying Contributions by Variables

To Display the Plot of Contributions by Variables
1. On the screen Using the Model, click the option Contributions by Variables. The plot Contributions by Variables will appear. The default plot type is Maximum Smart Variable Contributions.

[Figure: Contributions by Variables, chart type Maximum Smart Variable Contributions; the variables shown include marital status, capital gain, occupation, and education num.]
240. [...] Density Curves
5 InfiniteInsight Modeler - Regression/Classification
5.1 Application Scenario: Enhance Efficiency and Master your Budget using Modeling
5.1.1 Presentation
5.1.2 Your Objective
5.1.3 Your Means
5.1.4 Your Approach
5.1.5 Your Business Issue
5.1.6 Your Solution
5.1.7 Introduction to Sample Files
5.1.8 [...]
5.2 Creating a Classification Model Using InfiniteInsight Modeler
5.2.1 Step 1 - Defining the Modeling Parameters
5.2.2 Step 2 - Generating and Validating th
241. nal scores to scores in Risk Mode space and associates an odds ratio with each score in Risk Mode space.

In this Scenario
Do not activate the Risk Mode.

To Define the Risk Mode Parameters
1. In the list Learning Mode, select the option Risk Mode. The Risk Mode Settings are displayed.
[Screenshot: Risk Mode settings, with Risk Score 615 for good/bad odds ratio of 9 to 1, Points to double odds 15, and the View Score Table button.]
2. In the field Risk Score, enter the score you want to associate with a good/bad odds ratio.
3. In the field for good/bad odds ratio of, enter the ratio.
4. In the field Points to double odds, indicate the increase in score points needed to double the odds.
5. Click the button View Score Table to display the table of scores associated with the corresponding good/bad odds ratios.

5.2.1.7.3.2.1 Risk Fitting Domain
This option allows the user to control the way risk score fitting is performed, that is, how InfiniteInsight fits its own scores to the risk scores. The risk fitting has two modes:
- PDO based: the area equals [MEDIAN_SCORE - N * PDO ; MEDIAN_SCORE + N * PDO]. N, the number of PDOs around the median score, must be specified by the user. By default, N is set to 2.
Note: PDO sta
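The three Risk Mode settings (a risk score, the good/bad odds ratio it corresponds to, and the points needed to double the odds) can be read as the usual log-odds scorecard scaling. The manual does not give the formula InfiniteInsight uses, so the sketch below is an assumption, and the names risk_score and score_table are hypothetical. The default values come from the settings shown in this excerpt (615, 9 to 1, 15).

```python
import math


def risk_score(odds, anchor_score=615.0, anchor_odds=9.0, pdo=15.0):
    """Assumed scaling: the score rises by `pdo` points each time the
    good/bad odds double, and equals `anchor_score` at `anchor_odds`."""
    return anchor_score + pdo * math.log2(odds / anchor_odds)


def score_table(odds_values, **settings):
    """Return (odds, score) pairs, similar in spirit to a score table."""
    return [(odds, round(risk_score(odds, **settings), 1)) for odds in odds_values]


table = score_table([4.5, 9, 18, 36])
```

Under this assumption, odds of 18 to 1 (double the anchor odds) map to 630, that is, 15 points above the anchor score of 615.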
242. nalyzing and Understanding the Model Generated, page 109, and then apply it to new data sets (see Step 4 - Using the Model, page 150). Otherwise, you can modify the modeling parameters so that they are better suited to your data set and your business issue, and then generate new, more powerful models.

5.2.2.1 Generating the Model

To Generate the Model
1. On the screen Specific Parameters of the Model, click the Generate button. The screen Training the Model will appear and the model is generated. A progress bar allows you to follow the process.
[Screenshot: Training the Model screen, showing the step Computing statistics and the Stop Current Task button.]
2. If the Autosave option has been activated in the panel Summary of Modeling Parameters, a warning message is displayed at the end of the learning process, confirming that the model has been saved.
3. Click Close.
4. Once the model has been generated, click Next to go to the panel Using the Model.

5.2.2.2 Following the Progress of the Generation Process
There are two ways for you to follow the progress of the generation process
243. nds for Points to Double the Odds.
- Frequency based: the area equals [QUANTILE(FREQ) ; QUANTILE(1.0 - FREQ)]. The frequency of higher and lower scores to be skipped must be specified by the user. By default, the frequency is set to 15.
If you do not check the box Risk Fitting Domain, the mode Frequency based is used by default. The fitting can be weighted or not.

To Set the Risk Fitting Parameters
1. Check the box Risk Fitting Domain.
[Screenshot: Risk Fitting Domain options: PDO based (Number of PDOs around median score), Frequency Based (Percentage of lower and higher scores to skip), and Use Score Bin Frequency as Weights.]
2. Select the mode you want to use.
3. Depending on the selected mode, set the appropriate value in the corresponding field.
4. If you want to use weights for the fitting, check the box Use Score Bin Frequency as Weights.

5.2.2 Step 2 - Generating and Validating the Model
Once the modeling parameters are defined, you can generate the model. Then you must validate its performance using the quality indicator KI and the robustness indicator KR. If the model is sufficiently powerful, you can analyze the responses that it provides in relation to your business issue, see Step 3 A
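The two risk fitting domains can be illustrated with a short sketch. It reads the PDO-based domain as an interval of N PDOs on either side of the median score, and the frequency-based domain as the interval between two symmetric quantiles; both readings are interpretations of the text above, not a description of the product's internals. The function names are hypothetical, and the frequency is taken as a whole percentage, following the label "Percentage of lower and higher scores to skip".

```python
import statistics


def pdo_domain(scores, pdo, n=2):
    """Assumed PDO-based domain: median score minus/plus n * pdo."""
    median = statistics.median(scores)
    return (median - n * pdo, median + n * pdo)


def frequency_domain(scores, skip_percent=15):
    """Assumed frequency-based domain: skip the lowest and highest
    `skip_percent` percent of the scores (whole percent, 1 to 49)."""
    if not 1 <= skip_percent <= 49:
        raise ValueError("skip_percent must be between 1 and 49")
    # 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    return (cuts[skip_percent - 1], cuts[100 - skip_percent - 1])


scores = list(range(0, 101))              # hypothetical scores 0..100
pdo_area = pdo_domain(scores, pdo=15)     # two PDOs around the median
freq_area = frequency_domain(scores, 15)  # middle 70 percent of the scores
```

With these hypothetical scores, the PDO-based domain depends only on the median and the PDO, whereas the frequency-based domain follows the actual distribution of the scores.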
244. nerate Source Code, Export KxShell Script, Save Model

6 InfiniteInsight Modeler - Segmentation/Clustering

IN THIS CHAPTER
Application Scenario: Customize your Communications using Data Modeling
Creating a Clustering Model Using InfiniteInsight Modeler

6.1 Application Scenario: Customize your Communications using Data Modeling
In this scenario, you are the Marketing Director of a large retail bank. The bank wants to offer a new financial product to its customers. Your project consists of launching a direct marketing campaign aimed at promoting this product. In order to customize the marketing messages from the bank and improve communication with the various customers and prospects for this new product, the senior management of the bank asks you to build a segmentation model of the customers of this product. Using InfiniteInsight Modeler - Segmentation/Clustering, you can rapidly develop a descriptive model at the lowest possible cost. This model shows the characteristic profiles of the customers interested in your new product, and thus responds to your business issue and fulfills your objectives.

6.1.1 Presentation
This scenario develops logi
245. new data set. In other words, the degree of robustness corresponds to the predictive power of the model applied to an application data set. For more details on the model results, see section Model Summary.

For this Scenario
The model generated possesses:
- a quality indicator KI equal to 0.808
- a robustness indicator KR equal to 0.992
The model performs sufficiently well. You do not need to generate another

To Validate the Model Generated
1. Verify the quality indicator KI and the robustness indicator KR of the model. These indicators are marked in red on the following figures.
[Screenshot: Model Overview report for the model class_Census01, built on the data set Census01.csv, with the KI and KR values marked in red.]
   a. If the performance of the model meets your requirements, go to Step 3 - Analyzing and Understanding the Model Generated (page 109).
   b. Otherwise, go to the procedure To Generate a New Model.
2. You can also chec
246. 5.2.3.5.2 Displaying the Significance of Categories Plot

To Display the Significance of Categories Plot
1. On the screen Using the Model, click Category Significance. The plot Category Significance will appear.
[Figure: Category Significance for the variable age; axes Categories and Influence on Target; series Validation.]
2. In the Variables list located above the plot, select the variable for which you want to display the categories.
If your data set contains date or datetime variables, automatically generated variables can appear in the Variables list. For more information, refer to section Date and Datetime Variables: Automatically Generated Variables on page 30.

Notes
- You can display the relative significance of the categories of a variable directly from the plot Contributions by Variables: on that plot, double-click the bar of the variable that interests you.
- If no user structure has been defined for a continuous variable, the plot Category Significance displays the categories created automatically using the band count parameter. The number of categories displayed corresponds to the value of the band count parameter. For more information about configuring this parameter, please refer to the section Band Count for Cont
247. ng message is displayed at the end of the learning process, confirming that the model has been saved.
[Screenshot: dialog with the message "The model class_Census01 has been saved".]
3. Click Close.
4. Once the model has been generated, click Next to go to the panel Using the Model.

6.2.2.2 Following the Progress of the Generation Process
There are two ways for you to follow the progress of the generation process:
- The Progress Bar displays the progression for each step of the process. It is the screen displayed by default.
- The Detailed Log displays the details of each step of the process.

To Display the Progression Bar
Click the Show Progression button. The progression bar screen appears.

To Display the Detailed Log
Click the Show Detailed Log button. The following screen appears.
[Screenshot: Training the Model, detailed log, with the following content]
Computing statistics
Statistics for discrete target class
On Estimation 0 is found 27667 times
On Estimation 1 is found 8714 times
On Validation 0 is found 9488 times
On Validation 1 is found 2973 times
Variable workclass compression on estimation from 9 to 8 categories
Variable workclass compression on validation from 9 to 8 categories
compression on estimation from
248. ng Interface allows you to connect proprietary format sources, such as industrial sensor streams. In most cases, and particularly if you are using the InfiniteInsight features via a graphical interface, you never have to concern yourself with the data access process. Data access is accomplished in a semi-transparent manner: from the graphical user interface, you need only select the data source format to be used (flat files or ODBC-compatible data sources) and specify the location of the data file. The C Data Access Application Programming Interface is helpful to developers who want to write access code for proprietary format databases.

3.2.2.1.1 The InfiniteInsight Access Feature
The InfiniteInsight Access feature (formerly known as KAA) allows reading SAS data and writing the scores obtained with an InfiniteInsight model into a SAS table. The following formats are currently supported:
- SAS files version 6 under Windows and UNIX
- SAS 7/8 under Windows and UNIX
- SAS Transport Files
You can directly access a SAS table with the InfiniteInsight interface, simply by selecting the format of the file to analyze. Once you have built your model with InfiniteInsight, you can generate a SAS table containing the model application results (scores, probability, cluster number, predicted value). The InfiniteInsight interface allows you to select the output format. The generated SAS table is automatically integrated into the SAS information system.
249. ng the Model
6.2.3 Step 3 - Analyzing and Understanding the Model Generated
6.2.4 Step 4 - Using the Model
7 Glossary

1 How to Use this Document

IN THIS CHAPTER
Organization of this Document
Which Sections should you Read?
Conventions Used in this Document

1.1 Organization of this Document
This document is subdivided into six chapters.
This chapter, Welcome to this Guide, serves as an introduction to the rest of the guide. This is where you will find information pertaining to the reading of this guide and information that will allow you to contact us.
Chapter 2 nfinitelnsigh pro
250. notion of nearest cluster does not exist. If a case belongs to a cluster, the distance is set to 0. If a case does not belong to a cluster, the distance is set to 1.

6.2.4.1.3.4.4 Probabilities
This option allows you to add to the output file the probabilities that the observation belongs to each cluster. The probability that the observation belongs to the closest cluster is displayed in the column kc_best_proba_<TargetVariable>; this probability is usually the highest. The probability that the observation belongs to the second closest cluster is displayed in the column kc_best_proba_<TargetVariable>_2, and so on until the furthest cluster. You can add all the probabilities or only those corresponding to the closest clusters.

To Add All Probabilities
Check the All option.

To Add Only the Probabilities for the Closest Clusters
1. Check the Top option.
2. In the text field, enter the number of probabilities you want to add, for example the two, three, or four best.

Note: When the SQL mode is activated, the notion of nearest cluster does not exist. If a case belongs to a cluster, the probability is set to 1. If a case does not belong to a cluster, the probability is set to 0.

6.2.4.1
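The column naming described for the Probabilities option can be sketched as follows. The column names follow the pattern given in the text; everything else (the function name best_proba_columns, the input layout, the sample values) is hypothetical and only illustrates the All and Top choices. The columns are ordered by closeness of the cluster rather than by probability, since the text says the closest cluster's probability is "usually", not always, the highest.

```python
def best_proba_columns(clusters, target, top=None):
    """Build the probability columns for one observation.

    clusters: list of (cluster_id, distance, probability) tuples
              (hypothetical layout for this sketch).
    target:   name of the target variable used in the column names.
    top:      None for the All option, or the number of closest
              clusters to keep for the Top option.
    """
    ranked = sorted(clusters, key=lambda c: c[1])   # closest cluster first
    if top is not None:
        ranked = ranked[:top]
    columns = {}
    for rank, (_cluster_id, _distance, proba) in enumerate(ranked, start=1):
        suffix = "" if rank == 1 else f"_{rank}"
        columns[f"kc_best_proba_{target}{suffix}"] = proba
    return columns


observation = [(1, 0.9, 0.1), (2, 0.2, 0.7), (3, 0.5, 0.2)]
top_two = best_proba_columns(observation, "class", top=2)
all_probas = best_proba_columns(observation, "class")
```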
251. nts. The result of the regression is then transformed to align target segment means and score segment means in the post-processing phase.
Note: This is the default strategy used in InfiniteInsight.
The last strategy, which applies to regressions using post-processing, consists in first using an encoded target value instead of the original target value during the model learning phase, in order to have a uniform distribution: this is the pre-processing phase. Then regression coefficients are computed, and scores are transformed back into the original target space during the post-processing phase.
Note: This strategy is to be preferred when the default strategy does not produce models of sufficient quality, which is often the case with very skewed target distributions.

Regression Without Post-processing
Uncheck the option Enable Post-processing.
[Figure: Example of Performance Curve.]
Note: It is not possible to change the target encoding strategy when post-processing is disabled.

Regression with Original Target Values
1. Check the option Enable Post-processing.
2. Select the radio button Original target encoding.

Note: This
252. [Screenshot: Opening a Model screen, listing the saved models with their type, date, and version.]
2. In the Data Type list, select one of the following options, depending upon the format of the model that you want to open:
- Text files
- Database
- SAS files
- SAS Transport
3. Click the Browse button. A selection dialog box will appear.
[Screenshot: Data Source Selection dialog, Select Source Folder for Data.]
253. o a word processing or spreadsheet program, a text editor

To Save the Model Summary
Click the Save button situated under the title. The file is saved in HTML format.

To Print the Model Summary
1. Click the Print button situated under the title. A dialog box will appear, allowing you to select the printer to use.
2. Select the printer to use and set other print properties if need be.
3. Click OK. The report will be printed.

5.2.3.3 Model Graphs

5.2.3.3.1 Displaying the Model Graph

To Display the Model Graph
1. On the screen Using the Model, click the Model Graphs option. The model graphs will appear. When the target is nominal, the following curve is displayed.
[Figure: Model Graphs, profit type Detected; curves Random, Wizard, and Validation.]
The default parameters display the profit curves corresponding to the Validation sub-set (blue line), the hypothetical perfect model Wizard (green line), and a random model Random (red line). The default setting for the type of profit parameter is Detected profit, and the values of the abscissa are provided in the form of a percentage of the
254. obabilities of deviation of each variable distribution, be it by variable, variable category, or group of categories. A probability over 0.95 indicates that the global distribution of the variable or category is significantly different in the control data set than in the reference data set.
Note: The probability of deviation is actually a standardized Chi-square test. It is significant above 0.95.
- The second group, comprised of the options Probability of Target Deviation and Probability of Target Deviation for Grouped Categories, lists for each variable the probabilities of deviation of the categories and the grouped categories with respect to the target variable. A probability over 0.95 indicates that there is a change of behavior with respect to the target variable in the category or group of categories.
- The last group contains only the option Category with Problem. For each data set (reference data sets and control data set), all variable categories with a probability over 0.95 are listed. This gives you a quick view of possible problems without having to analyze all the reports.
Warning: In all the report panels, the control data set is referred to as the Apply-in data set.
[Screenshot: Control for Deviations on Apply-in, Deviation Summary.]
255. ocial models enabled only
Warning: When sharing or sending a model, all these files must accompany the model, or the recipient will not be able to open it.

5.2.4.8 Opening an Existing Model
Once saved, models may be opened and reused in InfiniteInsight.

To Open a Model
1. On the main InfiniteInsight screen, select Load a Model. The screen Opening a Model will appear.
[Screenshot: Opening a Model screen, with the Data Type list set to Text Files, the Folder field with its Browse button, and the list of saved models.]
256. oding Strategy: Target Mean
You may calculate the cross statistics for the model to be generated, define the target key value, choose the distance computing option, or choose the encoding strategy.

6.2.1.6.1.1 Calculating the Cross Statistics
This option allows you to visualize the profile of each explanatory variable for each cluster with respect to its profile for the entire data set.

To Enable the Cross Statistics Calculation
Check the Calculate Cross Statistics box.

6.2.1.6.1.2 Defining the Target Key Value
The Set Target Keys value option lists the target variables selected in the Selecting Variables screen and allows you to choose their key value.

To Define the Target Key Value
In the Target Key field, enter the key value of the target variable.

6.2.1.6.1.5 Choosing the Distance Computing Method
The Distance list allows you to specify the distance used to compare K2C-encoded input data.

To Choose the Distance Computing Method
In the Distance drop-down list, select one of these options:
- Chessboard: maximum of absolute differences between coordinates (LInf).
- Euclidean: square root of the sum of squared differences between coordinates (L2).
- City Block: sum of absolute differences between coordinates (L1).
- System Determined (default value): lets the system determine the best distance to use according to the model build settings.
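The three explicit distance options can be written directly from their definitions. The sketch below only illustrates the formulas on plain coordinate tuples; how InfiniteInsight encodes its inputs (K2C encoding) and how System Determined picks a distance are not covered, and the function names are hypothetical.

```python
import math


def chessboard(a, b):
    """LInf: maximum of absolute differences between coordinates."""
    return max(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    """L1: sum of absolute differences between coordinates."""
    return sum(abs(x - y) for x, y in zip(a, b))


p, q = (0, 0), (3, 4)
distances = (chessboard(p, q), euclidean(p, q), city_block(p, q))
```

For any pair of points the three values are ordered Chessboard <= Euclidean <= City Block, so the choice of distance changes how strongly a difference on a single coordinate weighs against small differences on many coordinates.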
257. ographic domain and a behavioral domain.

7.1.1.1.4 antecedent
X is called the antecedent of the rule. The antecedent can be composed of an item (page 269) or an itemset (page 269); for example, X can be the set {A, B, C}.

7.1.1.1.5 application data set
An application data set (page 265) is a data set to which you apply a model. This data set contains an unknown target variable (page 279) for which you want to know the value.

7.1.1.1.6 association rule
An association rule is an implication relation of the form X => Y. The rule means that if the attribute X is present in a session, then the attribute Y is present too. Two measures qualify the quality of the rule: the Support (page 278) and the Confidence (page 263).

7.1.1.1.7 attribute
In computing, an attribute is a specification that defines a property of an object, element, or file.

7.1.1.1.8 AUC
The AUC statistic is a rank-based measure of model performance, or predictive power, calculated as the area under the Receiver Operating Characteristic curve (see ROC on page 46). For a simple scoring model with a binary target, this represents the observed probability of a signal (responder) observation having a higher score than a non-signal (non-responder) observation. For individual variables, ordering based on score is replaced by ordering based on the response prob
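The probabilistic reading of AUC given in the glossary (the observed probability that a responder scores higher than a non-responder) can be computed directly by comparing every responder with every non-responder. This is a minimal sketch for small samples, not the product's implementation; counting ties as one half is a common convention assumed here, and the function name is hypothetical.

```python
def pairwise_auc(scores, labels):
    """AUC as the share of (responder, non-responder) pairs in which the
    responder has the higher score; ties count for one half (assumption)."""
    responders = [s for s, y in zip(scores, labels) if y == 1]
    others = [s for s, y in zip(scores, labels) if y == 0]
    if not responders or not others:
        raise ValueError("both classes must be present")
    wins = 0.0
    for r in responders:
        for o in others:
            if r > o:
                wins += 1.0
            elif r == o:
                wins += 0.5
    return wins / (len(responders) * len(others))


# Hypothetical scores: three of the four responder/non-responder pairs
# are ordered correctly.
auc = pairwise_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

A model that ranks every responder above every non-responder reaches 1.0, and a model that scores at random is expected to be close to 0.5.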
258. on Estimation data set with respect to the target. It will be excluded from the model with respect to this target.

The variable has a small KI on Validation data set with respect to the target. It will be excluded from the model with respect to this target.

A large KI difference has been observed for this variable between Estimation and Validation data sets with respect to the target. It will be excluded from the model with respect to this target.

The variable has a small KR with respect to the target. It will be excluded from the model with respect to this target.

Some reports can be displayed as a pie chart.

Some reports can be displayed as a line chart.

This option allows you to display the current report view in the graphical table that can be sorted by

This option allows you to display the current report view as an HTML table.

Some reports can be displayed as a bar chart. This bar chart can be sorted by ascending or descending values, or by ascending or descending alphabetical order. You can also select which data should be

When the current report is displayed as a bar chart, this option allows you to change the orientation of the bars from horizontal to vertical and vice versa.

5.2.3.7.3 Usage Options
This option allows you to copy the data from t
259. on Target(s) Variable(s) (upper right-hand side). The variable moves to the screen section Target(s) Variable(s). You can also select a variable in the screen section Target(s) Variable(s) and click the button < to move it back to the screen section Explanatory variables selected.

6.2.1.5.2 Weight Variable

For this Scenario
Do not select a weight variable.

To Select a Weight Variable
1. On the screen Selecting Variables, in the section Explanatory variables selected (left-hand side), select the variable you want to use as a Weight Variable.
[Screenshot: Selecting Variables screen, with the sections Explanatory Variables Selected, Target Variables, Weight Variable, and Excluded Variables, each with an Alphabetic Sort option.]
Note: On the screen Selecting Variables, variables are presented in the same order as that in which they appear in the table of data. To sort them alphabetically, select the option Alphabetic sort presented beneath each of the variable lists.
2. Click the button > located on the left of the screen section Weight Variable (middle right-hand side). The variable moves to the screen section Weig
260. on and the percentage of positive target

5.2.3.10.3 Customizing the Display
The button Display Settings allows you to customize some of the display settings for the decision tree:
- Orientation: this setting allows you to select whether to display the tree horizontally or vertically.
- Display Type: this setting allows you to display the decision tree as a standard decision tree (Decision Tree Like) or with a specific InfiniteInsight look (InfiniteInsight Display). The option Decision Tree Like is more compact, but the InfiniteInsight Display is easier to read.
When you have set the display parameters, click the Close button.

5.2.4 Step 4 - Using the Model
Once generated, a classification model may be saved for later use. A classification model may be applied to additional data sets. The model thus allows you to perform predictions on these application data sets by predicting the values of a target variable. The model can also be used to carry out simulations on specific observations, on a case-by-case basis. Moreover y
261. on table
In the Available list, select the variables you want to add (use the Ctrl key to select more than one variable). Click the > button to add the selected variables to the Selected list.

5.2.4.2.5.1.4 User Defined Constant Outputs
This option allows you to add constants to the output file, such as the apply date, the data set name, or any other information useful for using the output file. A user-defined constant is made of the following information:

Parameter: Visibility
  Description: indicates whether the constant will appear in the output
  Value: checked, the constant appears in the output; unchecked, the constant does not appear in the output

Parameter: Name
  Description: the name of the user-defined constant
  Warnings: 1. The name cannot be the same as the name of an existing variable of the reference data set. 2. If the name is the same as that of an already existing user-defined constant, the new constant will replace the previous one.

Parameter: Storage
  Description: the constant type
  Value: number, string, integer, date, datetime

Parameter: Value
  Description: the value of the constant
  Warnings: date format YYYY-MM-DD; datetime format YYYY-MM-DD HH:MM:SS

Parameter: Key
  Description: indicates whether the constant is a key variable or identifier for the record. You can declare multiple keys. They will be built according to the indicated order (1, 2, 3, ...).
  Value: the variable is not an identifier; 1, primary identifier; 2, secondary identifier

To Define a Constant
1. Click the Add button. A p
262. oncerned with data manipulation and other fundamental operations, independently of how these are eventually represented to the user.

7.1.1.1.41 entity
An entity is the object of interest of any analytical task. It can be a customer, a product, or a store, and is usually identified with a single identifier that can be used throughout the data repositories. Entities are usually associated with a state model describing the life cycle of such an analytical object of interest.
Note: This is a technical constraint: entities MUST be uniquely identified.

7.1.1.1.42 error bar
See prediction range.

7.1.1.1.43 error mean
Mean of the difference between predictions and actual values.

7.1.1.1.44 error standard deviation
Dispersion of errors around the actual result.

7.1.1.1.45 event data set
An event data set should consist of at least:
- an event date, such as a birth date or the beginning of a trial, in YYYY-MM-DD format
- a reference id (for example, a customer id) that will be used to join the events or transactions data with the reference, or static, customer table previously defined

7.1.1.1.46 excluded variable
actual target

7.1.1.1.47 explanatory variable
An explanatory variable is a variable that describes your dat
263. ontained in the first file. To avoid this type of bias, do not forget to mix up your data prior to analysis. 4.4.3.2 Seven Automatic Cutting Strategies. 4.4.3.2.1 Background: With the exception of the customized cutting strategy, cutting strategies are automatic. Automatic cutting strategies operate upon a single data file, which constitutes your initial data set. Automatic cutting strategies always cut the initial data set into the same proportions. The following table details the proportions attributed to each data set depending on the presence of a test data set. Automatic cutting strategies with test: 3/5 of the data are used in the estimation sub-set, 1/5 in the validation sub-set, and 1/5 in the test sub-set. Automatic cutting strategies without test: 3/4 of the data are used in the estimation sub-set and 1/4 in the validation sub-set. 4.4.3.2.2 Random: The Random cutting strategy distributes the data of the initial data set in a random manner between the three sub-sets (estimation, validation, and test). 4.4.3.2.3 Random with Test at the End: The Random wit
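The proportions described in this excerpt can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name random_cut, the seed parameter, and the choice to shuffle and then slice (rather than to assign each row at random) are assumptions made for the example.

```python
import random

def random_cut(rows, with_test=True, seed=0):
    """Hypothetical helper: shuffle the rows, then cut them 3/5, 1/5, 1/5
    (with a test sub-set) or 3/4, 1/4 (without a test sub-set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    if with_test:
        n_est = n * 3 // 5
        n_val = n // 5
        return {
            "estimation": shuffled[:n_est],
            "validation": shuffled[n_est:n_est + n_val],
            "test": shuffled[n_est + n_val:],
        }
    n_est = n * 3 // 4
    return {"estimation": shuffled[:n_est], "validation": shuffled[n_est:]}

parts = random_cut(range(100))
print({name: len(subset) for name, subset in parts.items()})
```

Every row lands in exactly one sub-set, so the sub-sets can be recombined into the initial data set.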
264. op up window opens, allowing you to set the constant parameters. In the field Output Name, enter the constant name. In the list Output Storage, select the constant type. In the field Output Value, enter the constant value. Click the OK button to create the constant. The new constant appears in the list. You can choose whether to generate the defined constants or not by checking the Visibility box. 5.2.4.2.5.2 Gain Chart: This tab allows you to compute the gain chart on the apply data set, that is, to rank your data in order of descending scores and split it into exact quantiles (decile, vingtile, percentile). To Compute the Gain Chart: 1. Check the box Compute Gain Chart on Apply-in Data. 2. In the list, select the Number of Quantiles into which you want your data to be segmented. 3. You can add additional variables in order to estimate profits per segment of the population: in the Variables list, select the variables you want to add to the gain chart (use the CTRL key to select multiple variables), then click the > button to add the selected variables to the list Values for Gain Chart. The sum of each selected variable will be calculated for each segment of the population. Click Validate to save the advanced parameters and go back to the panel Ap
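As an illustration of the gain chart computation described in this excerpt (rank by descending score, split into quantiles, sum the selected variables per segment), here is a minimal Python sketch. The function name gain_chart, the record layout, and the "score" and "profit" keys are assumptions made for the example and are not taken from the product.

```python
def gain_chart(records, n_quantiles, value_keys=()):
    """Hypothetical helper: rank records by descending score, split them into
    n_quantiles segments, and sum each selected variable per segment."""
    ranked = sorted(records, key=lambda r: r["score"], reverse=True)
    n = len(ranked)
    segments = []
    for q in range(n_quantiles):
        start = q * n // n_quantiles
        end = (q + 1) * n // n_quantiles
        chunk = ranked[start:end]
        sums = {key: sum(r[key] for r in chunk) for key in value_keys}
        segments.append({"quantile": q + 1, "count": len(chunk), "sums": sums})
    return segments

records = [
    {"score": 0.1, "profit": 1},
    {"score": 0.9, "profit": 9},
    {"score": 0.5, "profit": 5},
    {"score": 0.3, "profit": 3},
]
for segment in gain_chart(records, 2, ("profit",)):
    print(segment)
```

The first segment always holds the highest scores, so comparing the sums across segments shows how much of the selected variable is concentrated in the best-ranked part of the population.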
265. 7.1.1.1.98 seasonal: Variations due to calendar events. 7.1.1.1.99 sensitivity: Sensitivity, which appears on the Y axis, is the proportion of correctly identified signals (true positives found) out of all true positives on the validation data set. 7.1.1.1.100 sequential cutting strategy: The sequential strategy cuts the initial data set into three blocks, corresponding to the usual cutting proportions. The lines corresponding to the first 3/5 of the initial data set are distributed as a block to the estimation data set. The lines corresponding to the next 1/5 of the initial data set are distributed as a block to the validation data set. The lines corresponding to the final 1/5 of the initial data set are distributed as a block to the test data set. 7.1.1.1.101 session: A session is identified by a unique key and is composed of one or more transactions. 7.1.1.1.102 simulation: Application of a model to only one record. 7.1.1.1.103 smart variable contributions: The variable contribution in a model while taking into account the variable correlation. 7.1.1.1.104 social network analysis: Social network analysis is used to approach problems such as community identification, diffusion in graphs (product adoption, epidemiology), graph evolution, or influence
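The sequential cutting strategy defined in this excerpt amounts to slicing the data set in its original order. A minimal Python sketch follows; the function name sequential_cut and the rounding of block boundaries by integer division are assumptions made for the example.

```python
def sequential_cut(rows):
    """Hypothetical helper: first 3/5 of the lines go to estimation,
    the next 1/5 to validation, and the final 1/5 to test."""
    rows = list(rows)
    n = len(rows)
    end_estimation = n * 3 // 5
    end_validation = n * 4 // 5
    return (
        rows[:end_estimation],
        rows[end_estimation:end_validation],
        rows[end_validation:],
    )

estimation, validation, test = sequential_cut(range(10))
print(estimation, validation, test)
```

Because no shuffling takes place, any ordering in the source file (for example, by date) carries over directly into the three blocks.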
266. ork Analysis Model; Perform a Data Transfer; List Distinct Values in a Data Set; Get Descriptive Statistics for a Data Set. 2. Click the feature you want to use. 6.1.7.1 Editing the Options. To Edit the Options of InfiniteInsight: 1. Click the button Help in InfiniteInsight. 2. Click the button in the help panel. The following options can be modified, by category. General: Country, Language, Message Level, Log Maximum Size, Message Level for Strange Values, Display the Parameter Tree, Number of Store in the History, Always Exit without Prompt, Include Test in Default Cutting Strategy. Stores: Default Store for Apply-in Data Set, Default Store for Apply-out Data Set, Default Store to Save Models. Metadata Repository: Enable Single Metadata Repository, Edit Variable Pool Content. Graphic: Profit Curve Points, Bar Count Displayed, No KXEN Look and Feel, Display 3D Chart, Disable Double Buffering, Optimize for Remote Display, Remember Size and Position when Leaving. Report: Number of Variables of Interest, Active Style Sheet, Customize Style Sheets. 6.1
267. ort, Generate Source Code, Save Model. The screen Using the Model presents the various options for using a model, which allow you to: Display the information relating to the model just generated or opened (Display section), referring to the model curve plots, contributions by variables, the various variables themselves, HTML statistical reports, and table debriefing. Some information is only displayed upon request from the user: the display of InfiniteInsight Modeler - Regression/Classification results as a decision tree, which can be specified in the modeling parameters before the model generation, or the display of model parameters, which can be requested in the general user options. Apply the model just generated or opened to new data, to run simulations, and to refine the model by performing automatic selection of the explanatory variables to be taken into consideration (Run section). Save the model or generate the source code (Save/Export section). 5.2.3.2 Model Overview: The Model Overview screen displays the same information as the training summary. 5.2.3.2.1 Overview. Name: name of the model, created by default from the target variable name and the data set name. Data Set: name of the data set. Initial Number Variables: number of explanatory variables used
268. ory Significance; Clusters Summary; Clusters Profiles; Statistical Reports. A suite of plotting tools allows you to analyze and understand the model generated: the performance of the model with respect to a hypothetical perfect model and a random type of model; the characteristics of each of the clusters; the significance of the various categories of each variable of a cluster with respect to the target variable (cross statistics). 6.2.3.1 User Menu: Once the model has been generated, click the Next button. The screen Using the Model will appear. [Screenshot: the Using the Model screen for the model class_Census01, with the sections Display (Model Overview, Model Graphs, Category Significance, Clusters Summary, Cluster Profiles, Statistical Reports), Run (Analyze Deviations, Apply Model, Simulation), and Save/Export (Export KxShell Script, Generate Source Code, Save Model).] The screen Using the Model presents the various options for using the model, which allow you to: Display the information re
269. ou can refine a classification model by re-generating it with an optimized list of explanatory variables. InfiniteInsight allows you to automatically select the variables most pertinent to your business issue, with pertinence defined as producing the minimum area between the predictive curve and the hypothetical perfect curve, thus maximizing the volume of information explained by the model. So that you can apply the model to any other database, InfiniteInsight allows you to generate the source code of the model in different languages (C, XML, AWK, HTML, SQL, PMML2, SAS, JAVA). 5.2.4.1 Analyzing Deviations: The option Analyze Deviations is a tool that provides you with a diagnostic of the statistical variation of the data. This option can be used for several purposes: to compare the distribution of a new data set with the distribution of the data set used to train the model; to check the quality of new data after loading them; to check whether your data have evolved over time and thus whether the model needs to be adapted to the new data. 5.2.4.1.1 Selecting the Data Set to Analyze: First, you need to select the data set for which you want to analyze the deviations. For the results to make sense, the new data set should contain the same columns as the data set that was originally used to train the
270. output file the probability for one or more target variable categories, that is, for each observation, the probability of the target variable value to be the selected category. It appears in the output file as proba_rr_<Target Variable> for the target variable key category and as proba_rr_<Target Variable>_<Category> for the other categories of the target variable. To Add the Probabilities of All Target Variable Categories: Check the All option. To Add Only the Probabilities of Selected Categories: 1. Check the Individual option. 2. In the Selection column, check the boxes corresponding to the categories for which you want to add the probabilities in the output file. 5.2.4.2.5.3.3.3 Miscellaneous Outputs. 5.2.4.2.5.3.3.3.1 Outlier Indicator: This option allows you to show in the output file which observations are outliers. An observation is considered an outlier if the difference between its predicted value and its real value exceeds the value of the error bar. In other words, the error bar is a deviation measure of the values around the predicted score. It appears in the output file as outlier_rr_<target variable>. Possible values are 1 if the observation is an outlier with respect to the current target, else 0. 5.2.4.2.5.3.3.3.2 Predicted Value Quant
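The outlier rule stated in this excerpt can be sketched in a few lines of Python. The function names are hypothetical, and taking the absolute value of the difference is an assumption: the text only says that the difference between the predicted and real values must exceed the error bar.

```python
def outlier_indicator(predicted, actual, error_bar):
    """Hypothetical helper: 1 if the gap between the predicted and real
    values exceeds the error bar, else 0 (absolute gap is an assumption)."""
    return 1 if abs(predicted - actual) > error_bar else 0

def outlier_column_name(target_variable):
    """Builds the output column name following the outlier_rr_<target variable> pattern."""
    return "outlier_rr_" + target_variable

print(outlier_column_name("class"), outlier_indicator(10.0, 13.0, 2.0))
print(outlier_column_name("class"), outlier_indicator(10.0, 11.0, 2.0))
```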
271. oves to the screen section Weight Variable. Also, select a variable in the screen section Weight Variable and click the button < to move the variable back to the screen section Explanatory variables selected. 5.2.1.5.3 Explanatory Variables: By default, and with the exception of key variables (such as KxIndex), all variables contained in your data set are taken into consideration for generation of the model. You may exclude some of these variables. For the first analysis of your data set, we recommend that you retain all variables. It is particularly important to retain even the variables that seem to have no impact on the target variable. If indeed these variables have no impact on the target variable, the model will confirm this. In the contrary case, the model will allow you to recognize previously unidentified correlations between these variables and the target variable. By excluding variables from the analysis based on simple intuition, you take the risk of depriving yourself of one of the greatest value-added features of InfiniteInsight models: the discovery of non-intuitive information. Depending on the results obtained from the first analysis, which included all of the variables of the data set, you can generate a second model by excluding the variables too closely correlated wi
272. p. A group containing the selected categories is created in the list Group Structure. [Screenshot: the Group Structure and Category Edition lists, with the buttons Add New Group, Remove Group, New Category, and Add Missing, and the option Enable the target based optimal grouping performed by K2C.] To Include Missing Values in a Group: In the list Group Structure, select the group to which you want to add the missing values. Click the button Add Missing, located under the list Category Edition. The KXMISSING category, which represents the missing values, is added to the selected group and the button Add Missing is deactivated. Like any category, the KXMISSING category can only belong to one group at a time. [Screenshot: the Group Structure list with the KxMissing category added to a group.]
273. parameter tree under the section UniformCurvePoints. 2. Sort the normalized target values and graph the cumulative sums to obtain the Wizard graph. Note: 20 bins are used to give a good approximation while using less computation. 3. Re-sort by estimated values and again graph the cumulative distribution of the corresponding actual values (validation graph). 4. As usual, KI is the ratio of the validation and wizard areas. The KI metric is thus based on the order of the estimates and compares this order with the order of the actual continuous targets. As such, it is more robust than L1 metrics such as Mean Absolute Error (MAE), L2 metrics such as Mean Square Error (MSE) or Root Mean Square Error (RMSE), or the Pearson coefficient often used for regression, since one very large erroneous target will never decrease the overall KI figure and could be a major cause for instability of all the other metrics. On the other hand, the KI metric does not take into account the estimated values with respect to the target values. In other words, a model with estimates in the range [-2, 2] could have a very good KI even if the actual targets are in the range [0, 100], provided that the model has found the correct order between estimates and actual target values. KXEN technology limits this effect by providing piece-wise linear recalibration of the estimates to the actual targets, based on the statistics on the Validation data set, thus providing not only good order est
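To make the order-based nature of KI concrete, here is a rough Python sketch of steps 2 to 4. It rests on several assumptions that the excerpt does not confirm: the target is normalized by min-max scaling, each area is measured between the cumulative curve and the random diagonal, and no binning into 20 points is applied. The function names are hypothetical, and the target is assumed to take at least two distinct values.

```python
def cumulative_curve(values):
    """Cumulative share of the total, starting at 0 and ending at 1."""
    total = sum(values)
    curve, running = [0.0], 0.0
    for v in values:
        running += v
        curve.append(running / total)
    return curve

def area_above_diagonal(curve):
    """Trapezoid area under the curve minus the 0.5 area under the diagonal."""
    steps = len(curve) - 1
    under = sum((curve[i] + curve[i + 1]) / 2 for i in range(steps)) / steps
    return under - 0.5

def ki(actual, estimated):
    """Ratio of the validation area to the wizard area (assumed definition)."""
    lo, hi = min(actual), max(actual)
    norm = [(a - lo) / (hi - lo) for a in actual]  # assumed normalization
    wizard = cumulative_curve(sorted(norm, reverse=True))
    order = sorted(range(len(norm)), key=lambda i: estimated[i], reverse=True)
    validation = cumulative_curve([norm[i] for i in order])
    return area_above_diagonal(validation) / area_above_diagonal(wizard)

# Estimates in [-2, 2] that rank the targets in [0, 100] correctly.
print(ki([0, 25, 50, 100], [-2, -1, 0, 2]))
```

In this sketch, estimates that rank the observations in the same order as the actual targets reach the maximum ratio whatever their scale, which is the behavior the excerpt warns about.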
274. plying the Model. 5.2.4.2.5.2.1 Results: The result of the gain chart computation is available at the end of the model application. [Screenshot: gain chart results for the model Class_Census_GainChart.] It can also be found in the Statistical Reports, in the section Model Performance. 5.2.4.2.5.3.1 Reason Codes: This feature allows you to retrieve a list of the variables whose values most influence a score-based decision, typically a risk score. An example of use of the reason codes is to provide a customer with the reasons why the automatic scoring system did not approve their loan. To Generate Reason Codes: 1. In the tree Advanced Apply Settings, located on the left of the panel, open the node Outputs for Target <Target Name>. 2. Select Reason Codes. 3. Click the button located on the right of the displayed table. Click in the cell corresponding to t
275. ports as graphs if such a view exists. Textual: saves the reports as tables. Click the OK button. Select the folder in which you want to save the report. Enter the name of the file. Warning: When selecting the options Automatic or Graphical, be careful to choose an appropriate file type such as pdf, rtf, or HTML. 5.2.4.2 Applying the Model to a New Data Set: The currently open model may be applied to additional data sets. The model allows you to perform predictions using the application data sets, and specifically, to predict the values of the target variable. 5.2.4.2.1 Constraints of Model Use: In order to apply a model to a data set, the format of the application data set must be identical to that of the training data set used to generate the model. In particular, the same target variable must be included in both data sets, even if its values are not contained in the application data set. Note: If the KxIndex variable of the model is virtual, the application data set must not contain a physical KxIndex variable. For this Scenario: Due to technical constraints, a data set corresponding to the database of
276. r_rr_Class. The individual contributions by variables contained in the data set with respect to the target variable. The names of the columns of individual contributions correspond to the names of each of the variables prefixed by contrib_, or in this case, contrib_age, contrib_workclass, and so on. 5.2.4.3 Performing a Simulation: The open model may be used to carry out simulations on specific observations, one at a time. To define the observation to be analyzed, the variables of your choice must be associated with values. For instance, if you have selected the occupation (profession category) and workclass (socio-professional category) variables, they must contain values. During execution of the simulation, InfiniteInsight will automatically assign values to certain variables when values are missing but essential to proper completion of the simulation. Once the simulation is complete, you will obtain the following results: the predicted value (score); the probability that this observation belongs to the target category of the target variable. To Simulate a Model: 1. On the screen Using the Model, click the option Simulation. The screen Simulating the Model will appear. [Screenshot: the Simulating the Model screen for the model class_Census01.]
277. rametric setting in which the category importance is defined as $\mathrm{CategoryImportance}(X_i) = \mathrm{normalProfit}(X_i) \cdot \frac{\mathrm{Freq}(X_i)}{Z}$, where $\mathrm{normalProfit}(X_i)$ is the normal profit of category $X_i$ (see below for a definition), $\mathrm{Freq}(X_i)$ is the global frequency of the category $X_i$, and $Z$ is a normalization constant. We give below the details of the computation of these quantities. 5.2.3.5.5.2.1 Normal Profit: Each category $S_j$ of the target $S$ is associated with a profit, $\mathrm{profit}(S_j)$, defined such that $\sum_{j=1}^{B} \mathrm{profit}(S_j)\,\mathrm{Freq}(S_j) = 0$. The profit of a target category is a value in the range $[-1, 1]$. It is defined in the following way from the cumulated target category frequencies: $\mathrm{profit}(S_j) = 2 \sum_{k=1}^{j-1} \mathrm{Freq}(S_k) + \mathrm{Freq}(S_j) - 1$. The normal profit of a category $X_i$ is then defined as $\mathrm{normalProfit}(X_i) = \sum_{j=1}^{B} \mathrm{profit}(S_j)\,\mathrm{Prob}(S_j \mid X_i)$, where $\mathrm{Prob}(S_j \mid X_i)$ is the conditional probability of observing the target category $S_j$ in the variable category $X_i$ (cross statistics): $\mathrm{Prob}(S_j \mid X_i) = \frac{\mathrm{Freq}(S_j, X_i)}{\mathrm{Freq}(X_i)}$. The fact that these formulas rely only on frequencies makes them resistant to any monotonic transformation of the target S. 5.2.3.5.5.2.2 Normalization Constant: The normalization can be approximated for non-pathological continuous targets, that is, continuous targets w
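The profit and normal profit definitions in this excerpt can be checked numerically with a short Python sketch. The function names are hypothetical, and the frequencies used are made-up values chosen only to show that the profits weighted by the target category frequencies sum to zero; the normalization constant Z is left out because its definition is not part of this excerpt.

```python
def target_profits(freqs):
    """profit(S_j) = 2 * (sum of Freq(S_k) for k < j) + Freq(S_j) - 1,
    with the target categories taken in their given order."""
    profits, cumulated = [], 0.0
    for f in freqs:
        profits.append(2 * cumulated + f - 1)
        cumulated += f
    return profits

def normal_profit(profits, cond_probs):
    """normalProfit(X_i) = sum over j of profit(S_j) * Prob(S_j | X_i)."""
    return sum(p * c for p, c in zip(profits, cond_probs))

# Illustrative target with two categories of frequencies 0.75 and 0.25.
freqs = [0.75, 0.25]
profits = target_profits(freqs)
print(profits)
print(sum(p * f for p, f in zip(profits, freqs)))  # zero-sum constraint
print(normal_profit(profits, [0.5, 0.5]))          # category richer in S_2
```

A variable category whose conditional probabilities equal the global target frequencies gets a normal profit of zero, since the sum then reduces to the zero-sum constraint.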
278. regression strategy corresponds to regressions used in KXEN from version 3.3.1 to version 3.3.6. This strategy is set by default. Regression with Uniform Target Encoding: 3. Check the option Enable Post-processing. 4. Select the radio button Uniform target encoding. 5.2.1.7.1.7 Defining the Target Key Values: For binary targets, you have the option to select which value is the key category for each target. By default, the category selected by InfiniteInsight is the least represented in the data set. The Advanced Model Parameters screen lists all the binary targets of the current model, allowing you to define the key category for each target, that is, the expected value of the target. In this scenario: Do not define a value for the target variable. InfiniteInsight will automatically select 1 as the key category for the Class variable. To Define the Key Category Value for a Target Variable: In the Target Key field corresponding to the chosen target, enter the key value. [Screenshot: Target Key Settings, with the target class and the target key 1.] 5.2.1.7.2 Auto-selection Tab: The Auto-selection tab allows you to define the parameters of the automatic variable selection. [Screenshot: the Auto-selection tab of the Advanced Model Parameters screen for the model class_Census01.]
279. respond to the relationship of that category to the target variable, and whether that category has more or fewer observations belonging to the target category of the target variable. For a given category, a positive bar (on the right of 0.0) indicates that the category contains more observations belonging to the target category of the target variable than the mean (calculated on the entire data set). A negative bar (on the left of 0.0) indicates that the category contains a lower concentration of the target category of the target variable than the mean. Note: You can display the profit curve for the selected variable by clicking the button Display Profit Curve, located in the tool bar under the title. The importance of a category depends on both its difference from the target category mean and the number of represented cases. High importance can result from a high discrepancy between the category and the mean of the target category of the target variable, a minor discrepancy combined with a large number of records in the category, or a combination of both. The width of the bar shows the profit from that category. The positive bars correspond to categories which have more than the mean number from the target category (that is, responders), and the negative bars correspond to categories which have fewer than the mean number from the target category (that is, responders). The Variables pull-down menu allows the selection and graphing of any of the v
280. riable. [Screenshot: the Score Card screen for the model class_Census01, listing the categories of the variable age with their scores.] Note: In the case of a continuous variable, the Score Card always includes a number of categories that is higher than in the user-defined structure (or as given by the parameter band count, if no user structure has been set). Indeed, the encoding of variables for the Score Card adds target curve points to increase the accuracy of coding according to the training data set. These points split some existing categories and thus increase the number of categories in the Score Card. 5.2.3.8.1 Risk Mode: The representation of a model equation is easier to read and to interpret in the Risk Mode, due to stepwise encoding for ordinal and continuous variables. In the Risk Mode, it is easy to define which category has a negative or positive effect on the risk score, and consequently on the odds or on the probability of risk. In order to illustrate the advantages of a scorecard in interpreting results, the variable age will be used for this example. The segment 24 to 27 has
281. riable being the most important (1) reason code with respect to the target variable CLASS. Among the variables whose contribution is below (BELOW) the mean (MEAN) of the population contribution, the selected variable will be the one having the highest deviation from it. REASON_VALUE_<CRITERION>_<THRESHOLD>_<RANK>_RR_<TARGET NAME> contains the value of the reason code, that is, the difference between the variable contribution for the customer and the threshold. 5.2.4.2.5.3.2 Continuous Target. Options and output column names: Predicted Value: rr_<target variable>; Confidence: bar_rr_<target variable>; Outlier Indicator: outlier_rr_<target variable>; Contributions: contrib_<variable>_rr_<target variable>. For example, if marital-status is an explanatory variable for the target variable class, the column contrib_marital-status_rr_class will be generated in the output file. To Add All Variables Contributions: Check the All option. To Add Specific Variable Contributions: Predicted Value: This option allows you to generate in the output file the value predicted by the model for the target variable. This option is checked by default. add to the output file the conf
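The BELOW/MEAN selection rule described in this excerpt can be sketched as follows in Python. This is an illustration under assumptions, not the product's algorithm: the function name, the dictionary layout, and the max_rank parameter are hypothetical, and the contribution values are made up. The reason value is computed as stated in the text, as the customer's contribution minus the threshold (here the population mean).

```python
def reason_codes_below_mean(customer_contribs, mean_contribs, max_rank=3):
    """Hypothetical helper: keep the variables whose contribution is below the
    population mean and rank them by decreasing size of the deviation."""
    deviations = []
    for name, value in customer_contribs.items():
        diff = value - mean_contribs[name]  # reason value: contribution - threshold
        if diff < 0:
            deviations.append((name, diff))
    deviations.sort(key=lambda item: item[1])  # most negative deviation first
    return deviations[:max_rank]

customer = {"age": -0.02, "workclass": 0.01, "education": -0.05}
population_mean = {"age": 0.0, "workclass": 0.0, "education": 0.0}
for rank, (name, value) in enumerate(reason_codes_below_mean(customer, population_mean), 1):
    print(rank, name, value)
```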
282. riables are continuous. Options: Natural: this option does not transform the input data. Min Max: this option encodes the categories of the variable in the range [0, 1], where 0 corresponds to the minimum value of the variable and 1 corresponds to the maximum value. Standard Deviation Normalization: this option performs a normalization based on the variable mean and standard deviation, (x - Mean) / StdDev. 6.2.1.6.2 Activating the Autosave Option: The panel Model Autosave allows you to activate the option that will automatically save the model at the end of the generation process, and to set the parameters needed when saving the model. To Activate the Autosave Option: 1. In the panel Summary of Modeling Parameters, click the Autosave button. The panel Model Autosave is displayed. 2. Check the option Enable Model Autosave. [Screenshot: the Model Autosave panel, with the option Enable Model Autosave and the fields Description, Data Type, Folder, and File/Table.] 3. Set the parameters listed in the following table (parameter, description). Model: This field allows you to associate a name
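The two encodings described in this excerpt correspond to standard formulas, sketched here in Python. The function names are hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the excerpt does not say which is used. Both functions assume the variable takes at least two distinct values.

```python
import statistics

def min_max(values):
    """Maps the minimum value to 0 and the maximum value to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """(x - Mean) / StdDev, using the population standard deviation (assumption)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

print(min_max([2, 4, 6]))
print(standardize([1, 2, 3]))
```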
283. rofiles of the individuals in your database as a function of their propensity to purchase. Then, applying the segmentation model obtained from the training data to the entire list of prospects, to determine to which cluster each individual should belong. 6.1.5 Your Solutions: To select the individuals to whom you should send a mailing, there are several solutions you can use: an intuitive method; a classical statistical method (K-means, both ascending and descending hierarchical segmentation models); the KXEN method. 6.1.5.1 Intuitive Method: This method consists of using your knowledge of the various profiles exhibited by your customers. Thanks to the domain-specific knowledge that you have of your customers, you determine the criteria of the segmentation model intuitively and build the clusters yourself. The main disadvantage of this method is that the number of information items available for each customer will invariably grow with time. The more data your database accumulates, the harder it is for you to manually create clusters that take all data into consideration and to develop a response to your business issue. Furthermore, as the increasing volume of information requires you to build segmentation models with increasing frequency, the time required to build these segmenta
284. rs obtained. This information could be useful to you later for identifying your model. Note: This description will be used instead of the one entered in the panel Summary of Modeling Parameters. Data Type: This list allows you to select the type of storage in which you want to save your model. The following options are available: Text files, to save the model in a text file; Database, to save the model in a database; Flat Memory, to save the model in the active memory; SAS Files, to save the model in a SAS-compatible file for a specified version of SAS and a specified platform (SAS v6 or 7/8 for Windows or UNIX); SAS Transport, to save the model in a generic SAS-compatible file. Folder: Depending upon which option you selected, this field allows you to specify the ODBC source, the memory store, or the folder in which you want to save the model. File/Table: This field allows you to enter the name of the file or table that is to contain the model. The name of the file must contain one of the following format extensions: .txt (text file in which the data is separated by tabs) or .csv (text file in which the data is separated by commas). Click the OK button. 5.2.1.7 Setting the Advanced Parameters: On the screen Summary of Modeling Parameters, click the Advanced button. The
285. rtunately, you have neither the time nor the necessary resources to: perform a survey to fill in the missing information; re-format the database. 5.1.3.3 Technical Environment: The database available to you is stored in an RDBMS (relational database management system) residing on a UNIX server maintained by the Information Technology department of the bank. The technical constraints of this information environment are determining factors in selecting potential data analysis tools. 5.1.4 Your Approach: By virtue of the critical stakes involved in this campaign, because of your limited budget and your inability to predict customers' enthusiasm for the new product, you have chosen to minimize your risks by dividing the project into two steps: 1. Test the marketing campaign on a sample of 50,000 individuals extracted from the prospects database of 1,000,000 people. 2. Launch the marketing campaign globally, using the entire contents of the prospects database. 5.1.4.1 The Test Phase of Your Marketing Campaign: The test phase of your marketing campaign allowed you to collect a sample of 50,000 individuals whose behavior with respect to this new product is known: 25% of the prospects showed themselves to be clearly interested and chose to accept an invitation for a meeting with one of your sales channel agents; 75% of the prospects declined your invitation. Your business issue consists of understanding the test results by identif
286. ry. [Screenshot: the Relative Target Means graph of the clusters for the Estimation data set, showing the value 0.6491125226020813 for Cluster 7.] Among the ten clusters, Cluster 7 is the cluster that has the highest proportion of observations belonging to the target category of the target variable. Compared to the entire data set, Cluster 7 contains 64.9% more customers belonging to the target category 1 of the target variable Class. Cluster 9 contains less than 8% of customers belonging to the target category. In other words, Cluster 9 has almost the same density of customers belonging to the target category as the data set taken as a whole. Cluster 4 is the cluster with the lowest density of observations belonging to the target category. Compared to the entire data set, Cluster 4 contains 23.4% fewer customers belonging to the target category. This cluster therefore has a density of customers belonging to the target category lower than that of the data set. 6.2.3.6 Clusters Profiles. 6.2.3.6.1 Cross Statistics and Variables Profiles: The clusters profiles allow you to view, for each cluster: the profile of each explanatory variable with respect to its profile over the entire data set; the SQL expression of the cluster, when they have
287. s or csv (text file in which the data is separated by commas).

5.2.4.7.1 Files Created when Saving a Model
When you save a model, InfiniteInsight creates a series of files in the specified store. The following table lists the files or tables created when saving a model, and in which case.

Filename | Description | Used by
KxAdmin | lists all the models contained in the folder/database with additional information (date, version, name of the model, comments) | all models created with KXEN
<Model_name> | file named after the model and containing all the model data except graphs information. Graphs are stored in additional tables (see below). | all models created with KXEN
KxInfos | indicates which additional tables are needed by the model | all models created with KXEN
KxOlapCube | stores the OLAP Cube used by the decision tree when the option InfiniteInsight Modeler - Regression/Classification as Decision is activated | InfiniteInsight Modeler - Regression/Classification models with decision tree
KxLinks | contains the links from the graphs of the model | InfiniteInsight Social models only
KxNodes | lists all the nodes from all the graphs and their attributes | InfiniteInsight Social models only
KxCommunities | matches the nodes to their communities if the community detection was | InfiniteInsight S
288. s the data set to be analyzed must be presented in the form of a single table of data (see page 24), except in instances where you are using the Event Logging or Sequence Coding features of InfiniteInsight Explorer.

To use InfiniteInsight features, you must have a training data set available that contains the target variable with all its values defined. You can then apply the model generated from the training data set to one or more application data sets.

The training data set is cut into three data sub-sets for estimation, validation and testing, using a cutting strategy (see page 19). The different types of variables (see page 26), continuous, ordinal and nominal, are then encoded by the Data Encoding feature of InfiniteInsight Modeler, or by the Event Logging and Sequence Coding features in the case of dynamic data.

Before generating the model, you must:
- Describe the data. A utility integrated with InfiniteInsight allows you to generate a description of the data set to be analyzed automatically. You need only validate that description, verifying that the type and storage format of each variable were identified correctly.
- Define the role of the variables contained in the data set to be analyzed. You may select one or more variables as target variables. These are the variables that correspond to your business issue. The other variables of the table of data are considered to be explanatory variables: they allow calculation of the value of the tar
289. s, which allows you to check the deviations for each variable and each variable category between the validation and test data sets
- the Expert Debriefing, in which you will find more specialized performance indicators, as well as the variables encoding, the variables excluded during model generation and the reason for exclusion, and so on.

6.2.3.7.1 Statistical Reports Options
A tool bar is provided allowing you to modify how the current report is displayed, to copy the report, to print it, to save it or to export it to Excel. Each button of the tool bar provides one of the following options:
- This option allows you to display the current report view in the graphical table, which can be sorted by column.
- This option allows you to display the current report view as an HTML table.
- Some reports can be displayed as a bar chart. This bar chart can be sorted by ascending or descending values, or by ascending or descending alphabetical order. You can also select which data should be displayed.
- Some reports can be displayed as a pie chart.
- Some reports can be displayed as a line chart.
- When the current report is displayed as a bar chart, this option allows you to change the orientation of the bars (from horizontal to vertical and vice versa).
- This option allows you to copy the data from the current view of
290. s A and B to market
InfiniteInsight Modeler - Segmentation/Clustering | breaking it down into homogeneous data groups, or clusters | InfiniteInsight Modeler - Segmentation/Clustering allows you to: regroup your customers into several homogeneous groups; understand the behavior of each of these groups with respect to products A and B.

2.2 Before Beginning
2.2.1 Files and Documentation Provided with this Guide
2.2.1.1 Sample Data Files
The evaluation version and the registered version of InfiniteInsight are supplied with sample data files. These files allow you to take your first steps using various features of InfiniteInsight, and evaluate them. During installation of InfiniteInsight, the sample files are registered in the folder C:\PROGRAM FILES\KXEN\INFINITEINSIGHTVX.Y.Z\SAMPLES\CENSUS\. The following table describes those files.

File Name | Description | When is it Used
Census01.csv | Data file | This file is used for both application scenarios used in this manual.
desc_census01.csv | Description file for the Census01.csv file | This file is used for both application scenarios used in this manual.

To obtain a detailed description of the Census01.csv file, see Introduction to Sample Files (page 59).

2.2.1.2 Documentation
2.2.1.2.1 Full Documentation
Complete documentation is included wi
291. s a system that performs mathematical operations on a sampled, discrete-time signal to reduce or enhance certain aspects of that signal.

O

7.1.1.1.77 ordinal variable
Ordinal variables are variables with discrete values, that is, they belong to categories, and they are sortable. Ordinal variables may be:
- numerical, meaning that their values are numbers. They are therefore ordered according to the natural number system (0, 1, 2, and so on).
- textual, meaning that their values are character strings. They are therefore ordered according to alphabetic conventions.

7.1.1.1.78 outlier
A data value that does not come from the typical population of data, in other words, an extreme value. In a normal distribution, outliers are typically at least 3 standard deviations from the mean.

P

7.1.1.1.79 performance indicator (PI)
Performance indicators help organizations achieve organizational goals through the definition and measurement of progress. The purpose of defining PIs is to have a common definition of a metric across multiple projects. A metric like customer value could easily be defined in several different ways, leading to confusing or contradictory results from one analysis to the next. Shared PIs ensure consistency across analysts and projects over time. The key indicators are agreed upon by an organization and are indicators which can be measured and will reflect success factors. The PIs selected must reflect the organization's goals, they
292. s the web address variable into a binary variable with two possible values: KxOther (the firm has a Web site) and KxMissing (the firm does not have a Web site).

5.2.3.6 Statistical Reports Options
A tool bar is provided allowing you to modify how the current report is displayed, to copy the report, to print it, to save it or to export it to Excel.

5.2.3.7 Statistical Reports
The Statistical Reports provide you with a set of tables that allows you a more detailed debriefing of your model. These reports are grouped in different levels of debriefing:
- the Descriptive Statistics, which provides the statistics on the variables, their categories and the data sets, as well as the variables cross statistics with the target.

Notes:
- If your data set contains date or datetime variables, automatically generated variables will appear in the statistical reports. For more information, refer to section Date and Datetime Variables: Automatically Generated Variables on page 30.
- In the section Cross Statistics with the Target(s), the number of displayed categories corresponds to the number of categories as defined in the user structure, or to the band count if no user structure has been defined. For more information, see the section Band Count for Continuous Variables.
- In the section Grouped Cross Statistics with the Target(s), if the option Enable InfiniteInsight Modeler - Data Encoding Optimal Grouping for All Variables is enabled, the number of
293. se
Using the model generated, you will be able to determine:
- How many individuals contained in your prospects database you should send your mailing to, in order to maximize the profit (return on investment) of your campaign.
- How to classify all of the individuals in your prospects database according to their interest (purchasing probability) in this new product. This interest is expressed as a score, or probability that a prospect will respond favorably to the campaign.
- What characterizes these individuals, and what is their profile. Validate the criteria (age, socio-occupational class, degree) that explain why a person expresses interest, or not, in the new financial product.
- How to simulate, in real time, the likelihood of a single individual to respond favorably to a new offer, in particular to allow the Call Center of your bank or a customer service agent to immediately know the level of interest that a prospect is likely to exhibit in this financial product.
- How to record this score in your prospects database, in order to be able to select sub-groups of the population for new campaigns at a later date.
- How to measure the quality and reliability (capacity of handling new individuals) of your model.
In order to allow you to better respond to these issues, you have access to several possible application solutions.

5.1.6 Your Solutions
To select the individuals to whom you will send a mailing, you have several possible
294. sed Description
[Figure: Add Filter in Data Set dialog.]

5.2.1.2.6 Defining a Variable Structure
There are three ways to define a variable structure:
- by first extracting the categories from the variable statistics, then editing or validating the suggested structure,
- by importing the structure from an existing model,
- by building a new structure from scratch.
The option Enable the target based optimal grouping performed by InfiniteInsight Modeler - Data Encoding allows you to let Data Encoding group together the categories/groups defined in the variable structure if they bring the same information.

The following table lists the possible states of a variable structure (each state is shown by an icon in the interface):
- undefined: Data Encoding will automatically determine the categories grouping depending on their interaction with the target variable.
- non editable: The structure for an ordinal string variable cannot be modified.
- defined by extraction from the variable statistics: The user must open and validate it.
- defined by the user or imported from an existing model.

Note: A translation of the variable categories has no influence on the variable structure, which has to be set according to the original values of the variable.

To Extract a Var
295. 4.6.2 Example
In a database containing information about your customers, the name and address of those customers are examples of variables.

4.6.3 Types of Variables
There are three types of variables:
- Continuous variables
- Ordinal variables
- Nominal variables

4.6.3.1 Continuous Variables
4.6.3.1.1 Definition
Continuous variables are variables whose values are numerical, continuous and sortable. Arithmetic operations may be performed on these values, such as determination of their sum or their mean.

4.6.3.1.2 Example
The variable salary is a numerical variable, and is also a continuous variable. It may, for instance, take on the following values: 1,050; 1,700; or 1,750. The mean of these values may be calculated.

4.6.3.1.3 Continuous Variables and Modeling
During modeling, a continuous variable may be grouped into significant discrete bins.

4.6.3.2 Ordinal Variables
4.6.3.2.1 Definition
Ordinal variables are variables with discrete values, that is, they belong to categories, and they are sortable. Ordinal variables may be:
- Numerical, meaning that their values are numbers. They are therefore ordered according to the natural number system (0, 1, 2, and so on).
- Textual, mea
296. served every time. The default setting for the type of curve parameter is Predicted vs Actual. The extreme values for prediction ranges are:
- TargetMean - sqrt(TargetVariance)
- TargetMean + sqrt(TargetVariance)
Note that sqrt(TargetVariance) is equal to the standard deviation.

5.2.3.3.5 KI, KR and Model Curves
On the model curve plot:
- Of the estimation data set (default plot), the KI indicator corresponds to the area found between the curve of the model generated and that of the random model, divided by the area found between the curve of the perfect model and that of the random model. As the curve of the generated model approaches the curve of the perfect model, the value of KI approaches 1.
- Of the estimation, validation and test data sets (select the corresponding option from the list Data set, located below the plot), the KR indicator corresponds to one minus the area found between the curve of the estimation data set and that of the validation data set, divided by the area found between the curve of the perfect model and that of the random model.

5.2.3.4 Contributions by Variables
5.2.3.4.1 Definition
The Contributions by Variables plot allows you to examine the relative significance of each of the variables withi
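The area ratios that define KI and KR above can be illustrated with a short Python sketch. This is not InfiniteInsight's implementation: the function names, the use of cumulative gain curves, the trapezoid integration, the 200-point comparison grid and the choice of the estimation set's perfect curve as the KR denominator are all assumptions made for illustration.

```python
def trapezoid_area(xs, ys):
    """Area under a piecewise-linear curve given as parallel x and y lists."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def gain_curve(scores, targets):
    """Share of positives captured when observations are taken by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(targets)  # assumes at least one positive observation
    xs, ys, captured = [0.0], [0.0], 0
    for rank, i in enumerate(order, start=1):
        captured += targets[i]
        xs.append(rank / len(scores))
        ys.append(captured / total_pos)
    return xs, ys


def interpolate(x, xs, ys):
    """Linear interpolation of the curve (xs, ys) at position x."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside the range of the curve")


def ki(scores, targets):
    """(model area - random area) / (perfect area - random area)."""
    xs, model = gain_curve(scores, targets)
    _, perfect = gain_curve(targets, targets)  # ranking by the true target is perfect
    random_area = trapezoid_area(xs, xs)       # the random model is the diagonal
    # Signed difference: equals the area between the curves only while the
    # model curve stays above the diagonal.
    return ((trapezoid_area(xs, model) - random_area)
            / (trapezoid_area(xs, perfect) - random_area))


def kr(est_scores, est_targets, val_scores, val_targets, steps=200):
    """1 - (area between estimation and validation curves) / (perfect - random)."""
    est_xs, est_ys = gain_curve(est_scores, est_targets)
    val_xs, val_ys = gain_curve(val_scores, val_targets)
    grid = [k / steps for k in range(steps + 1)]
    gap = [abs(interpolate(x, est_xs, est_ys) - interpolate(x, val_xs, val_ys))
           for x in grid]
    _, perfect = gain_curve(est_targets, est_targets)
    denom = trapezoid_area(est_xs, perfect) - trapezoid_area(est_xs, est_xs)
    return 1.0 - trapezoid_area(grid, gap) / denom
```

In this sketch, a model that ranks every positive observation first gets a KI of 1, and a model whose estimation and validation curves coincide gets a KR of 1.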
297. specific business issue, expressed in the form of a target variable, possibly a weight variable, and the explanatory variables.
Note: For more information on variable roles, see section Roles of Variables on page 30.

6.2.1.5.1 Target Variables
For this Scenario: Select the variable Class as your target variable, that is, the variable that indicates the probability of an individual responding in a positive or negative manner to your campaign.

To Select Target Variables
1. On the screen Selecting Variables, in the section Explanatory variables selected (left hand side), select the variables you want to use as Target Variables.
[Figure: Selecting Variables screen with the sections Explanatory Variables Selected, Target Variables, Weight Variable and Excluded Variables, each with an Alphabetic Sort option.]
Note: On the screen Selecting Variables, variables are presented in the same order as that in which they appear in the table of data. To sort them alphabetically, select the option Alphabetic sort, presented beneath each of the variables lists.
2. Click the button > located on the left of the screen secti
298. speeds up the writing process of the model outputs.

7.1.1.1.55 item
A component of an association rule.

7.1.1.1.56 itemset
A group or a set of items is called an itemset.

7.1.1.1.57 iteration
An iteration is a single loop through a cycle, such as the design-prototype-test cycle.

7.1.1.1.58 KL (Kullback-Leibler)
The Kullback-Leibler divergence is used to measure the difference between the cluster profile and the population profile of the variables.

7.1.1.1.59 KPI (Key Performance Indicator)
KPIs, or key performance indicators, help organizations achieve organizational goals through the definition and measurement of progress. The key indicators are agreed upon by an organization and are indicators which can be measured and that will reflect success factors. The KPIs selected must reflect the organization's goals, they must be key to its success, and they must be measurable.

7.1.1.1.60 K-S test
K-S is the Kolmogorov-Smirnov statistic, applied here as a measure of deviation from uniform response rates across categories of a variable. Kolmogorov-Smirnov is a non-parametric, exact goodness-of-fit statistic based on the maximum deviation between the cumulative and empirical distribution functions.

L

7.1.1.1.61 Lift
The Lift of a rule is a measure that indicates the chances of finding the consequent by using the ant
299. ster Frequencies plot)
- The proportion of each cluster relative to the target variable (Target Means and Relative Target Means plots)

6.2.3.5.1 Displaying Cluster Plots
To Display Cluster Plots
1. On the screen Using the Model, click Clusters Summary. The panel Clusters Summary will appear.
[Figure: Clusters Summary panel for the model class_Census01, showing the Plot list, the Descending Sort option and the Relative Target Means plot for the Estimation data set.]
2. In the Plot list, select the type of plot that you want to display.
Note: Select the option Descending sort to sort the plot bars in descending order. For instance, on the plot Relative Target Means, the descending sort allows quick examination of the most interesting clusters, that is, those which differ most from the mean behavior of the data set taken as a whole.

6.2.3.5.2 Understanding Cluster Plots
6.2.3.5.2.1 The Plot Target Means
The Target Means plot presents the proportion of observations belonging to the target category of the target variable pres
300. t a folder to save the generated file.
5. In the field Generated File, enter the name of the exported file. If you want to replace an existing file, use the Browse button to select it.
[Figure: Data Source Selection dialog, used to select the source folder for data.]
6. If you have selected the option View Generated Code, it is displayed at the end of the generation process.
7. Click the Generate button.
The figure below shows the beginning of a sample C source code of a KXEN model.
[Figure: View Source Code window for the model class_Census01, file K2Rscript.c. The visible code reads:]
    static char KxVar0Cat0[19] = {77, 97, 114, 114, 105, 101, 100, 45, 99, 105, 118, 45, 115, 112, 111, 117, 115, 101, 0}; /* Married-civ */
    static char KxVar0Cat1[14] = {78, 101, 118, 101, 114, 45, 109, 97, 114, 114, 105, 101, 100, 0}; /* Never-married */
    static char KxVar0Cat2[9] = {68, 105, 118, 111, 114, 99, 101, 100, 0}; /* Divorced */
    static char KxVar0Cat3[10] = {83, 101, 112, 97, 114
301. t frequent category of the target variable (key category) in the cluster containing the current observation, displayed in the column kc_<TargetVariable>_Mean.

6.2.4.1.4 Types of Results Available
The application of a model to a data set allows you to obtain three types of results:
- The cluster index for each observation.
- The disjunctive encoding (or dummy coding) of the cluster indexes, which means that for each cluster a boolean variable is created, indicating whether the current observation belongs to that cluster or not. For a given observation, the value 1 is assigned to the variable corresponding to the cluster containing the observation, and the value 0 is assigned to the variables corresponding to the other clusters. The variable names are built according to the following pattern: KC_<TARGETNAME>_<CLUSTERINDEX>. Consider, as an example, that you have generated a five-cluster model. When applying this model, InfiniteInsight creates five variables corresponding to the five generated clusters. For an observation belonging to cluster 3, the result appears as shown below:

KxIndex | class | kc_class | kc_class_1 | kc_class_2 | kc_class_3 | kc_class_4 | kc_class_5
15 | 1 | 3 | 0 | 0 | 1 | 0 | 0

- The target mean for each cluster, that is, the percentag
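The disjunctive encoding described above can be sketched in a few lines of Python. The helper name `dummy_code` and its arguments are hypothetical and not part of InfiniteInsight; only the column naming pattern kc_<target name>_<cluster index> and the 0/1 values are taken from the text.

```python
def dummy_code(cluster_index, n_clusters, target_name="class"):
    """Return one 0/1 indicator per cluster, named kc_<target>_<index>.

    Hypothetical helper for illustration: clusters are numbered from 1,
    and exactly one indicator is set to 1 for a given observation.
    """
    return {
        f"kc_{target_name}_{k}": int(k == cluster_index)
        for k in range(1, n_clusters + 1)
    }


# An observation assigned to cluster 3 of a five-cluster model:
row = dummy_code(3, 5)
# {'kc_class_1': 0, 'kc_class_2': 0, 'kc_class_3': 1, 'kc_class_4': 0, 'kc_class_5': 0}
```

The resulting values match the example row in the text, where only kc_class_3 is set to 1.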
302. t is nominal.
- Compare the predicted value to the actual value, when the target is continuous.
On the plot, for each type of model, the curves represent:
- when the target is nominal, the realizable profit (on the Y axis) as a function of the ratio of the observations correctly selected as targets relative to the entire initial data set (on the X axis),
- when the target is continuous, the predicted value, or score (on the X axis), with respect to the actual value, or target (on the Y axis).

5.2.3.3.3 Plot Options
To Display the Graphs for the Estimation, Validation and Test Sub-sets
Two buttons situated above the plot on the screen Model Curves allow you to switch between the graph for the Validation sub-set and the graphs for all the sub-sets.

To Copy the Model Graph
Click the Copy button. The application copies the parameters of the plot. You can paste them into a spreadsheet program (such as Excel) and use them to generate a graph.

To Save the Model Graph
1. Click the Save button. A dialog box will appear, allowing you to select the file properties.
2. Type a name for your file.
3
303. [Figure: Contributions by Variables bar chart; the variable labels visible include capital-gain, occupation, education-num, capital-loss, education, hours-per-week, native-country, fnlwgt, relationship and workclass.]

The plot above corresponds to the model generated and illustrates the two variables that contribute the most to the target variable, which in this scenario are:
- marital-status
- capital-gain
In other words, the marital-status and capital-gain variables are those which have the greatest effect on whether a prospect will respond positively or negatively to your marketing campaign. Among all the variables included in the sample data set, these two are the most discriminatory variables with respect to the target variable Class.

5.2.3.4.4 Correlated Variables
To say that variables are correlated implies a certain level of redundancy, that they each contribute some of the same information with respect to the target variable. Two variables said to be highly correlated would describe the same information, or the same concept, to an even greater degree. The plot Smart Variable Contributions reflects the correlation that may exist between various explanatory variables. When two variables A and B are strongly correlated: Variable A, with a greater contribution than B with respect to the target vari
304. target variables. It allows one to predict and explain phenomena, or to describe them. To quote George E. P. Box: "All Models Are Wrong But Some Are Useful" (quote from "Robustness in the Strategy of Scientific Model Building", in Robustness in Statistics, eds. R. L. Launer and G. N. Wilkinson, 1979, Academic Press).

4.7.2 Performance of a Model
A model that has satisfactory performance is one that possesses both:
- High explanatory power, that is, sufficient capacity to explain the target variable. This explanatory power is indicated by the quality indicator KI.
- High robustness, that is, sufficient capacity to repeat the same performance on new data sets containing observations of a similar nature to the training data set. This robustness is indicated by the robustness indicator KR.

4.7.3 Types of Models
In Data Mining, there are two types of models:
- Predictive and explanatory models, which allow one to predict and explain phenomena.
- Descriptive models, which allow one to describe data sets.
With InfiniteInsight, you can generate models that are both highly descriptive and highly predictive.

4.7.4 Generating a Model
InfiniteInsight models are generated during a phase called the training phase, using a training data set. Depending on the situation, the training data set may be
305. te and time (datetime), or date (date).
Type: continuous, nominal, ordinal or textual.
For more information on data description, go to Types of Variables on page 26 and Storage Formats on page 29.

6.2.1.2.2 How to Describe Selected Data
To describe your data, you can:
- Either use an existing description file, that is, one taken from your information system or a description file previously created with KXEN features,
- Or create a description file using the Analyze option from InfiniteInsight. In this case, you must validate the description file obtained. You can save this file for later use.
Important: The description file obtained using the Analyze option results from the analysis of the first 50 lines of the initial data file. In order to avoid all bias, we encourage you to randomly sort your data set outside InfiniteInsight before performing this analysis.

6.2.1.2.3 Viewing the Data
To help you validate the description when using the Analyze option, you can display the first hundred lines of your data set.
To View the Data
1. Click the button View Data. A new window opens, displaying the top lines of the data set.
[Figure: Sample Data View window for the data set Census01.csv, with the Data, Statistics and Graph tabs.]
306. th InfiniteInsight. This documentation covers:
- The operational use of InfiniteInsight features,
- The architecture and integration of the InfiniteInsight API,
- The InfiniteInsight Java graphical user interface.

2.2.1.2.2 Contextual Help
Each screen in InfiniteInsight is accompanied by contextual help that describes the options presented to you and the concepts required for their application.
To Display the Contextual Help
1. Click the Help button located on the lower left corner of the screen.
2. Click the Previous button to go back to the original screen.

2.2.2 Contact Us
We are interested in your feedback and welcome your questions and comments. The following table provides a list of e-mail addresses that you may use to contact us.

If you | Contact our team in | Send an email to the following address
Want more business application information | Marketing | info@kxen.com
Have technical questions related to the integration and use of our products | Support | support@kxen.com
Have comments or questions concerning the documentation | Documentation | documentation@kxen.com

3 SAP InfiniteInsight
IN THIS CHAPTER
Introduction
Architecture and Operations
307. th the target variable.
For this Scenario: Exclude the variable KxIndex, as this is a key variable. Since the initial data set does not contain a key variable, the InfiniteInsight feature generated KxIndex automatically. Retain all the other variables.

To Exclude some Variables from Data Analysis
1. On the screen Selecting Variables, in the section Explanatory Variables Selected (left hand side), select the variable to be excluded.
[Figure: Selecting Variables screen, with class in Target Variables and KxIndex in Excluded Variables; Number of Variables: 14.]
Note: On the screen Selecting Variables, variables are presented in the same order as that in which they appear in the table of data. To sort them alphabetically, select the option Alphabetic sort, presented beneath each of the variables lists.
2. Click the button > located on the left of the screen section Variables excluded (lower right hand side). The variable moves to the screen section Variables excluded. To move variables back to the screen section Explanatory variables selected, select them in the screen section Variables excluded and click the button <.
Note: By default, any variable defined as a key is put in the Excluded Variables
308. the Weight Quantum option.
[Figure: Advanced Model Parameters panel, showing the Weight Quantum option, the correlations settings (Keep all Correlations, Keep the First), Enable post-processing, Original target encoding, Uniform target encoding and the Target Key Settings.]

The notion of Weight Quantum has been added to define a threshold below which a category will be associated with KxOther. The Statistical Reports now include the information about weights in the Descriptive Statistics Variables and Data Set Size.

To Define a Weight Quantum
1. Check the box Weight Quantum.
2. Enter a Threshold. By default, it is set to 1.
[Figure: Advanced Model Parameters panel with the Weight Quantum box checked.]

The section Maximum Number of Kept Correlations allows you to set the parameters for the Correlation debriefing panel, that is, to select how many correlations should be displayed in that panel. To say that variables are correlated implies that they each contribute some of the same information with respect to the target variable. A correlation
309. the application data set must have been previously generated from a training data set. The application data set must contain exactly the same information structure as the corresponding training data set, that is:
- The same number of variables,
- The same types of variables,
- The same order of presentation of these variables.
Important: The application data set must contain a target variable that corresponds to that of the training data set. This is true in all instances, even if the values of this target variable are empty. When these values are defined, they may serve to detect any possible deviant observations (outliers).

4.4 Cutting Strategies
4.4.1 Definition
A cutting strategy is a technique that allows decomposition of a training data set into three distinct sub-sets:
- An estimation sub-set
- A validation sub-set
- A test sub-set
This cutting allows for cross-validation of the models generated. There are nine types of cutting strategies available within InfiniteInsight.

4.4.2 Roles of the Three Sub-Sets
The following table defines the roles of the three data sub-sets obtained using cutting strategies.

The data set | Is used to
Estimation | Generate different models. The models generated at this stage are hypothetical.
Validation | Select the best model among those generated using th
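A cutting strategy, as defined above, splits the training data set into estimation, validation and test sub-sets. The Python sketch below shows one simple way to do this with a random shuffle. It does not reproduce any of the nine strategies provided by InfiniteInsight; the function name `random_cut`, the 60/20/20 proportions and the fixed seed are illustrative assumptions.

```python
import random


def random_cut(rows, est_share=0.6, val_share=0.2, seed=0):
    """Randomly split rows into (estimation, validation, test) sub-sets.

    The shares are illustrative assumptions; whatever remains after the
    estimation and validation sub-sets goes to the test sub-set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the cut is reproducible
    n_est = round(len(shuffled) * est_share)
    n_val = round(len(shuffled) * val_share)
    estimation = shuffled[:n_est]
    validation = shuffled[n_est:n_est + n_val]
    test = shuffled[n_est + n_val:]
    return estimation, validation, test


estimation, validation, test = random_cut(range(100))
```

Each observation ends up in exactly one sub-set, which is what allows the validation and test sub-sets to check models built on the estimation sub-set.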
310. the list but also from the data source.

To Edit the General Settings
Settings | Options | Note
Reports Background Color | choose a color; make transparent | Only the PDF and HTML formats can display a background color.
Edit Configuration | font size; font style; font color; text background color; table configuration | Check the option Dynamically render option changes, or click Apply when editing the settings, so that you can visualize the result.
The selected settings will be applied to both the wizard and the generated reports.

To Edit the Charts Settings
Settings | Options | Note
Chart Colors | modify the charts colors |
Default Chart Bars Orientation | horizontal; vertical | It is possible to set another default orientation for specific report items.

To Edit Report Items
1. Set the properties of your choice.
2. Click Save to validate. A window opens, indicating that your style sheet has been successfully saved.
3. Click OK.
Properties | Functions | Note
Displayed as | name of the label |
View Type | choose between Tabular, HTML and Graphical | The last one is only available if the report item can be displayed as a graph.
Chart Type | select one of the proposed chart types | This option is only availab Graphical
Switch Bar Orientation | this option allows h
311. the option Define Structure gt Extract Categories from Statistics A progression bar is displayed while the structure is being extracted Once the extraction is done the icons corresponding to the selected variables change indicating that the operation was a success and allowing you to easily identify them You can then modify the variable structure as you need Mi To Import the Variable Structure from a Model 4 Select the variables for which you want to extract the structure 5 Right click the table a contextual menu is displayed 6 Select the option Define Structure gt Extract Structure from Model The window Loading Model opens Data Type Folder Wsers denise ortiz caso Documents Models x og Browse __Date Comment 2R Census Class Kxen Simple 2004 12 16 IK2S Census Class Kxen Simple 2004 12 16 2R_census_3 3 0 Kxen Simple 2005 09 05 K2S Census 3 3 0 Kxen Simple 2005 09 05 2R_Census_331 Kxen Simple 2005 09 08 Kxen Simple 2005 09 08 R Kxen Simple 2005 09 13 K2S_Unsupervize Kxen Simple 2005 09 20 2R Census Age Kxen Simple 2005 09 21 K2R gt sex gt Cen Kxen Simple 2005 09 23 3 3 1 gt census gt Kxen Simple 2005 10 17 SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 74
312. the threshold that will give you a maximum profit for the profit parameters you have set, click the button Maximize Profit.

In the following profit/cost table, each positive observation correctly identified will yield 15, but each negative observation identified as positive will cost you 8.

Category | Predicted 1 | Predicted 0
True 1 | 15 | 0
True 0 | 8 | 0

5.2.3.10 Decision Tree

The panel Decision Tree allows you to display the results of a regression or classification model generated by InfiniteInsight Modeler as a decision tree based on the five most contributive variables.

5.2.3.10.1 Displaying the Decision Tree

To Display the Decision Tree for a Target
1. In the Target list, select the target for which you want to display the decision tree.

[Screenshot: Decision Tree panel for the target class. The whole population (48842 observations) is split on marital status into three nodes with populations 17647, 8779 and 22416.]
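The profit/cost table in excerpt 312 (a gain of 15 per correctly identified positive, a cost of 8 per negative flagged as positive) lends itself to a small worked example. The sketch below is not how the Maximize Profit button is implemented; it simply tries each observed score as a threshold and keeps the most profitable one. The function names and the sample scores are hypothetical.

```python
def profit_at_threshold(scores, labels, threshold, gain_tp=15.0, cost_fp=8.0):
    """Total profit when every observation scoring >= threshold is predicted positive."""
    profit = 0.0
    for score, label in zip(scores, labels):
        if score >= threshold:
            # True positive earns gain_tp, false positive costs cost_fp.
            profit += gain_tp if label == 1 else -cost_fp
    return profit


def best_threshold(scores, labels, gain_tp=15.0, cost_fp=8.0):
    """Return (threshold, profit) maximizing profit over the observed scores.

    float('inf') is included as a candidate so that "predict nothing as positive"
    (profit 0) is always an option.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best = max(candidates, key=lambda t: profit_at_threshold(scores, labels, t, gain_tp, cost_fp))
    return best, profit_at_threshold(scores, labels, best, gain_tp, cost_fp)
```

With scores `[0.9, 0.8, 0.4, 0.2]` and true classes `[1, 1, 0, 0]`, the threshold 0.8 selects both positives and no negatives, for a profit of 30.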
313. ther KxMissing KxOther ry Local gov Self emp not inc Local gov Self emp not inc P Private State gov Private Self emp inc Add New Group Add Category New Category Add Missing T Alphabetic Sort Remove Group IV Enable the target based optimal grouping performed by K2C Lo f ee If the variable has already been added to the list of variables located on the lower part of the panel Double click the variable for which you want to see the structure defined in the model SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 76 MI To Create or Modify a Variable Structure Double click the Structure icon corresponding to the variable for which you want to edit or create the structure The edit window opens If the structure had been extracted from the variable statistics or from a model the fields are already filled lt relationship xi Group Structure Category Edition Husband Not n family ry Other relative Own child Other relative Own child g p Unmarried Unmarried New Category Remove Group Add Missing T Alphabetic Sort JV Enable the target based optimal grouping performed by K2C OK Cancel For details on how to use the Structure editor see Structure by Type of Variables SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler
314. thout any key for the described space When the space used for model training does not contain a physical variable named Kx Index it is not possible to use a description file including a description about a KxIndex variable since it does not exist in current space Click the OK button The window Load a Description closes and the description is displayed on the screen Data Description lt KXEN InfiniteInsight New Regression Classification Model ee Description Desc_Census01 csv Name Missing Group Descrip Structure ljage ji 2 workdass 3 fniwgt 4education 5leducation num 6 marital status 7 occupation 8 relationship Qrace 10 sex 11 capital gain 12 capital4oss 13 hours per week native country S SSlS S o os ofolssojojojaja S S oSosfo ojal oojassjojajaya dass KxIndex Add Filter in Data Set pp Analyze LD open Description bed seve Description l 6 Click the Next button SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Segmentation Clustering 2013 SAP AG or an SAP affiliate company All rights reserved 202 6 2 1 2 1 Why Describe the Data Selected In order for KXEN features to interpret and analyze your data they must be described To put it another way the description file must specify the nature of each variable determining their Storage format number number character string string da
315. ting the option Predicted value only Probability Individual Contributions Decision M For this Scenario Will generate a results file containing the following information Only the predicted value of observations rr_TargetVariableName the predicted value the probability proba_rr_TargetVariableName the prediction range bar_rr_TargetVariableName the predicted value the probability the prediction range the individual contributions of variables contrib_VariableName_rr_TargetVariableName the predicted value the decision decision_rr_TargetVariableName the decision probability proba_decision_rr_TargetVariableName the probability Due to technical constraints a data set corresponding to the database of 1 000 000 customers that will be used in this scenario can not be provided to you SAP Infinitelnsight 6 5 SP4 Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved CUSTOMER 168 You will apply the model to the file CensusOl csv which you used to generate the model In this manner you will be able to compare the predictions provided by the model to the real values of the target variable Class for each of the observations In the procedure To Apply the Model to a New Data Set Select the format Text files Inthe Generate field select the option Individual Contributions Select the folder of your choice in
316. tion contained in the target variable that the explanatory variables are able to explain.

4.8.1.1.2 Example

A model with a KI:
- Equal to 0.79 is capable of explaining 79% of the information contained in the target variable using the explanatory variables contained in the data set analyzed.
- Equal to 1 is a hypothetical perfect model, capable of explaining 100% of the target variable using the explanatory variables contained in the data set analyzed. In practice, such a KI would generally indicate that an explanatory variable 100% correlated with the target variable was not excluded from the data set analyzed.
- Equal to 0 is a purely random model.

4.8.1.1.3 Improving the KI of a Model

To improve the KI of a model, new variables may be added to the training data set. Explanatory variables may also be combined.

4.8.1.2 Robustness Indicator KR

4.8.1.2.1 Definition

KR is the abbreviation for KXEN Robustness Indicator. The KR indicator is the robustness indicator of the models generated using InfiniteInsight. It indicates the capacity of the model to achieve the same performance when it is applied to a new data set exhibiting the same characteristics as the training data set.

4.8.1.2.2 Example

A model with a KR:
- Equal to or greater than 0.98 is very robust. It has a high capacity for generalization.
- Less th
317. tion models becomes increasingly more significant Finally management may want you to rationalize your methods and to perform your segmentation using a method not based purely on your intuition Defending your segmentation method based on intuition may be difficult SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Segmentation Clustering 2013 SAP AG or an SAP affiliate company All rights reserved 192 6 1 5 2 Classical Statistical Method On the basis of the information that you have a Data Mining expert could build a segmentation model In other words you could ask a statistical expert to create a mathematical model that would allow you to build clusters based on the profiles of your customers To implement this method the statistician must Perform a detailed analysis of your database Prepare your database down to the smallest detail specifically encoding the variables as a function of their type nominal ordinal or continuous in preparation for segmentation The encoding strategy used will determine the type of segmentation model obtained At this step the statistician may unconsciously bias the results Test different types of algorithms K means both ascending and descending hierarchical segmentation models and select the one best suited to your business issue Evaluate the relevance of the clusters obtained in particular the response to your domain specific business issue After a few wee
318. tion of prospects who responded in a positive manner to your marketing campaign during your test phase. For these prospects, the value of the target variable, or profit, is equal to 1.

By selecting:
- 25% of the observations from your initial data set with the help of the model generated, 66.9% of the observations belonging to the target category of the target variable are selected.
- 25% of the initial data set using a random model, 25% of the observations belonging to the target category of the target variable are selected.

6.2.3.3.5 KI, KR and Model Curves

On the model curve plot:
- Of the estimation data set (default plot), the KI indicator corresponds to the area found between the curve of the model generated and that of the random model, divided by the area found between the curve of the perfect model and that of the random model. As the curve of the generated model approaches the curve of the perfect model, the value of KI approaches 1.
- Of the estimation, validation and test data sets (select the corresponding option from the list Data set located below the plot), the KR indicator corresponds to one minus the area found between the curve of the estimation data set and that of the validation data set, divided by the area found between the curve of the perfect model and that of the random model.

6.2.3.4
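The two area ratios described in excerpt 318 can be written out directly. The following is a minimal sketch, assuming each curve is sampled at the same x positions and that areas are approximated with the trapezoidal rule; it is an illustration of the definitions, not the product's own computation, and all function names are hypothetical.

```python
def area(xs, ys):
    """Trapezoidal area under a curve sampled at the points (xs[i], ys[i])."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def ki(xs, model, perfect, random_):
    """Area between model and random curves over area between perfect and random curves."""
    return (area(xs, model) - area(xs, random_)) / (area(xs, perfect) - area(xs, random_))


def kr(xs, estimation, validation, perfect, random_):
    """One minus the estimation/validation gap, normalized like KI.

    The gap uses pointwise absolute differences, which is only approximate
    if the two curves cross between sample points.
    """
    gap = area(xs, [abs(e - v) for e, v in zip(estimation, validation)])
    return 1 - gap / (area(xs, perfect) - area(xs, random_))
```

A model curve that coincides with the perfect curve gives a KI of 1, one that coincides with the random curve gives 0, and identical estimation and validation curves give a KR of 1.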
319. trator there are two possible outcomes to a query the query is accepted and executed this is completely transparent nfinite nsight accesses the data without further input from the user the query needs to be validated before being executed a pop up window opens displaying a message configured by the DBMS administrator A query that needs validation can be categorized in two ways O medium sized You will probably have to check with your administrator to know which action to take If the administrator authorizes the query click the Continue button The pop up window closes and the requested action is carried out f the administrator does not authorize the query click the button Stop Query the pop up window closes but no action is executed e huge It means that the query will take too much time and resources In that case the behavior of the Continue button depends on the configuration set by the DBMS Administrator for example it can automatically refuse queries that are considered too heavy In any case you should check with them to know the line of action to follow If your DBMS Administrator has not configured the Explain mode the following pop up opens when you try to access the data X Query Validation Evaluation The Explain feature is not configured Your validation will be required for all queries E No longer request validation for similar queries Show Details
320. type of profit Using this type of profit The value 0 is assigned to observations that do not belong to the target category of the target variable The value 1 frequency of the target variable in the data set is assigned to observations that do belong to the target category of the target variable The following table describes the three curves represented on the plot created using the default parameters The curve Wizard green curve at the top Validation blue curve in the middle Random red curve at the bottom Represents The profit that may be achieved using the hypothetical perfect model that allows one to know with absolute confidence the value of the target variable for each observation of the data set The profit that may be achieved using the model generated by Infinitelnsight Modeler Regression Classification that allows one to perform the best possible prediction of the value of the target variable for each observation of the data set The profit that may be achieved using a random model that does not allow one to know even a single value of the target variable for each observation of the data set For instance by selecting 25 of the observations from your entire data set with the help of a perfect model 100 of observations belonging to the target category of the target variable are selected Thus maximum profit is achieved Note These 25 correspond to the propor
321. us allows you to assign observations to clusters This part presents the option Applying the model to a new data set for the nfinite nsight Modeler Segmentation Clustering feature The other options for deployment of the clustering models are similar to those proposed for models generated using the nfinite nsight Modeler Regression Classification feature For more information about these options see Saving the Model Opening the Model 6 2 4 1 Applying the Model to a New Data Set The currently open model may be applied to additional data sets The model allows you to determine to which cluster the observations described in these data sets belong 6 2 4 1 1 Constraints of Model Use In order to apply a model to a data set the format of the application data set must be identical to that of the training data set used to generate the model The same target variable in particular must be included in both data sets even if values for the target variable are not contained in the application data set SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Segmentation Clustering 2013 SAP AG or an SAP affiliate company All rights reserved 252 6 2 4 1 2 Using the Option Direct Apply in the Database This optimized scoring mode can be used if all the following conditions are met the apply in data set table view select statement data manipulation and the results data set are tables coming from the same database
322. uster selection table.
3. Check the clusters for which you want to add the probabilities.

Note: When the SQL mode is activated, the notion of nearest cluster does not exist. If a case belongs to a cluster, the probability is set to 1. If a case does not belong to a cluster, the probability is set to 0.

6.2.4.1.3.6.1 Disjunctive Coding

This option allows you to add to the output file the disjunctive coding of the clusters. A column is generated for each cluster and contains either 0 or 1, depending on whether the observation belongs to the cluster or not. The columns created are named kc_disj_<TargetVariable>_<ClusterId>. For example, if the target variable is Age and the model has five clusters, the following five columns will be generated:
- kc_disj_age_1
- kc_disj_age_2
- kc_disj_age_3
- kc_disj_age_4
- kc_disj_age_5

6.2.4.1.3.6.2 Target Mean Target Key Probability

This option allows you to add to the output file:
- for continuous targets, the mean of the target for the cluster containing the observation (displayed in the column kc_<TargetVariable>_Mean) and the difference with the actual target value, if the latter is known for the current observation (displayed in the column kc_<TargetVariable>_Error),
- for nominal targets, the proportion of the leas
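The disjunctive coding described in excerpt 322 is what is often called one-hot encoding. The sketch below reproduces the column naming from the excerpt's example; it assumes cluster identifiers run from 1 to the number of clusters, as in that example, and the function name is hypothetical.

```python
def disjunctive_coding(target, cluster_ids, n_clusters):
    """Return one dict per observation with a 0/1 column for each cluster.

    Column names follow the kc_disj_<TargetVariable>_<ClusterId> pattern;
    cluster ids are assumed to be the integers 1..n_clusters.
    """
    rows = []
    for cid in cluster_ids:
        rows.append({f"kc_disj_{target}_{k}": int(cid == k)
                     for k in range(1, n_clusters + 1)})
    return rows
```

For an observation assigned to cluster 2 of five, only `kc_disj_age_2` is set to 1 and the other four columns are 0.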
323. variable moves to the screen section Target s Variable s Also select a variable in the screen section Target s Variable s and click the button lt to move the variables back to the screen section Explanatory variables selected SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 90 5 2 1 5 2 Weight Variable Selecting a Weight Variable enables to set the Weight Quantum option available in the Advanced Model Parameters E For this Scenario Do not select a weight variable M To Select a Weight Variable 1 Onthe screen Selecting Variables in the section Explanatory variables selected left hand side select the variables you want to use as a Weight Variable lt KXEN InfiniteInsight New Regression Classification Model Selecting Variables Explanatory Variables Selected Target Variables gt dass lt I Alphabetic Sort Weight Variable Exduded Variables gt dex lt Number of Variables 14 q3 A 7 Alphabetic Sort H OT Alphabetic Sort Note On the screen Selecting Variables variables are presented in the same order as that in which they appear in the table of data To sort them alphabetically select the option Alphabetic sort presented beneath each of the variables list 2 Click the button gt located on the left of the screen section Weight Variable middle right hand side The variable m
324. variables visible in the drop-down list are sorted according to the difference between their cluster profile and their population profile (the Kullback-Leibler divergence is used to measure this difference). The variable that appears first on the list is the variable exhibiting the greatest difference between its two profiles. This sorted list of variables provides the set of discriminatory variables required to describe a cluster.

In the middle part, a table presents each cluster in a summarized fashion.

Column | Indicates | For instance
Cluster Name | The name of the cluster | Cluster 2
Frequencies | The number of observations contained in the cluster relative to the total number of observations contained in the data set | The customers contained in cluster 2 represent 11.2% of the total number of customers contained in your entire training data set.
% of 1 | The proportion of observations contained in the cluster belonging to the target category of the target variable | 65.7% of the customers contained in cluster 2 belong to the target category of the target variable Class. In other words, 65.7% of the customers contained in this cluster responded in a positive manner to the test phase of your marketing campaign.

This table allows you to select the cluster for which you want to view cross statistics.

In the lower part, a plot presents either the cross statistics corresponding to the cluster and the variables selected or, when it has
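The ordering described in excerpt 324 can be sketched as follows. The excerpt only states that the Kullback-Leibler divergence measures the difference between the two profiles; the direction used here (cluster profile relative to population profile), the natural logarithm, the small floor on zero population frequencies, and the function names are all assumptions made for the illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two category distributions."""
    # Categories absent from p contribute nothing; eps avoids division by zero in q.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def rank_variables(cluster_profiles, population_profiles):
    """Sort variable names by decreasing divergence between cluster and population profiles."""
    scores = {name: kl_divergence(cluster_profiles[name], population_profiles[name])
              for name in cluster_profiles}
    return sorted(scores, key=scores.get, reverse=True)
```

A variable whose category frequencies inside the cluster match those of the whole population scores 0 and sinks to the bottom of the list.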
325. vate 109015 HS grad Divorced Tech s gt First Row Index 4 Last Row Index wa E aJ 2 Inthe field First Row Index enter the number of the first row you want to display 3 Inthe fieldLast Row Index enter the number of the last row you want to display 4 Click the Refresh button to see the selected rows 5 2 1 2 4 A Comment about Database Keys For data and performance management purposes the data set to be analyzed must contain a variable that serves as a key variable Two cases should be considered If the initial data set does not contain a key variable a variable index Kx ndex is automatically generated by nfinite nsighf features This will correspond to the row number of the processed data f the file contains one or more key variables they are not recognized automatically You must specify them manually in the data description See the procedure To Specify that a Variable is a Key page 72 SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 71 5 2 1 2 5 To Specify that a Variable is a Key 1 Inthe Key column click the box corresponding to the row of the key variable 2 Typein the value 1 to define this as a key variable lt KXEN Infinite Insight 1 New Model with Sequence Analysis x Gues
326. 4.6.5.1.3 Example

Your company is marketing two products, A and B. You have a database which contains references to:
- 1,500 of your customers. You know which product, A or B, each customer has purchased.
- 10,000 prospects. You want to know which product each customer is likely to purchase.

The variable "product purchased" is your target variable: it corresponds to your business issue. It is:
- Known for all values of the training data set (in our example, the customers)
- Not known for the values of the application data set (in our example, the prospects)

InfiniteInsight features allow you to model that target variable and thus predict which product each of your prospects is likely to purchase. The following table represents your database.

Name | Age | Residence | Socio-Occupational Category | Product Purchased
Charles | 34 | New Orleans | Manager Administrator | Product A
John | 37 | Washington | Manager Administrator | Product A
Marlène | 31 | Boston | Civil servant | Product B
Prospect 1 | 34 | Oakland | Manager Administrator |
Prospect 2 | 24 | Washington | Civil servant |
Prospect n | 35 | Sacramento | Skilled tradesman |

4.6.5.1.4 Constraints Governing Use

The following constraints govern the use of a target variable:
- Within a training data set, all target variable values must be known.
- Only binary or continuous variables may be used as target variables.

4.6.5.2 Explanatory Variable

4.6.5.2.1 Definition

An explanatory variable is a variabl
327. vides an overview of nfinite nsight its architecture and its operation In addition it presents two indispensable prerequisite methodologies for using nfinite nsighf features Chapter 3 Essential Concepts introduces the concepts essential to modeling data with the nfinite nsight The shorter Chapter 4 General Introduction to Scenarios provides a summary of the application scenarios for features nfinitelnsight Modeler Regression Classification and nfinitelnsight Modeler Segmentation Clustering It also introduces the user interface and the data files used in these scenarios Chapters 5 and 6 Generating Explanatory and Predictive Models with nfinite nsight Modeler Regression Classification feature and Generating Descriptive Models with nfinite nsight Modeler Segmentation Clustering feature present the nfinite nsight Modeler Regression Classification feature and the nfinitelnsight Modeler Segmentation Clustering feature of nfinite nsight respectively These two chapters are organized in the same manner in two parts The first part introduces a detailed application scenario of that feature The second part introduces the actual use of the feature itself using the illustration of the corresponding application scenario A summary and detailed table of contents located at the beginning of the guide an index located at the end and cross references throughout the document allow you to find the information that you nee
328. $w_2 X_2 + \dots + w_n X_n$

A polynomial of degree 2 is of the form:

$f(X_1, X_2, \dots, X_n) = w_0 + w_1 X_1 + w_2 X_2 + \dots + w_n X_n + w_{11} X_1 X_1 + w_{12} X_1 X_2 + w_{13} X_1 X_3 + \dots + w_{ij} X_i X_j$

4.7.5.3 Methodology

In the large majority of cases, a first-degree polynomial is sufficient for generation of a relevant and robust model. Using a higher degree of polynomial does not always guarantee better results than those obtained with a first-degree polynomial. In addition, the higher the degree of polynomial you select:
- The more time needed to generate the corresponding model
- The more time needed to apply the model to new data sets
- The harder it is to interpret the results of modeling

The selection of the degree of the polynomial depends on the nature of the data to be analyzed. The recommended method is to:
- First, generate a model with a first-degree polynomial. In the large majority of cases, this degree will be sufficient to guarantee a relevant and robust model.
- Test the results thus obtained against models of greater degree if the performance of the first-degree model seems inadequate.

4.7.6 Validating the Model

Once the model has been generated, you must verify its validity by examining the performance indicators. The quality indicator KI allows you to evaluate the explanatory power of the model, that is, its capacity to exp
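The degree-2 form in excerpt 328 can be evaluated with a few lines of code, which also makes the cost argument concrete: the number of weights grows quadratically with the number of variables. This is a minimal sketch with hypothetical names (`poly2`, `term_count`), using zero-based variable indices; it says nothing about how the product fits the weights.

```python
def poly2(x, w0, w, w_pairs):
    """Evaluate w0 + sum_i w[i]*x[i] + sum_(i,j) w_pairs[(i, j)]*x[i]*x[j].

    w_pairs maps zero-based index pairs (i, j) to the second-order weights w_ij;
    pairs missing from the dict are treated as having weight 0.
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    quadratic = sum(wij * x[i] * x[j] for (i, j), wij in w_pairs.items())
    return w0 + linear + quadratic


def term_count(n):
    """Number of weights in a full degree-2 polynomial over n variables."""
    # 1 constant + n first-order terms + n(n+1)/2 second-order terms (i <= j).
    return 1 + n + n * (n + 1) // 2
```

With three variables a first-degree polynomial has 4 weights while the full degree-2 form has 10, which is one reason higher degrees take longer to generate, apply and interpret.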
329. with the model This name will then appear in the list of models to be Name offered when you open an existing model Description This field allows you to enter the information you want such as the name of the training data set used the polynomial degree or the KI and KR performance indicators obtained This information could be useful to you later for identifying your model Note This description will be used instead of the one entered in the panel Summary of Modeling Parameters Data Type this list allows you to select the type of storage in which you want to save your model The following options are available Text files to save the model in a text file Database to save the model in a database Flat Memory to save the model in the active memory SAS Files to save the model in a SAS compatible file for a specified version of SAS and a specified platform SAS v6 or 7 8 for Windows or UNIX SAS Transport to save the model in a generic SAS compatible file Folder Depending upon which option you selected this field allows you to specify the ODBC source the memory store or the folder in which you want to save the model File Table This field allows you to enter the name of the file or table that is to contain the model The name of the file must contain one of the following format extensions txt text file in which the data is separated by tabs or csv text file in which the data is separated by commas Click the OK
330. ws you to Determine which individuals have the highest probability score of being interested in your marketing campaign predictive modeling You may then apply the model to your entire database Break out the determining factors that describe the phenomenon that you hope to model that is the fact of being interested or not interested in the new financial product of the bank descriptive modeling The profit curve an important validation and control tool allows you to compare the performance of models generated using nfinite nsight features with that of a hypothetical random model and that of a hypothetical perfect model At the same time it also allows you to determine the optimal number of persons to contact to maximize the profit generated by your campaign nfinite nsight also provides you with indicators of the quality Kl of the model you generate and its capacity to generalize and remain relevant to new data sets KR Infinitelnsight provides you with the means to customize your direct marketing campaign with respect to your different customer profiles increasing your powers of persuasion SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 58 5 1 7 Introduction to Sample Files This guide is accompanied by the following sample data files A data file CensusOl csv The corresponding description file des
331. y 3D Chart Disable Double Buffering Optimize for Remote Display Remember Size and Position when Leaving Report Number of Variables of Interest Active Style Sheet Customize Style Sheets SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 61 5 1 8 1 1 Customizing Style Sheets Intinitelnsight offers the possibility to customize the generated reports The default style sheet called KXEN Report Style Sheet default cannot be modified You have to create your own style sheets to modify the settings Note To create load or save a style sheet you have to indicate a data source in the panel Edit Options before opening the window KXEN Report Style Sheet Editor M To Create a New Style Sheet 1 2 In the field Folder click the button g Browse Select a folder This folder is your style sheets repository Click the button Ada Anew style sheet has been created Click the button LE The panel Report Style Sheet Editor opens In the field Style Sheet Name enter a name for the new style sheet The extension KRS is automatically added Note You can duplicate a style sheet by changing the name of your style sheet The previous one is not deleted Mi To Delete a Style Sheet 6 Select one of the displayed style sheets 7 Clickthe button Remove Note The style sheet is not only deleted from
332. ying the reasons which led certain individuals to respond favorably to your offer and others to respond in the negative Then you would be able to use the analytical model obtained to predict the behavior of each of the 1 000 000 prospects in your database This would ensure that you optimized your marketing campaign by making the offer only to those individuals most likely to be interested The file containing the data set used for the test was sent to you by the Information Technology department of the bank in the form of a flat file csv This file corresponds to the sample file CensusOl csv provided with Intinitelnsight and described in the Introduction to Sample Files page 59 section SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 55 5 1 5 Your Business Issue Following the test phase of your campaign your marketing database will contain A list of 1 000 000 prospects A list of 50 000 prospects selected in a random manner during the test phase and whose response to your test campaign is known This sample taken from your initial database also contains missing values and correlated variables Your approach to the business issue consists of using the data set in its present state as a training data set in order to Rapidly create an explanatory and predictive model Next apply this model to the entire databa
333. z Choose a variable in the first list Choose an operator in the second list Indicate a value in the third list Fora variable with number storage type a value ou A SAP Infinitelnsight 6 5 SP4 CUSTOMER Infinitelnsight Modeler Regression Classification 2013 SAP AG or an SAP affiliate company All rights reserved 86 Fora variable with string storage choose a variable in the list If the list is empty click the button k to extract the variable categories 7 Click OK Note You can edit a condition by double clicking it VI To Add a Logical Conjunction Click the button Add Logic And or the button Add Logic Or Note You can change a conjunction by double clicking it Vi To Change the Order You can change the order of the nodes to accelerate the filtering process by setting the conditions with the highest probability to be false at the top of the list 1 Select the node you want to move up or down Use the buttons A and b to change its position in the filter To Delete a Node Select the node you want to delete Click the button Remove Selected Node N Ferg YX To Display the Filtered Data Set You can visualize the data set that you will obtain after the application of the filter Click the button View Data A pop up window opens PRKXEN Sample Dataview TST Data Set Census01 csv Tata statistics BZ craph 38 Private 53Private Private Self emp no Private
