
Latent GOLD Choice 4.0 User's Manual

1. The universe is assumed to consist of 3 segments with different utility preferences. The first segment constitutes 50% of the universe, and the second and third each represent 25%.

Segment            1       2       3
Segment Size       0.50    0.25    0.25

Part-worth Utility Parameters
Segment            1       2       3
Modern fashion     3       0       1
Higher quality     0       3       1
PRICE             -0.5    -0.5    -0.5
NONE               1       0      -0.5

These utilities imply that all 3 segments are equally price sensitive. In addition, segment 1 (50% of the universe) prefers the more Modern style shoes and is indifferent regarding Higher quality; segment 2 (25%) prefers the Higher quality and is indifferent with regard to fashion; segment 3 (25%) is influenced by both Modern fashion and Higher quality, but to a lesser extent than the other segments. Overall, segment 1 is most likely and segment 3 least likely to choose None.

The segments differ on AGE and GENDER as follows:

Gender and Age distributions for each segment
Segment            1       2       3
GENDER
  Male             0.25    0.40    0.83
  Female           0.75    0.60    0.17
AGE
  <25              0.69    0.28    0.19
  25-39            0.14    0.15    0.43
  40+              0.20    0.57    0.38

The choices obtained from 400 individuals generated from this hypothetical universe are contained in the Response File cbcRESP.sav.

Figure 16: First 2 cases in the Response File cbcRESP.sav
2. Part 1: Basic Model Options, Technical Settings, and Output Sections

1 Introduction to Part I: Basic Models

Latent GOLD Choice is a program for the analysis of various types of preference data; that is, data containing (partial) information on respondents' preferences concerning one or more sets of alternatives, objects, options, or products. Such data can be obtained from different response formats, the most important of which are:

• first choice out of a set of M alternatives, possibly including a none option,
• paired comparison,
• full ranking of a set of M alternatives,
• partial ranking of a set of M alternatives,
• best and worst choices out of a set of M alternatives, also referred to as maximum-difference scaling,
• binary rating of a single alternative (positive-negative, yes-no, like-dislike),
• polytomous rating of a single alternative, e.g., on a 5-point scale,
• assigning a probability to a set of M alternatives, also referred to as constant-sum data,
• distribution of a fixed number of points (votes, chips, dollars, etc.) among a set of M alternatives, also referred to as an allocation format,
• pick any out of a set of M alternatives,
• pick k (a prespecified number) out of a set of M alternatives, which is an example of what can be called a joint choice.

Latent GOLD Choice will accept each of these formats, including
3. The Goal

We wish to assess the ability of the Latent GOLD Choice program to unmix the data correctly to reflect the simulated structural relationships. Since the design for this choice experiment has relatively low efficiency, it is unclear whether it is possible to uncover the 3 underlying segments. The BIC statistic will be used to determine which model fits best.

Setting up the analysis

For this example, you can either open the setup file 1class.lgf saved previously or go through the setup steps one at a time.

> To open the setup file, from the menus choose: File, Open
> From the Files of type drop-down list, select LatentGOLD files (.lgf) if this is not already the default listing. All files with the .lgf extension appear in the list (see Figure 17).

Note: If you copied the sample data file to a directory other than the default directory, change to that directory prior to retrieving the file.

Figure 17: Opening the data file

> Select 1class.lgf and click Open. Double
4. [...] $$\frac{\exp(\eta_{m|z_{it}})}{\sum_{m'=1}^{M} \exp(\eta_{m'|z_{it}})},$$

where $\eta_{m|z_{it}}$ is the systematic component in the utility of alternative $m$ for case $i$ at replication $t$. The term $\eta_{m|z_{it}}$ is a linear function of an alternative-specific constant $\beta^{con}_{m}$, attribute effects $\beta^{att}_{p}$, and predictor effects $\beta^{pre}_{mq}$ (McFadden, 1974). That is,

$$\eta_{m|z_{it}} = \beta^{con}_{m} + \sum_{p=1}^{P} \beta^{att}_{p} z^{att}_{itmp} + \sum_{q=1}^{Q} \beta^{pre}_{mq} z^{pre}_{itq},$$

where for identification purposes $\sum_{m=1}^{M} \beta^{con}_{m} = 0$ and $\sum_{m=1}^{M} \beta^{pre}_{mq} = 0$ for $1 \le q \le Q$, a restriction that is known as effect coding. It is also possible to use dummy coding, using either the first or the last category as the reference category (see subsection 2.8). Note that the regression parameters corresponding to the predictor effects contain a subscript $m$, indicating that their values are alternative-specific.

The inclusion of the alternative-specific constant $\beta^{con}_{m}$ is optional, and models will often not include predictor effects. Without alternative-specific constants and without predictors, the linear model for $\eta_{m|z_{it}}$ simplifies to

$$\eta_{m|z_{it}} = \sum_{p=1}^{P} \beta^{att}_{p} z^{att}_{itmp}.$$

In a latent class or finite mixture variant of the conditional logit model, it is assumed that individuals belong to different latent classes that differ with respect to (some of) the parameters appearing in the linear model for $\eta$ (Kamakura and Russell, 1989). In order to indicate that the choice probabilities depend on class membership $x$, the logistic model is now of the form

$$P(y_{it} = m \mid x, z_{it}) = \frac{\exp(\eta_{m|x,z_{it}})}{\sum_{m'=1}^{M} \exp(\eta_{m'|x,z_{it}})}.$$

Here, $\eta_{m|x,z_{it}}$ is the systematic component in th
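The attributes-only form of the conditional logit model can be illustrated with a minimal sketch. The function name, the attribute values, and the coefficients below are hypothetical and chosen only for illustration; this is not code from Latent GOLD Choice itself.

```python
import math

def choice_probabilities(attributes, beta):
    """Conditional logit probabilities for one choice set.

    attributes: one row of attribute values per alternative (hypothetical data).
    beta: one attribute effect per column (hypothetical values).
    """
    # Systematic utility eta_m = sum_p beta_p * z_mp for each alternative m.
    eta = [sum(b * z for b, z in zip(beta, row)) for row in attributes]
    # Subtract the maximum before exponentiating for numerical stability;
    # this leaves the probabilities unchanged.
    top = max(eta)
    expo = [math.exp(e - top) for e in eta]
    total = sum(expo)
    return [e / total for e in expo]

# Three alternatives described by (modern, higher quality, price).
choice_set = [
    [1, 0, 3],
    [0, 1, 4],
    [0, 0, 2],
]
beta = [3.0, 0.0, -0.5]  # illustrative attribute effects
probs = choice_probabilities(choice_set, beta)
```

In a latent class version, the same function would simply be called once per class with class-specific `beta` values.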
5. The Continuous Factors (CFactors) option makes it possible to specify random-coefficients conditional logit models. One may, however, also combine CFactors and latent classes in a single model, yielding LC Choice models in which the alternative-specific constants, predictor effects, and/or attribute effects may vary within latent classes.

The Multilevel Model option can be used to define LC Choice models for nested data, such as employees nested within firms, pupils nested within schools, clients nested within stores, patients nested within hospitals, citizens nested within regions, and repeated measurements nested within individuals. Note that an LC Choice model is itself a model for two-level data; that is, a model for multiple responses per case. The multilevel LC Choice model is thus in fact a model for three-level data; that is, for multiple responses nested within cases and cases nested within groups. As in any multilevel analysis, the basic idea of a multilevel LC Choice analysis is that one or more parameters of the model of interest are allowed to vary across groups using a random-effects modeling approach. In Latent GOLD Choice, the group-level random effects can be specified to be either continuous (group-level continuous factors, GCFactors) or discrete (group-level latent classes, GClasses), yielding a parametric or a nonparametric approach, respectively.

One variant of the multilevel LC model involves including gro
6. Figure 39: Final 3-class Model Profile Output

The Profile output consists of the segment size estimates together with re-scaled parameter estimates that correspond to column percentages. Notice that the segment size estimates of 50%, 26%, and 23% are very close to the true universe parameters.

These re-scaled parameters for FASHION and QUALITY have a nice probabilistic interpretation. The probability of choosing the Modern over the Traditional style, when faced with just these 2 alternatives and with QUALITY and PRICE identical, is estimated to be .9571 for persons in segment 1, .5 for persons in segment 2, and .7415 for persons in segment 3. A similar interpretation is available for QUALITY: when faced with a choice between (1) a pair of Higher quality and (2) a pair of Standard quality shoes, both having the same price and the same style (Modern or Traditional), the probability of choosing the Higher quality shoes is .5 for segment 1, .9423 for segment 2, and .7186 for segment 3.

While the re-scaled parameters for PRICE cannot be interpreted in the same way, they still convey information about which segment(s) are more price sensitive. A second benefit of the re-scaled parameters is that when presented as row per
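The probabilistic interpretation can be checked against the simulated universe with a small sketch. It assumes that the probability of picking one of two otherwise identical alternatives is a two-alternative logit, and that the true Modern fashion part-worths of 3, 0, and 1 are measured relative to a Traditional utility of 0; both the function name and that reference-level reading are assumptions made for illustration.

```python
import math

def prob_first_of_two(utility_a, utility_b):
    """Two-alternative logit: probability of choosing A over B."""
    return math.exp(utility_a) / (math.exp(utility_a) + math.exp(utility_b))

# True Modern fashion part-worths per segment in the simulated universe;
# Traditional is assumed to have utility 0 (an assumption of this sketch).
true_fashion_utility = {1: 3.0, 2: 0.0, 3: 1.0}

modern_vs_traditional = {
    segment: prob_first_of_two(utility, 0.0)
    for segment, utility in true_fashion_utility.items()
}
```

Under these assumptions the true probabilities are about .953, .5, and .731, which can be compared with the estimates of .9571, .5, and .7415 reported in the Profile output.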
7. and Russell, G.J. (1989). A probabilistic choice model for market segmentation and elasticity structuring. Journal of Marketing Research, 26, 379-390.

Kamakura, W.A., Wedel, M., and Agrawal, J. (1994). Concomitant variable latent class models for the external analysis of choice data. International Journal of Research in Marketing, 11, 451-464.

Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixture distribution. Journal of the American Statistical Association, 73, 805-811.

Langeheine, R., Pannekoek, J., and Van de Pol, F. (1996). Bootstrapping goodness-of-fit measures in categorical data analysis. Sociological Methods and Research, 24, 492-516.

Lenk, P.J., and DeSarbo, W.S. (2000). Bayesian inference for finite mixture models of generalized linear models with random effects. Psychometrika, 65, 93-119.

Lesaffre, E., and Spiessens, B. (2001). On the effect of the number of quadrature points in a logistic random-effects model: an example. Applied Statistics, 50, 325-335.

Little, R.J., and Rubin, D.B. (1987). Statistical analysis with missing data. New York: Wiley.

Louviere, J.J., Hensher, D.A., and Swait, J.D. (2000). Stated choice methods: analysis and application. Cambridge: Cambridge University Press.

Magidson, J. (1981). Qualitative variance, entropy, and correlation ratios for nominal dependent variables. Social Science Research, 10, 177-194.

Magidson, J. (1996). Maximum likelihood assessment of clinical trial
8. see equation 22. Because a closed-form expression for this integral is not available, it must be solved using approximation methods. Latent GOLD Choice approximates the conditional density $P(y_i \mid z_i)$ by means of Gauss-Hermite numerical integration, implying that the multidimensional integral is replaced by multiple sums (Bock and Aitkin, 1981). With three CFactors and $B$ quadrature nodes per dimension, the approximate density equals

$$P(y_i \mid z_i) \approx \sum_{b_1=1}^{B} \sum_{b_2=1}^{B} \sum_{b_3=1}^{B} P(y_i \mid z_i, F_{b_1}, F_{b_2}, F_{b_3}) \, P_{b_1} P_{b_2} P_{b_3}.$$

Here, $F_{b_d}$ is the location and $P_{b_d}$ the weight corresponding to quadrature node $b_d$ for CFactor $d$. These nodes and weights are obtained from published quadrature tables (Stroud and Secrest, 1966). As can be seen, because of the multiple sums, this approximate density is very similar to the density of an LC model with multiple latent variables.

The above approximation also shows that, given that one will usually use at least 10 quadrature points per dimension (Lesaffre and Spiessens, 2001), the computational burden makes it impractical to have models with more than three CFactors.

Similar to what Latent GOLD Choice does for standard LC Choice models, the ML (PM) estimation problem for models with CFactors is solved using a combination of EM and Newton-Raphson with analytic first- and second-order derivatives.

The only new technical setting in models with CFactors is the parameter specifying the number of quadrature nodes to be used in
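The idea of replacing the integral by nested sums can be sketched in a few lines. For brevity this sketch uses the 3-point Gauss-Hermite rule rescaled for a standard normal factor (nodes 0 and ±√3, weights 2/3 and 1/6), far fewer than the 10 or more nodes mentioned above; the binary logit, the function names, and the loadings are hypothetical illustration values, not the program's implementation.

```python
import math
from itertools import product

# 3-point Gauss-Hermite rule, rescaled so that the weights sum to 1
# and the nodes refer to a standard normal factor.
NODES = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]

def logit_prob(eta_a, eta_b):
    """Probability of choosing A over B in a two-alternative logit."""
    return 1.0 / (1.0 + math.exp(eta_b - eta_a))

def marginal_prob(base, loadings):
    """Approximate the choice probability averaged over independent CFactors.

    base: utility of alternative A when all factors are 0 (B has utility 0).
    loadings: one hypothetical factor effect per CFactor.
    """
    total = 0.0
    # One nested sum per CFactor, written as a product over node indices.
    for combo in product(range(len(NODES)), repeat=len(loadings)):
        weight = math.prod(WEIGHTS[b] for b in combo)
        eta_a = base + sum(lam * NODES[b] for lam, b in zip(loadings, combo))
        total += weight * logit_prob(eta_a, 0.0)
    return total

def grid_size(nodes_per_dim, n_factors):
    """Number of terms in the multiple sum: B to the power D."""
    return nodes_per_dim ** n_factors
```

With 10 nodes per dimension, `grid_size(10, 3)` gives 1000 terms per case while a fourth factor would give 10000, which illustrates why the number of CFactors is kept small.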
9. $F^g_j$ and $z^g_j$, and group-level parameters by $\gamma^g$, $\beta^g$, and $\lambda^g$.

The most general probability structure for a multilevel LC Choice model is

$$P(y_j \mid z_j, z^g_j) = \sum_{x^g=1}^{K^g} \int_{F^g_j} f(F^g_j) \, P(x^g \mid z^g_j) \, P(y_j \mid z_j, x^g, F^g_j) \, dF^g_j, \qquad (25)$$

where

$$P(y_j \mid z_j, x^g, F^g_j) = \prod_{i=1}^{I_j} P(y_{ji} \mid z_{ji}, x^g, F^g_j).$$

Assuming that the model of interest may also contain CFactors, for each case $i$, $P(y_{ji} \mid z_{ji}, x^g, F^g_j)$ has a structure similar to the one described in equation 22; that is,

$$P(y_{ji} \mid z_{ji}, x^g, F^g_j) = \sum_{x=1}^{K} \int_{F_{ji}} f(F_{ji}) \, P(x \mid z_{ji}, x^g, F^g_j) \, P(y_{ji} \mid x, z_{ji}, F_{ji}, x^g, F^g_j) \, dF_{ji},$$

where

$$P(y_{ji} \mid x, z_{ji}, F_{ji}, x^g, F^g_j) = \prod_{t=1}^{T_{ji}} P(y_{jit} \mid x, z_{jit}, F_{ji}, x^g, F^g_j).$$

These four equations show that a multilevel LC Choice model is a model

• for $P(y_j \mid z_j, z^g_j)$, which is the marginal density of all responses in group $j$ given all exogenous variable information in group $j$,
• containing GClasses ($x^g$) and/or at most three mutually independent GCFactors ($F^g_j$),
• containing GCovariates ($z^g_j$) affecting the group classes $x^g$,
• assuming that the $I_j$ observations for the cases belonging to group $j$ are independent of one another given the GClasses and GCFactors,
• allowing the GClasses and GCFactors to affect the case-level latent classes $x$ and/or the responses $y_{ji}$.

GCFactors enter in exactly the same manner in the linear term of the conditional logit model as case-level CFactors. We refer to their coefficients as $\lambda^{g,con}_{xmd}$, $\lambda^{g,att}_{xpd}$, and $\lambda^{g,pre}_{xmqd}$. GCFactors can also be used in the model for the Classes. We will denote a GCFactor effect o
10. The LC model for the time points would be an LC Choice model. The multinomial logistic regression model for the time-specific latent classes will have the form of an LC growth model: class membership depends on time, where the intercept and possibly also the time slope are allowed to vary across individuals. This variation can be modelled using continuous random effects (GCFactors) and/or discrete random effects (GClasses).

7.2.5 Two-step IRT applications

Another application of the Latent GOLD Choice multilevel option is in IRT models for educational testing that assume a two-stage response process (Bechger et al., 2005; Westers and Kelderman, 1991, 1993). These models associate a discrete (usually binary) latent response to each observed item response, where a standard IRT model is specified for the discrete latent responses. A specific mechanism is assumed for the relationships between the latent and observed item responses. In Westers and Kelderman's SERE model, for example, the first latent class knows and the second latent class does not know the correct answer on a multiple-choice item, implying that the first class gives the correct answer with probability one and the other class guesses with probabilities that depend on the attractiveness of the alternatives.

Using the Latent GOLD Choice notation, the SERE model would be defined as a Rasch-like model for the latent classes,

$$\eta_{x|z_{it},F_i} = \gamma_{x0} + \sum_{r=1}^{R} \gamma_{xr} z_{itr} + \lambda_{x} F_i,$$

where t
11. Figure 19: Model Analysis Dialog Box after setup

To complete the setup, we need to connect the Alternatives and Sets files to the Response File and specify those attributes for which utilities are to be estimated.

> Open the Attributes tab by clicking on 'Attributes' at the top of the setup screen.

Figure 20: Attributes Tab

> Click the Alternatives button to display a list of files.
> Select cbcALT11.sav and click Open.

In response to the prompt to select an ID variable:

> Select PRODCODE and click OK.

Figure 21: Select ID Variable prompt

The attribute variables from this file are now included in the variable list, along with the alternative-specific constants variable _Constants_, which is generated automatically by the program.

> Click the Choice Sets button to display a list of files.
> Select cbcSET.sav and click Open.

In response to the prompt to select an ID variable:

> Select setid and click OK.

Selecting the Variables for the Analysis

For this analysis, we will estimate main effects for all 4 attributes (FASHION, QUALITY, PRICE, and NONE) and a quadratic price effect by including the variable PRICESQ. To select these variables:

> Select all variables except for _Constants_ and click Attrib
12. is sometimes referred to as a nonparametric random-coefficients approach (Aitkin, 1999; Laird, 1978; Vermunt, 1997; Vermunt and Van Dijk, 2001; Vermunt and Magidson, 2003). Latent GOLD Choice implements a nonparametric variant of the random-coefficient or mixed conditional logit model (Andrews et al., 2002; Louviere et al., 2000; McFadden and Train, 2000). The LC choice model can also be seen as a variant of the LC or mixture regression model (Vermunt and Magidson, 2000; Wedel and DeSarbo, 1994, 2002).

Most studies will contain multiple observations or multiple replications per respondent; e.g., respondents indicate their first choice for several sets of products or provide ratings for various products. This introduces dependence between observations. It is this dependence caused by the repeated measures that makes it possible to obtain stable estimates of the class-specific regression parameters.

A third aspect of the model implemented in Latent GOLD Choice is that class membership can be predicted from individual characteristics (covariates). In other words, one can not only identify latent classes, clusters, or segments that differ with respect to their preferences, but also predict to which (unobserved) subgroup an individual belongs using covariates. Such a profiling of the latent classes substantially increases the practical usefulness of the results and improves out-of-study prediction of choices (Magidson, Eagle, and
13. one can not only see how many cases are misclassified (as indicated by the proportion of classification errors) but also detect which are the most common types of misclassifications. If a particular entry $(x, x')$ with $x \neq x'$ is large, this means that classes $x$ and $x'$ are not well separated.

The marginals of the Classification Table provide the distribution of cases across classes under modal (column totals) and probabilistic (row totals) classification. Except for very rare situations, these marginal distributions will not be equal to one another. This illustrates the phenomenon that modal class assignments do not reproduce the estimated latent class distribution. Whereas the row totals are in agreement with the estimated class sizes, the column totals provide the latent class distribution that is obtained when writing the class assignments to a file using the Latent GOLD Choice output-to-file option.

4.1.4 Covariate classification statistics

These statistics indicate how well one can predict class membership from an individual's covariate values and are therefore only of interest if the estimated model contains active covariates. The measures are similar to the ones that are reported in the section "Classification Statistics"; that is, the estimated proportion of classification errors, the proportional reduction of classification errors, an entropy-based $R^2$ measure, and a qualitative-variance-based $R^2$ measure. The differenc
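The relation between the two sets of marginals can be made concrete with a minimal sketch. It assumes that entry (x, x') of the Classification Table sums the posterior probabilities of class x over the cases whose modal class is x', and that the proportion of classification errors is the average of one minus the largest posterior probability; the posterior values are made-up illustration data.

```python
def classification_table(posteriors):
    """Cross-tabulate probabilistic (rows) by modal (columns) classification."""
    n_classes = len(posteriors[0])
    table = [[0.0] * n_classes for _ in range(n_classes)]
    for probs in posteriors:
        modal = probs.index(max(probs))  # most likely class for this case
        for x, p in enumerate(probs):
            table[x][modal] += p
    return table

def classification_error(posteriors):
    """Expected proportion of cases misclassified by modal assignment."""
    return sum(1.0 - max(probs) for probs in posteriors) / len(posteriors)

# Hypothetical posterior membership probabilities for 4 cases and 2 classes.
posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]

table = classification_table(posteriors)
row_totals = [sum(row) for row in table]                        # probabilistic
column_totals = [sum(row[c] for row in table) for c in range(2)]  # modal
error_rate = classification_error(posteriors)
```

Here the row totals (2.25 and 1.75) differ from the column totals (3 and 1), which is the discrepancy between probabilistic and modal classification described above.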
14. 1
missing    1/4    1/4    1/4

The number 1/4 (one divided by the number of categories of the nominal attribute concerned) implies that the parameter of the missing-value category is equated to the unweighted mean of the parameters of the other four categories. Note that the coefficient for the reference category is fixed to 0. Also with dummy last, we would get a row with 1/4s for the missing-value category.

3.3 Prior Distributions

The different types of priors have in common that their user-defined parameters (Bayes Constants), denoted by $\alpha$, can be interpreted as adding $\alpha$ observations (for instance, the program default of one) generated from a conservative null model, as is described below, to the data. All priors are defined in such a way that if the corresponding $\alpha$'s are set equal to zero, $\log p(\vartheta) = 0$, in which case we will obtain ML estimates. We could label such priors as "non-informative".

Below we present the $\log p(\vartheta)$ terms for the various types of distributions without their normalizing constants. The symbols $U^{cov}$ and $U^{pre}$ are used to denote the number of different covariate and attribute/predictor patterns. A particular pattern is referred to by the index $u$.

The Dirichlet prior for the latent probabilities equals

$$\log p[P(x \mid z_u)] = \frac{\alpha}{K \cdot U^{cov}} \log P(x \mid z_u).$$

Here, $K$ denotes the number of latent classes and $\alpha$ the Bayes Constant to be specified by the user. As can be seen, the influence of the pri
15. 1 to 5. Each of 8 choice sets offers 3 of these 10 possible alternative products to 400 individuals. The choice task posed to respondents is to assume that each set represents the actual options available for purchase and to select one of these alternatives from each set, with the response "none of the above" allowed as a fourth choice option.

The 11 alternatives (10 different products plus a None option) are defined in terms of the 3 attributes plus the dummy variable NONE in the Alternatives File cbcALT11.sav.

Figure 14: Alternatives File cbcALT11.sav defining each of the 11 Alternatives

The specific 4 alternatives that constitute each of the 8 choice tasks are defined in the Sets File cbcSET.sav. For example, task 1 involves the choice between shoes TS3, MS3, TH2, or None of these.

Figure 15: Sets File cbcSET.sav defining each of the 8 choice tasks
16. 2 ... (repeated for classes 3, 4, ...)
clu# (Class modal): the class number for the most likely (modal) class

Advanced:
cfactor1 (CFactor1)
cfactor2 (CFactor2)
cfactor3 (CFactor3)
gclass1 (GClass1)
gclass2 (GClass2)
gcfactor1 (GCFactor1)

Covariate Classification

As is the case with Standard Classification, classification information based on covariates can be output to an external file. Selecting Covariate Classification from the Output to File section of the ClassPred Tab produces the external file. This file contains the new variables appended to a copy of the input file used for estimation.

Advanced: For multilevel models, this output file also contains the GClass probabilities given group-level covariates.

Prediction Output to a File

Predicted Values
Predicted values for the dependent variable can be output to an external file. The method used to determine the predicted values (pred_dep in the output file) depends on the Predicted Values setting on the Output Tab: Posterior Mean, HB-like, or Marginal Mean. The predicted value is either the mode (choice and ranking) or the mean (rating). Also, complete category-specific probabilities are provided for each choice set.

Individual Coefficients
It is also possible to output posterior mean estimates for the Individual Coefficients to an external file. These are weighted ave
17. 2 or 3 on the variable classind (i.e., no missing values). Those coded classind=1 and classind=2 are maintained at their default specifications in the table, while the default specification for cases coded classind=3 was changed from class 3 only to any class (all 4 class columns checked). This specification would be obtained by default if those coded 3 on the classind variable were instead coded as missing. In this situation, the table would differ from that shown in Figure 9 in that the '3' row of the table would not appear, since that category would be coded missing. For further information, see section 2.5 of the Latent GOLD Technical Guide.

Technical Tab

The default settings for the estimation algorithm are shown below. Changes to one or more of these settings can be made in the Technical Tab before or after estimation. A description of the estimation algorithm and its various options is given in section 3 of Part 2 of this manual.

[Technical Tab of the Choice Model dialog box (BrandsAB.sav, Model1), showing the Convergence Limits, Iteration Limits, Start Values, Bayes Constants, and Missing Values settings]
18. As an option, an ID variable can also be appended to the new file. For an example using this option, see Latent GOLD 4.0 Tutorial 3.

Output Filename
A default filename will appear in this box. Use the browse button to change the filename and/or its save location.

ID Variable
A single additional variable may be selected for inclusion, typically an ID variable or other key variable that uniquely identifies each case in the file, so that additional variables on the original data file that were not included in the analysis can be merged onto this file.

Note: The new file is created after the model has been estimated. After selecting this option, click Estimate to estimate the model and create the new file.

Warning: This setting is not preserved across models; it must be selected explicitly for each model estimated.

Advanced: For each CFactor and GCFactor, the corresponding factor means are output; for GClasses, the classification probabilities and the modal assignment are output.

The order of the variables in the output file (and the labels for a .sav formatted file) is as follows:

• the Known Class Indicator (if specified in the ClassPred Tab)
• any covariates included in the model
• other model variables specified in the Variables Tab
• the variable included in the ID box of the ClassPred Tab (optional)
• clu#1 (Class1): posterior membership probability for class 1
• clu#2 (Class2): posterior membership probability for class
19. Journal of the Royal Statistical Society, Series A, 164, 339-355.

Schafer, J.L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.

Skrondal, A., and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. London: Chapman & Hall/CRC.

Skinner, C.J., Holt, D., and Smith, T.M.F. (eds.) (1989). Analysis of Complex Surveys. New York: Wiley.

Stroud, A.H., and Secrest, D. (1966). Gaussian Quadrature Formulas. Englewood Cliffs, NJ: Prentice Hall.

Van der Ark, L.A., and Van der Heijden, P.G.M. (1998). Graphical display of latent budget and latent class analysis, with special reference to correspondence analysis. J. Blasius and M. Greenacre (eds.), Visualization of categorical data. Boston: Academic Press.

Van der Heijden, P.G.M., Dessens, J., and Böckenholt, U. (1996). Estimating the concomitant-variable latent class model with the EM algorithm. Journal of Educational and Behavioral Statistics, 5, 215-229.

Van der Heijden, P.G.M., Gilula, Z., and Van der Ark, L.A. (1999). On a relationship between joint correspondence analysis and latent class analysis. M. Sobel and M. Becker (eds.), Sociological Methodology 1999, 81-111. Boston: Blackwell Publishers.

Vermunt, J.K. (1997). Log-linear models for event histories. Thousand Oaks: Series QASS, vol. 8. Sage Publications.

Vermunt, J.K. (2002a). A general latent class approach for dealing with unob
20. Nedelsky model for multiple-choice items. In A. Van der Ark, M.A. Croon, and K. Sijtsma (eds.), New Developments in Categorical Data Analysis for the Social and Behavioral Sciences, 187-206. Mahwah: Erlbaum.

Bock, R.D., and Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters. Psychometrika, 46, 443-459.

Böckenholt, U. (2001). Mixed-effects analyses of rank-ordered data. Psychometrika, 66, 45-62.

Böckenholt, U. (2002). Comparison and choice: analyzing discrete preference data by latent class scaling models. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied latent class analysis, 163-182. Cambridge: Cambridge University Press.

Buse, A. (1982). The likelihood ratio, Wald, and Lagrange multiplier tests: An expository note. The American Statistician, 36, 153-157.

Clogg, C.C. (1981). New developments in latent structure analysis. D.J. Jackson and E.F. Borgatta (eds.), Factor analysis and measurement in sociological research, 215-246. Beverly Hills: Sage Publications.

Clogg, C.C., Rubin, D.B., Schenker, N., Schultz, B., and Weidman, L. (1991). Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logit regression. Journal of the American Statistical Association, 86, 68-78.

Cohen, S. (2003). Maximum difference scaling: improved measures of importance and preference for segmentation. Proceedings Sawtooth Software Conference 2003.

Collins, L.M., Fidler, P.F., Wugalter, S.E., an
21. ProbMeans to list the type of plots produced for a particular model. Highlight a plot type to view it in the Contents pane.

Uni-Plot

To view the Uni-Plot, click on the expand/contract icon to list the ProbMeans plots and highlight Uni-Plot. The larger the distance (range) between points belonging to a particular variable, the stronger the variable is related to the latent variable.

• By default, a separate Uni-Plot is created for each class. Symbols appear in the plots for each value of each attribute and for choices corresponding to a selected choice set.
• Click on any variable symbol in the Uni-Plot and the plot label will appear, and the status bar will contain a description of the point (variable name and category value).
• Click on any variable name or symbol in the legend and Latent GOLD Choice will highlight all the points that refer to that variable.

To Change Settings for a Uni-Plot

To change the settings for a Uni-Plot, right-click (or select Plot Control from the Model Menu) within the Contents pane when a Uni-Plot is displayed to open the Plot Control dialog box. To change the font for a plot, see Main Menu Options.

Uni-Plot Settings

Legend: When this option is selected, a legend appears at the bottom of the Uni-Plot.

Point Labels: When this option is selected, category labels for each variable are listed on the Uni-Plot next to the variable symbol.

Classes: Select which Classes to include in the Uni-Plots. For each cl
22. The Maximization (M) step involves finding new $\vartheta$ improving $\log L$. Note that we actually use PM rather than ML estimation, which means that in the M step we update the parameters in such a way that

$$\log \mathcal{P} = \log L + \log p(\vartheta) \qquad (11)$$

increases, rather than (10). Sometimes closed-form solutions are available in the M step. In other cases, standard iterative methods can be used to improve the complete-data log-posterior defined in equation 11. Latent GOLD Choice uses iterative proportional fitting (IPF) and unidimensional Newton in the M step (see Vermunt, 1997, Appendices).

Besides the EM algorithm, we also use a Newton-Raphson (NR) method.7 In this general optimization algorithm, the parameters are updated as follows:

$$\vartheta^{new} = \vartheta^{old} - \varepsilon \, H^{-1} g.$$

The gradient vector $g$ contains the first-order derivatives of the log-posterior with respect to all parameters, evaluated at $\vartheta^{old}$; $H$ is the Hessian matrix containing the second-order derivatives with respect to all parameters; and $\varepsilon$ is a scalar denoting the step size. Element $g_k$ of $g$ equals

$$g_k = \sum_{i=1}^{I} \frac{\partial \log P(y_i \mid z_i, \vartheta)}{\partial \vartheta_k} + \frac{\partial \log p(\vartheta)}{\partial \vartheta_k},$$

and element $H_{kk'}$ of $H$ equals

$$H_{kk'} = \sum_{i=1}^{I} \frac{\partial^2 \log P(y_i \mid z_i, \vartheta)}{\partial \vartheta_k \, \partial \vartheta_{k'}} + \frac{\partial^2 \log p(\vartheta)}{\partial \vartheta_k \, \partial \vartheta_{k'}}.$$

Latent GOLD Choice computes these derivatives analytically. The step size $\varepsilon$ ($0 < \varepsilon \le 1$) is needed to prevent decreases of the log-posterior from occurring. More precisely, when a standard NR update $-H^{-1} g$ yields a decrease of

7 Haberman (1988) proposed estimating standard LC models by Newton-Raphson.
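The role of the step size in the Newton-Raphson update can be illustrated with a one-parameter sketch. It maximizes a binomial logit log-likelihood with no prior term (all Bayes Constants zero), and it halves the step size whenever a candidate update would lower the objective; that halving rule, the function names, and the data are assumptions made for this illustration rather than a description of the program's exact procedure.

```python
import math

def log_posterior(theta, k, n):
    """Binomial logit log-likelihood (no prior): k successes in n trials."""
    return k * theta - n * math.log1p(math.exp(theta))

def newton_raphson(k, n, theta=0.0, tol=1e-10, max_iter=50):
    """Maximize log_posterior with NR updates theta - step * g / H."""
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-theta))
        g = k - n * p              # first-order derivative
        h = -n * p * (1.0 - p)     # second-order derivative (negative)
        step = 1.0
        current = log_posterior(theta, k, n)
        candidate = theta - step * g / h
        # Shrink the step until the objective no longer decreases.
        while log_posterior(candidate, k, n) < current and step > 1e-8:
            step /= 2.0
            candidate = theta - step * g / h
        if abs(candidate - theta) < tol:
            return candidate
        theta = candidate
    return theta

# 30 successes in 40 trials: the maximum is at logit(0.75) = log(3).
estimate = newton_raphson(30, 40)
```

In the multi-parameter case the scalar division by `h` becomes a linear solve with the Hessian matrix, but the step-size logic stays the same.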
23. Vermunt, 2003; Natter and Feurstein, 2002; Vermunt and Magidson, 2002).

The next section describes the LC models implemented in Latent GOLD Choice. Then, attention is paid to estimation procedures and the corresponding technical options of the program. The output provided by the program is described in the last section.

Several tutorials are available to get you up and running quickly. These include:

• cbcRESP.sav: a simulated choice experiment
  - Tutorial 1: Using LG Choice to Estimate Discrete Choice Models
  - Tutorial 2: Using LG Choice to Predict Future Choices
• brandABresp.sav: a simulated brand-price choice experiment
  - Tutorial 3: Estimating Brand and Price Effects
  - Tutorial 4: Using the 1-file Format
• bank45.sav and bank9-1-file.sav: real data from a bank segmentation study
  - Tutorial 5: Analyzing Ranking Data
  - Tutorial 6: Using LG Choice to Estimate max-diff (best-worst) and Other Partial Ranking Models
• conjoint.sav, ratingRSP.sav, ratingALT.sav, ratingSET.sav: simulated data utilizing a 5-point ratings scale
  - Tutorial 7: LC Segmentation with Ratings-based Conjoint Data
  - Tutorial 7A: LC Segmentation with Ratings-based Conjoint Data

All of the above tutorials are available on our website at http://www.statisticalinnovations.com/products/choice.html#tutorialslink

2 The Latent Class Model for Choice Data

In order to be able to describe the models of interest, we first must clarify some concepts an
24. Part 2: Technical Guide for Latent GOLD Choice: Basic and Advanced
  Introduction
  The Latent Class Model for Choice Data
  Estimation and Other Technical Issues
  The Latent GOLD Choice Output
  Introduction to Advanced Models
  Continuous Factors
  Multilevel LC Choice Model
  Complex Survey Sampling
  Latent GOLD Choice's Advanced Output
  Bibliography
  Notation
Part 3: Using Latent GOLD Choice
  Overview
  Introduction
  Step 1: Model Setup
    Model Analysis Dialog Box
    Attributes
    Advanced
25. a parsimonious specification of the Class dependence of the constants. Let One be an attribute with the constant value 1. The model of interest is obtained with the restrictions

              Class 1   Class 2   Class 3   Class 4
Constants        1         1         1         1
One              -         2         3         4
Attribute1       1         2         3         4
Attribute2       1         2         3         4

Instead of having a separate set of constants for each latent class, the restricted constant for category m in Class x equals $\beta_{m}^{con} + y_{m}^{*} \cdot \beta_{x1}^{att}$, where for identification $\beta_{11}^{att} = 0$. Note that this is equivalent to using the no-simple setting for the constants, and similar to the treatment of ordinal indicators in the LC Cluster and DFactor Modules of Latent GOLD. Suppose you assume that the effect of price is negative (descending) for Classes 1-3 and unrestricted for Class 4. This can be accomplished by having two co

Footnote 3: It is not necessary to assume that the -100 appears in all alternatives for the brand concerned. The value could also be -100 if a particular condition is fulfilled (for example, if the price of the evaluated product is larger than a certain amount) and 0 otherwise. This shows that the offset option provides a much more flexible way of specifying classes with zero response probabilities than the zero-inflated option.
Footnote 4: While a value of 100 for the offset can be used to fix a probability to 1.0000, a value of -100 can be used to fix a probability to 0.0000. For example, $\exp(-100) / [\exp(-100) + \exp(0) + \exp(0)] = 0.0000$.
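The effect of a large offset on a conditional logit probability can be checked numerically. The sketch below is not Latent GOLD Choice code; `choice_probs` is a hypothetical helper that adds a fixed offset to each utility and applies the logit formula to a set of three alternatives with equal utilities.

```python
import math

def choice_probs(utilities, offsets):
    """Conditional logit probabilities for utilities plus fixed offsets."""
    scores = [u + o for u, o in zip(utilities, offsets)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Offset of -100 on the first alternative: its probability is effectively 0.
low = choice_probs([0.0, 0.0, 0.0], [-100.0, 0.0, 0.0])
# Offset of +100 on the first alternative: its probability is effectively 1.
high = choice_probs([0.0, 0.0, 0.0], [100.0, 0.0, 0.0])

print([round(p, 4) for p in low])
print([round(p, 4) for p in high])
```

Because the offset is added to the utility itself, it can be switched on for some alternatives or conditions and left at 0 for others, which is the flexibility the footnotes describe.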
26. be based on covariates only. This involves using the model probabilities $\hat{P}(x|z_i)$, sometimes referred to as prior probabilities, as classification probabilities for each covariate pattern u. The same modal classification rule can be applied as with the posterior class membership probabilities.
4.8 Output-to-file Options
Five types of items can be written to output files: classification, classification based on covariates, predicted values, individual-specific coefficients, and the estimated variance-covariance matrix of the model parameters. With Standard Classification and Covariate Classification, the output file will contain the posterior class membership probabilities $\hat{P}(x|z_i, y_i)$ and the model probabilities $\hat{P}(x|z_i)$, respectively, as well as the modal Class assignment based on these probabilities. With the option Predicted Values to a file, one obtains the estimated individual-specific choice probabilities $\hat{P}_{m|it}$, which, depending on the type of prediction, are defined by equation 17 or 18, as well as the predicted value, which is a mode with choices and rankings and a mean with ratings. In addition, a CHAID (.chd) input file can be created for further profiling of the latent classes (see Section 4.9). With Individual Coefficients, one obtains the estimated individual-specific regression coefficients. Let $\hat{\beta}_{xp}$ denote the estimated value of one of the conditional logit parameters, which can be a constant, an attribute effect, or a p
27. can be treated as a sequential choice process. The selection of the best option is equivalent to a first choice. The selection of the worst alternative is a first choice out of the remaining alternatives, where the choice probabilities are negatively related to the utilities of these alternatives. By declaring the dependent variable to be a ranking, the program automatically eliminates the best alternative from the set available for the second choice. The fact that the second choice is not the second best but the worst can be indicated by means of a replication scale factor of -1, which will reverse the choice probabilities. More precisely, for the worst choice,
$$P(y_{it} = m \mid x, z_{it}, s_{it} = -1) = \frac{\exp(-1 \cdot \eta_{m|x,z_{it}})}{\sum_{m' \in A_{it}} \exp(-1 \cdot \eta_{m'|x,z_{it}})}$$
if $m \in A_{it}$, and 0 if $m \notin A_{it}$.
The second noteworthy application of the scale factor occurs in the simultaneous analysis of stated and revealed preferences. Note that use of a scale factor larger than 0 but smaller than 1 causes $s_{it} \cdot \eta_{m|x,z_{it}}$ to be shrunk compared to $\eta_{m|x,z_{it}}$, and as a result the choice probabilities become more similar across alternatives. A well-documented phenomenon is that stated preferences collected via questionnaires yield more extreme choice probabilities than revealed preferences (actual choices), even if these utilities are the same (Louviere et al., 2000). A method to transform the utilities for these two data types to the same scale is to use a somewh
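The two uses of the replication scale factor described above can be sketched as follows. This is an illustration only, not the program's implementation; `scaled_choice_probs`, the alternative labels, and the utility values are all hypothetical.

```python
import math

def scaled_choice_probs(utilities, available, scale=1.0):
    """Logit probabilities over the available alternatives, with each
    utility multiplied by a replication scale factor. Alternatives that
    are not available get probability 0."""
    weights = {m: math.exp(scale * utilities[m]) for m in available}
    total = sum(weights.values())
    return {m: weights.get(m, 0.0) / total for m in utilities}

utilities = {"A": 1.0, "B": 0.0, "C": -1.0}  # illustrative values

# Best choice: an ordinary first choice (scale factor 1).
best = scaled_choice_probs(utilities, {"A", "B", "C"}, scale=1.0)

# Worst choice: suppose A was chosen as best, so it is no longer available.
# A scale factor of -1 makes the lowest-utility alternative the most likely.
worst = scaled_choice_probs(utilities, {"B", "C"}, scale=-1.0)

# A scale factor between 0 and 1 shrinks the utilities, so the
# probabilities become more similar across alternatives.
shrunk = scaled_choice_probs(utilities, {"A", "B", "C"}, scale=0.5)

print(best, worst, shrunk)
```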
28. class in which all coefficients are zero. This is specified as follows:

              Class 1   Class 2   Class 3   Class 4
Constants        -         2         3         4
Attribute1       -         2         3         4
Attribute2       -         2         3         4

where "-" indicates that the effect is equal to 0. In this example, Class 1 is the random responder class.
Merge Effects is a much more flexible variant of Class Independent. It can be used to equate the parameters for any set of latent classes. Besides post hoc constraints, very sophisticated a priori constraints can be imposed with this option. An important application is the specification of LC DFactor structures, in which each latent class corresponds to the categories of two or more latent variables. For example, consider a set of constraints of the form:

              Class 1   Class 2   Class 3   Class 4
Constants        1         1         3         3
Attribute1       1         2         1         2
Attribute2       1         2         1         2

where the same numbers in a given row indicate that the associated class parameters are equal. This restricted 4-Class model is a 2-dimensional DFactor model: the categories of DFactor 1 differ with respect to the constants, and the categories of DFactor 2 with respect to the two attribute effects. Specifically, level 1 of DFactor 1 is formed by Classes 1 and 2, and level 2 by Classes 3 and 4; level 1 of DFactor 2 is formed by Classes 1 and 3, and level 2 by Classes 2 and 4.
The option Offset can be used to specify any nonzero fixed-value constraint on the Class-specific effect of a numeric att
29. click on Model to bring up the 1-class model. In the Classes box, type 1-4 in place of 1 to request estimation of 4 models (between 1 and 4 classes). Skip ahead to Estimating Models (p. 11).
Alternatively, to go through the setup steps one at a time:
> From the menus choose File, Open.
> From the Files of type drop-down list, select SPSS system files (.sav) if this is not already the default listing. All files with the .sav extension appear in the list.
> Select cbcRESP.sav and click Open.
> Right click on Model1 and select Choice.
[Figure 18: Model Analysis Dialog Box]
> Select each variable and move it to the appropriate box by clicking the buttons to the left of these boxes:
> ID to Case ID
> SEX, AGE to Covariates
> SET to Choice Set
> CHOICE to Dependent
> In the Classes box, type 1-4 in place of 1 to request estimation of 4 models (between 1 and 4 classes).
> Right click on SEX and AGE and select Nominal to change the scale type for these covariates.
Your Choice model setup should now look like this:
[Screenshot: Model Analysis Dialog Box, Variables tab]
30. descriptive measure that is defined as follows:
$$DI = \frac{\left\{\sum_{i^*} |n_{i^*} - \hat{m}_{i^*}|\right\} + \left\{N - \sum_{i^*} \hat{m}_{i^*}\right\}}{2N}$$
It should be noted that the term $N - \sum_{i^*} \hat{m}_{i^*}$ captures the contribution of the zero observed cells to DI. This term is added to the formula because $\sum_{i^*} |n_{i^*} - \hat{m}_{i^*}|$ is a sum over the non-zero observed cell counts only. DI is a descriptive measure indicating how much observed and estimated cell frequencies differ from one another. It indicates which proportion of the sample should be moved to another cell to get a perfect fit.
4.1.2 Log-likelihood statistics
The program also reports the values of the log-likelihood ($\log L$), the log-prior ($\log p(\vartheta)$), and the log-posterior ($\log P$). Recall that
$$\log L = \sum_{i=1}^{I} w_i \log \hat{P}(y_i|z_i), \qquad \log P = \log L + \log p(\vartheta).$$
In addition, the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), the Akaike Information Criterion 3 (AIC3), and the Consistent Akaike Information Criterion (CAIC) based on the log-likelihood are reported. These are defined as
$$BIC_{\log L} = -2 \log L + \log(N) \cdot npar,$$
$$AIC_{\log L} = -2 \log L + 2 \cdot npar,$$
$$AIC3_{\log L} = -2 \log L + 3 \cdot npar,$$
$$CAIC_{\log L} = -2 \log L + [\log(N) + 1] \cdot npar.$$
If the Bootstrap -2LL diff option is used, the program also provides the estimated bootstrap p-value (and the standard error) for the -2LL difference test between a restricted and an unrestricted model.
4.1.3 Classification statistics
This set of statistics contains information on
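The log-likelihood based information criteria are simple functions of the log-likelihood, the number of parameters, and the sample size. The following sketch computes all four; the function name and the example values (log-likelihood, parameter count, number of cases) are illustrative assumptions and are not output from the program.

```python
import math

def information_criteria(log_lik, n_par, n_cases):
    """BIC, AIC, AIC3 and CAIC based on the log-likelihood."""
    minus_2ll = -2.0 * log_lik
    return {
        "BIC": minus_2ll + math.log(n_cases) * n_par,
        "AIC": minus_2ll + 2 * n_par,
        "AIC3": minus_2ll + 3 * n_par,
        "CAIC": minus_2ll + (math.log(n_cases) + 1) * n_par,
    }

# Hypothetical comparison of two models fitted to 400 cases.
smaller = information_criteria(log_lik=-1050.0, n_par=9, n_cases=400)
larger = information_criteria(log_lik=-1040.0, n_par=14, n_cases=400)
for name in ("BIC", "AIC", "AIC3", "CAIC"):
    preferred = "smaller" if smaller[name] < larger[name] else "larger"
    print(name, round(smaller[name], 2), round(larger[name], 2), preferred)
```

Lower values indicate a better balance of fit and parsimony. The criteria can disagree, because BIC and CAIC penalize extra parameters more heavily than AIC does whenever log(N) exceeds 2.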
31. 4.7 Classification Information
4.8 Output-to-file Options
4.9 The CHAID Output Option
5 Introduction to Part II (Advanced Models)
6 Continuous Factors
  6.1 Model Components and Estimation Issues
  6.2 Application Types
    6.2.1 Random-effects conditional logit models
    6.2.2 LC (FM) regression models with random effects
7 Multilevel LC Choice Model
  7.1 Model Components and Estimation Issues
  7.2 Application Types
    7.2.1 Two-level LC Choice model
    7.2.2 LC discrete-choice models for three-level data
    7.2.3 Three-level random-coefficients conditional logit models
    7.2.4 LC growth models for multiple response
    7.2.5 Two-step IRT applications
    7.2.6 Non-multilevel models
8 Complex Survey Sampling
  8.1 Pseudo-ML Estimation and Linearization Estimator
  8.2 A Two-step Method
9 Latent GOLD Choice's Advanced Output
  9.1 Model Summary
  9.2 Parameters
  9.3 GProfile
  9.4 ProbMeans
  9.5 Frequencies
  9.6 Classification
  9.7 Output-to-file Options
10 Bibliography
11 Notation
  11.1 Basic Models
32. exclude some of the CFactor terms, and when combined with latent classes, one can use the standard between-class parameter constraints. By default, the CFactors box is set to None. To include CFactors in a model, click to open the drop-down menu and select the number of CFactors to include in the model (1, 2, or 3). When 1 or more CFactors are included in the model, they appear on the Model Tab for further model specification. By default, CFactors use 10 nodes to approximate normally distributed variables. To improve precision of the estimates, the number of nodes may be increased to a value as high as 50, or reduced as low as 2. This change is made in the Continuous Factors section of the Technical Tab.
Warning: Inclusion of CFactors in a model may substantially increase the amount of time required to estimate the model. For example, inclusion of 2 CFactors results in 10x10 = 100 nodes used to approximate the bivariate normal distribution for these CFactors. Increasing the number of nodes to 50 results in 50x50 = 2500 nodes, which will substantially increase the amount of estimation time. For further details, see Section 6 of the Technical Guide.
When used with a 1-class model, the result is not an LC model but a standard random-coefficient conditional logit model. For further details regarding the various kinds of applications with CFactors, see Section 6 of the Technical Guide.
Multilevel Model
This advanced option is used to
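The growth in estimation effort described in the warning follows from the size of the integration grid, which is the number of nodes raised to the power of the number of CFactors. A minimal sketch, assuming only the limits stated above (2 to 50 nodes, 1 to 3 CFactors); the function name is hypothetical:

```python
def quadrature_points(nodes_per_cfactor, n_cfactors):
    """Total number of grid points used to approximate the joint
    distribution of the CFactors."""
    if not 2 <= nodes_per_cfactor <= 50:
        raise ValueError("number of nodes must be between 2 and 50")
    if not 1 <= n_cfactors <= 3:
        raise ValueError("number of CFactors must be 1, 2, or 3")
    return nodes_per_cfactor ** n_cfactors

for n_cfactors in (1, 2, 3):
    print(n_cfactors, quadrature_points(10, n_cfactors),
          quadrature_points(50, n_cfactors))
```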
33. id 2, alternative id ..., where setid identifies the set (generally a sequential index number 1, 2, 3, ...) and the jth alternative id corresponds to a valid alternative identification number given in the alternatives file. It should be noted that choice sets may have different numbers of alternatives, in which case some of the alternatives need to be defined as missing. Moreover, the sets file may also contain sets that are not presented to the respondents (inactive sets), for which one will also obtain estimates of the class-specific and overall choice probabilities.
After you have opened your Response File, you are ready to specify the model settings in the Variables Tab. Bring up the Model Analysis Dialog Box by double clicking on the model name in the Outline Pane. Then set up your model by specifying the name of the dependent variable (which contains the respondent choices), the case ID, the set ID, any covariates, the attribute effects to be included in the model, and any technical settings. This process is illustrated in Tutorial 1.
Model Analysis Dialog Box
[Screenshot: Model Analysis Dialog Box, Variables tab]
34. method to deal with sampling weights. This is a two-step procedure in which the model is first estimated without making use of the sampling weights, and in which subsequently the latent class sizes and covariate effects are corrected using the sampling weights.
8.1 Pseudo-ML Estimation and Linearization Estimator
The survey option can be used to take into account the fact that cases may
1. belong to the same stratum,
2. belong to the same primary sampling unit (PSU), often referred to as a sampling cluster,
3. contain a sampling weight,
4. be sampled from a finite population.
Let o denote a particular stratum, c a particular PSU in stratum o, and i a particular case in PSU c of stratum o. Moreover, let O be the number of strata, $C_o$ the number of PSUs in stratum o, and $I_{oc}$ the number of cases in PSU c of stratum o. The sampling weight corresponding to case i belonging to PSU c of stratum o is denoted by $sw_{oci}$, and the population size (total number of PSUs) of stratum o by $N_o$. From this notation, it can be seen that PSUs are nested within strata and that cases are nested within PSUs. In other words, records with the same Case ID should belong to the same PSU, and all records with the same PSU identifier should belong to the same stratum. The population size $N_o$ indicates the population number of PSUs in stratum o, and should thus have the same value across records belonging to the same stratum. Another thing that should be no
35. models, as in all other Latent GOLD Choice models, the number of cases serves as N (sample size) in the computation of the BIC and CAIC values that appear in the Log-likelihood Statistics. An alternative would have been to assume N to be equal to the number of groups instead of the number of cases. Users who prefer this alternative definition of BIC and CAIC may compute these statistics themselves.
The Classification Statistics contain information on how well one can predict an individual's CFactor scores and a group's GClass membership and GCFactor scores. For GClasses, one obtains the same information as for the latent classes (proportion of classification errors and three $R^2$ measures). For CFactors and GCFactors, one obtains only the standard $R^2$, which can be interpreted as a reliability measure. In multilevel models with covariates, Covariate Classification Statistics will contain information for the GClasses.
The Prediction Statistics are the same as in models without CFactors, GClasses, and GCFactors. The $R^2$ measures indicate how well a model predicts the choices given all predictors, covariates, and latent variables.
9.2 Parameters
This section reports the parameters corresponding to CFactors, GClasses, and GCFactors. CFactor, GClass, and GCFactor effects may appear in the Model for Choices/Rankings/Ratings. In multilevel models, GClasses and GCFactors may be used in the Model for Classes. When GClasses affect a particu
36.
$$\eta_{m|x,z_{it}} = \beta_{xm}^{con} + y_{m}^{*} \cdot \left( \sum_{p=1}^{P} \beta_{xp}^{att} z_{itp}^{att} + \sum_{q=1}^{Q} \beta_{xq}^{pre} z_{itq}^{pre} \right)$$
The attribute and predictor effects are multiplied by the fixed category score $y_m^{*}$ to obtain the systematic part of the utility of rating m. As can be seen, there is no longer a fundamental difference between attributes and predictors, since attribute values and predictor effects no longer depend on m. For ratings, $\eta_{m|x,z_{it}}$ is defined by substituting $y_m^{*} \cdot z_{itp}^{att}$ in place of the category-specific attribute values $z_{itmp}^{att}$ in equation 2, and the category-specific predictor effects $\beta_{xmq}^{pre}$ are replaced by $y_m^{*} \cdot \beta_{xq}^{pre}$. The relationship between the category-specific utilities $\eta_{m|x,z_{it}}$ and the response probabilities is the same as in the model for first choices (see equation 1).
As mentioned above, in most situations the category scores $y_m^{*}$ are equally spaced with a mutual distance of one. In such cases, $y_m^{*} - y_{m-1}^{*} = 1$, and as a result
$$\log \frac{P(y_{it} = m \mid x, z_{it})}{P(y_{it} = m-1 \mid x, z_{it})} = \eta_{m|x,z_{it}} - \eta_{m-1|x,z_{it}} = \beta_{xm}^{con} - \beta_{x(m-1)}^{con} + \sum_{p=1}^{P} \beta_{xp}^{att} z_{itp}^{att} + \sum_{q=1}^{Q} \beta_{xq}^{pre} z_{itq}^{pre}.$$
This equation clearly shows the underlying idea behind the adjacent-category logit model. The logit in favor of rating m instead of m-1 has the form of a standard binary logit model, with an intercept equal to $\beta_{xm}^{con} - \beta_{x(m-1)}^{con}$ and slopes equal to $\beta_{xp}^{att}$ and $\beta_{xq}^{pre}$. The constraint implied by the adjacent-category ordinal logit model is th
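The adjacent-category property can be verified numerically: when the category scores are one unit apart, the log-odds of rating m versus m-1 equal the difference of the two constants plus the linear term. The sketch below uses made-up constants and a made-up value for the linear term (the sum of the attribute and predictor effects times their values); `rating_probs` is a hypothetical helper, not a program function.

```python
import math

def rating_probs(constants, scores, linear_term):
    """Rating probabilities with utility constants[m] + scores[m] * linear_term."""
    etas = [c + y * linear_term for c, y in zip(constants, scores)]
    top = max(etas)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

constants = [0.0, 0.4, 0.6, 0.3, -0.2]  # illustrative constants for a 5-point scale
scores = [1, 2, 3, 4, 5]                # equally spaced category scores
linear_term = 0.7                       # illustrative sum of effects times values

probs = rating_probs(constants, scores, linear_term)
for m in range(1, len(scores)):
    adjacent_logit = math.log(probs[m] / probs[m - 1])
    implied = constants[m] - constants[m - 1] + linear_term
    print(m + 1, round(adjacent_logit, 6), round(implied, 6))
```

Each printed pair should agree, which illustrates that a single set of slopes governs every pair of adjacent rating categories.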
37. of the Class-specific response probabilities, where the posterior class membership probabilities serve as weights. There are two other prediction methods: HB-like and marginal mean prediction. In the first, one obtains $\hat{P}_{m|it}$ with the individual-specific utilities $\hat{\eta}_{m|it}$:
$$\hat{P}_{m|it} = \frac{\exp(\hat{\eta}_{m|it})}{\sum_{m'=1}^{M} \exp(\hat{\eta}_{m'|it})}. \qquad (18)$$
The $\hat{\eta}_{m|it}$ are weighted averages of the Class-specific utilities defined in equation 2, where the posterior class membership probabilities serve as weights; that is,
$$\hat{\eta}_{m|it} = \sum_{x=1}^{K} \hat{P}(x|z_i, y_i)\, \hat{\eta}_{m|x,z_{it}}.$$
Because of the similarity with prediction in Hierarchical Bayes (HB) procedures, we call this alternative method HB-like prediction. Note that the way we compute $\hat{\eta}_{m|it}$ is equivalent to computing $\hat{\eta}_{m|it}$ with the individual-specific $\hat{\beta}$ parameters defined in equation 20.
Marginal mean (mode) prediction differs from posterior mean prediction in that the prior class membership probabilities $\hat{P}(x|z_i)$ are used in the formula for $\hat{P}_{m|it}$ given in equation 17 instead of the posterior membership probabilities $\hat{P}(x|z_i, y_i)$. Whereas posterior mean and HB-like prediction provide a good indication of the within-sample prediction performance, marginal mean prediction gives a good indication of the out-of-sample prediction performance.
The most natural predicted value for a categorical dependent variable is the mode, that is, the m with the largest $\hat{P}_{m|it}$. The Prediction Table cross-classifies observed and predicted values based
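The three prediction methods differ only in what is averaged and with which weights. The following sketch contrasts them for one case and one choice set; the class-specific utilities and membership probabilities are made-up numbers, and the function names are hypothetical.

```python
import math

def softmax(etas):
    top = max(etas)
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

def mean_prediction(class_utilities, class_weights):
    """Weighted average of the class-specific choice probabilities.
    Posterior weights give posterior mean prediction; prior weights
    give marginal mean prediction."""
    class_probs = [softmax(etas) for etas in class_utilities]
    n_alt = len(class_utilities[0])
    return [sum(w * p[m] for w, p in zip(class_weights, class_probs))
            for m in range(n_alt)]

def hb_like_prediction(class_utilities, posterior):
    """Average the utilities over classes first, then apply the logit formula."""
    n_alt = len(class_utilities[0])
    eta_bar = [sum(w * etas[m] for w, etas in zip(posterior, class_utilities))
               for m in range(n_alt)]
    return softmax(eta_bar)

class_utilities = [[2.0, 0.0, -1.0], [-1.0, 1.5, 0.0]]  # 2 classes, 3 alternatives
posterior = [0.8, 0.2]  # illustrative posterior membership probabilities
prior = [0.5, 0.5]      # illustrative prior (covariate-based) probabilities

posterior_mean = mean_prediction(class_utilities, posterior)
marginal_mean = mean_prediction(class_utilities, prior)
hb_like = hb_like_prediction(class_utilities, posterior)

for probs in (posterior_mean, marginal_mean, hb_like):
    predicted = probs.index(max(probs)) + 1  # modal alternative, 1-based
    print([round(p, 3) for p in probs], predicted)
```

When a case is classified with certainty into one class, posterior mean and HB-like prediction coincide; they diverge as the posterior becomes less decisive.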
38. option, both the specified number of starting sets and iterations per set are reduced by a factor of three. In one-class models, in which local maxima cannot occur, the number of starting sets is automatically equated to 1. With the option Tolerance, one can specify the EM convergence criterion to be used within the random start values procedure. Thus, start values iterations stop if either this tolerance or the maximum number of iterations is reached.
3.7 Bootstrapping the P Value of $L^2$ or -2LL Difference
Rather than relying on the asymptotic p-value, it is also possible to estimate the p-value associated with the $L^2$ statistic by means of a parametric bootstrap. This option is especially useful with sparse tables (Langeheine, Pannekoek, and Van de Pol, 1996) and with models containing order restrictions (Galindo and Vermunt, 2005; Vermunt, 1999, 2001). The model of interest is then not only estimated for the sample under investigation, but also for B replication samples. These are generated from the probability distribution defined by the ML estimates. The estimated bootstrap p-value, $\hat{p}_{boot}$, is defined as the proportion of bootstrap samples with a larger $L^2$ than the original sample:
$$\hat{p}_{boot} = \frac{1}{B} \sum_{b=1}^{B} I(L_b^2 > L^2).$$
The standard error of $\hat{p}_{boot}$ equals $\sqrt{\hat{p}_{boot}(1 - \hat{p}_{boot}) / B}$. The precision of $\hat{p}_{boot}$ can be increased by increasing the number of replications B. The number of replications is specified by the parameter Replications. A similar procedure is used to obtain
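Once the statistic has been computed for the original sample and for each replication sample, the bootstrap p-value is a simple proportion. In the sketch below, the fit statistics are made-up numbers standing in for values that would come from re-estimating the model on each generated sample, and the standard error uses the usual binomial formula for a proportion based on B independent replications.

```python
import math

def bootstrap_p_value(stat_original, stat_replications):
    """Proportion of replication samples whose statistic exceeds the
    original one, with a binomial standard error."""
    b = len(stat_replications)
    p_boot = sum(1 for value in stat_replications if value > stat_original) / b
    std_err = math.sqrt(p_boot * (1.0 - p_boot) / b)
    return p_boot, std_err

# Illustrative values only: one original statistic and B = 8 replications.
original = 31.4
replications = [25.1, 33.0, 28.7, 40.2, 22.9, 30.8, 35.5, 27.3]
p_boot, std_err = bootstrap_p_value(original, replications)
print(round(p_boot, 3), round(std_err, 3))
```

Because the standard error shrinks with the square root of B, quadrupling the number of replications halves it.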
39. served heterogeneity in the analysis of event history data. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied latent class analysis, 383-407. Cambridge: Cambridge University Press.
Vermunt, J.K. (2002b). Comments on "Latent class analysis of complex sample survey data". Journal of the American Statistical Association, 97, 736-737.
Vermunt, J.K. (2002c). An Expectation-Maximization algorithm for generalised linear three-level models. Multilevel Modelling Newsletter, 14, 3-10.
Vermunt, J.K. (2003). Multilevel latent class models. Sociological Methodology, 33, 213-239.
Vermunt, J.K. (2004). An EM algorithm for the estimation of parametric and nonparametric hierarchical nonlinear models. Statistica Neerlandica, 58, 220-233.
Vermunt, J.K. (2005). Mixed-effects logistic regression models for indirectly observed outcome variables. Multivariate Behavioral Research, in press.
Vermunt, J.K. and Magidson, J. (2000). Latent GOLD User's Manual. Boston: Statistical Innovations.
Vermunt, J.K. and Magidson, J. (2001). Latent Class Analysis with Sampling Weights. Paper presented at the 6th annual meeting of the Methodology Section of the American Sociological Association, University of Minnesota, May 4-5, 2001.
Vermunt, J.K. and Magidson, J. (2002). Latent Class Models for Classification. Computational Statistics and Data Analysis, 41, 531-537.
Vermunt, J.K. and Magidson, J. (2003). Nonparametric random-coefficients models. M. Lewis-Beck, A
40. shown in the Contents Pane.
Plot Font: Allows you to customize the plot font for the output plots in the Contents Pane. Upon making a change to the font, this change goes into effect the next time the plot is opened in the Contents Pane.
Text Style: Allows you to customize the text style for the output in the Contents Pane. Upon making a change to the text style, this change goes into effect the next time the output listing is opened in the Contents Pane.
Format: Allows you to change the format (General, Fixed, and Scientific) and number of digits for numeric values displayed in the output.
View
The options available in the View Menu change depending upon what is highlighted in the Outline Pane. For example, when a model name is highlighted, the options are:
Toolbar: Shows the shortcuts Toolbar.
Status Bar: Shows the status bar. The status bar displays various information as the model is being estimated.
ProbChi: Opens the ProbChi calculator. This calculator can be used to obtain a p-value for a given chi-square, or vice versa, for a specified number of degrees of freedom (df).
When an interactive table or plot appears in the Contents Pane, the View Menu lists the various options for changing the appearance of the associated output.
Model
The Model Menu options are organized into 3 sections. The first section contains the options for specifying the type of model to be estimated.
Cluster: Specifies a Cluster model to be estimat
41. specify a multilevel extension to an LC Choice Model, which allows for explanation of the heterogeneity not only at the case level but also at the group level. Heterogeneity at the group level is explained by the inclusion of group-level classes (GClasses) and/or group-level CFactors (GCFactors) in a model.
Group ID: The Group ID variable indicates to which higher-level unit, or group, each case belongs. Upon selecting a variable as the Group ID, the Group Specification Box in the lower right portion of the Advanced Tab is activated.
GClasses: This option assumes that groups belong to one of a set of latent classes of groups, the number of which is specified with GClasses (Group-level Classes). This yields the nonparametric variant of the multilevel LC model. By default, the GClass box is set to 1. To use this option, specify 2 or more GClasses: click the up arrow in the drop-down box to increase the number of GClasses to 2 or more (up to 100). The GClasses then appear in the Group Specification Box below.
GCFactors: This option assumes that groups differ with respect to their scores on one or more group-level continuous factors (GCFactors), or group-level random effects. This yields the parametric variant of the multilevel LC model. Click on the drop-down box to select the number of GCFactors. The GCFactors then appear in the Group Specification Box below.
GClasses and GCFactors may both be specified to combine the parametric and nonparametri
42. standard LC and latent budget models. A nice feature of the Profile and ProbMeans output is that it describes the relationships between the latent variable and all variables selected as attributes or covariates. This means that even if a certain covariate effect is fixed to zero, one still obtains its ProbMeans information. This feature is exploited in the inactive covariates method. Advantages of working with inactive instead of active covariates are that the estimation time is not increased and that the obtained solution is the same as without covariates.
4.5 Set Profile and Set ProbMeans
The Set Profile and Set ProbMeans output sections contain information on the estimated choice probabilities per choice set. For rankings, these are based on the first-choice replications only. For choices and ratings, all replications are used. Let $\ell$ denote a particular choice set number, as indicated by the Set ID variable. The Class-specific and the overall choice probabilities for Set $\ell$ are obtained as follows:
$$\hat{P}(m|x, \ell) = \frac{\sum_{i=1}^{I} \hat{w}_{xi} \sum_{t \in \ell} v_{it}\, \hat{P}(y_{it} = m \mid x, z_{it})}{\sum_{i=1}^{I} \hat{w}_{xi} \sum_{t \in \ell} v_{it}},$$
$$\hat{P}(m|\ell) = \frac{\sum_{i=1}^{I} w_{i} \sum_{t \in \ell} v_{it}\, \hat{P}_{m|it}}{\sum_{i=1}^{I} w_{i} \sum_{t \in \ell} v_{it}}.$$
Here, $\hat{w}_{xi}$ is the case weight times the posterior membership probability (see equation 9), and $\hat{P}_{m|it}$ is the individual-specific choice probability, which, depending on the type of prediction, is defined by equation 17 or 18. The computation of the Set Average is the same, except that
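43. the log-likelihood, the step size is reduced till this no longer occurs. The matrix $-\hat{H}^{-1}$ evaluated at the final $\hat{\vartheta}$ yields the standard estimate for the asymptotic variance-covariance matrix of the model parameters: $\hat{\Sigma}_{standard}(\hat{\vartheta}) = -\hat{H}^{-1}$. Latent GOLD Choice also implements two alternative estimates for $\Sigma(\hat{\vartheta})$. The first alternative is based on the outer product of the cases' contributions to the gradient vectors; that is, $\hat{\Sigma}_{outer}(\hat{\vartheta}) = \hat{B}^{-1}$, where element $\hat{B}_{kk'}$ of $\hat{B}$ is defined as
$$\hat{B}_{kk'} = \sum_{i=1}^{I} w_i\, \frac{\partial \log \hat{P}(y_i|z_i)}{\partial \vartheta_k}\, \frac{\partial \log \hat{P}(y_i|z_i)}{\partial \vartheta_{k'}}.$$
Note that $\hat{B}$ is the sample covariance matrix of the case-specific contributions to the elements of the gradient vector. The third estimator for $\Sigma(\hat{\vartheta})$ is the so-called robust, sandwich, or Huber-White estimator, which is defined as $\hat{\Sigma}_{robust}(\hat{\vartheta}) = \hat{H}^{-1} \hat{B} \hat{H}^{-1}$. The advantage of $\hat{\Sigma}_{outer}(\hat{\vartheta})$ compared to the other two is that it is much faster to compute, because it uses only first derivatives. It may thus be an alternative for $\hat{\Sigma}_{standard}(\hat{\vartheta})$ in large models. The advantage of the robust method is that, contrary to the other two methods, it does not rely on the assumption that the model is correct.
Note that $\hat{\Sigma}(\hat{\vartheta})$ can be used to obtain the standard error for any function $h(\hat{\vartheta})$ of $\hat{\vartheta}$ by the delta method:
$$\hat{\Sigma}\big(h(\hat{\vartheta})\big) = \left(\frac{\partial h(\hat{\vartheta})}{\partial \hat{\vartheta}'}\right) \hat{\Sigma}(\hat{\vartheta}) \left(\frac{\partial h(\hat{\vartheta})}{\partial \hat{\vartheta}'}\right)'.$$
Footnote: The matrix $-\hat{H}$ is usually referred to as the observed information matrix, which serves as an approximation of the expected information matrix.
Latent GOLD Choice uses the delta method, for exa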
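The delta method mentioned above can be illustrated for a scalar function of the parameters. The sketch below is not the program's implementation: it approximates the gradient of h by central finite differences (an assumption made for brevity) and then forms the quadratic form with a made-up 2x2 covariance matrix.

```python
import math

def delta_method_se(h, theta, cov, eps=1e-6):
    """Standard error of a scalar function h(theta) by the delta method,
    using a central finite-difference approximation of the gradient."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[k] += eps
        down[k] -= eps
        grad.append((h(up) - h(down)) / (2.0 * eps))
    variance = sum(grad[j] * cov[j][k] * grad[k]
                   for j in range(len(theta)) for k in range(len(theta)))
    return math.sqrt(variance)

theta = [0.9, -0.4]                   # illustrative parameter estimates
cov = [[0.04, 0.01], [0.01, 0.09]]    # illustrative variance-covariance matrix

# Difference between two effects: variance is 0.04 + 0.09 - 2 * 0.01 = 0.11.
se_diff = delta_method_se(lambda t: t[0] - t[1], theta, cov)

# A nonlinear function of one parameter, exp(theta_1).
se_exp = delta_method_se(lambda t: math.exp(t[0]), theta, cov)

print(round(se_diff, 4), round(se_exp, 4))
```

Any of the three covariance estimates (standard, outer-product, or robust) could be passed as `cov`; the delta method itself does not depend on which one is used.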
44. the numerical integration. The default value is 10, the minimum 2, and the maximum 50.
6.2 Application Types
6.2.1 Random-effects conditional logit models
An important application of the CFactor option involves random-effects discrete-choice modeling (McFadden and Train, 2000; Skrondal and Rabe-Hesketh, 2004). Let us first look at the random intercept case in a model for first choices containing M-1 alternative-specific constants and P attributes. Such a model has the following form:
$$\eta_{m|z_{it},F_{1i}} = \beta_{m}^{con} + \sum_{p=1}^{P} \beta_{p}^{att} z_{itmp}^{att} + \lambda_{m1}^{con} \cdot F_{1i}. \qquad (23)$$
Note that a single CFactor is used to capture the variation in each of the M-1 constants, a specification that is also used in the random-effects multinomial logistic regression model proposed by Hedeker (2003). The random part of the alternative-specific constant corresponding to category m is denoted as $\Psi_{mi}^{con}$. Its variance equals $\sigma_{\Psi_{m}^{con}}^{2} = (\lambda_{m1}^{con})^{2}$, and the covariance between $\Psi_{mi}^{con}$ and $\Psi_{m'i}^{con}$ equals $\sigma_{\Psi_{m}^{con}, \Psi_{m'}^{con}} = \lambda_{m1}^{con} \cdot \lambda_{m'1}^{con}$.
The model can be expanded to include random slopes, or random coefficients, in addition to random intercept terms. However, a slight complication is that one has to decide whether the various random effects should be uncorrelated or not. For uncorrelated random effects, expanding the model of equation 23 with a random slope for the first attribute yields
$$\eta_{m|z_{it},F_{i}} = \beta_{m}^{con} + \sum_{p=1}^{P} \beta_{p}^{att} z_{itmp}^{att} + \lambda_{m1}^{con} \cdot F_{1i} + \lambda_{12}^{att} \cdot F_{2i} \cdot z_{itm1}^{att}.$$
The var
45. treated as partial rankings with negative scale factors for the second (worst) choice, and the analysis of constant-sum data involves the use of replication weights. A format that has not been discussed explicitly is paired comparisons (Dillon and Kumar, 1994). Paired comparisons are, however, just first choices out of sets consisting of two alternatives, and can therefore be analyzed in the same way as first choices. Another format mentioned in the introduction is binary rating. Such a binary outcome variable concerning the evaluation of a single alternative (yes/no, like/dislike) can simply be treated as a rating. The most natural scoring of the categories would be to use score 1 for the positive response and 0 for the negative response, which yields a standard binary logit model. The pick any out of M format can be treated in the same manner as binary ratings, that is, as a set of binary variables indicating whether the various alternatives are picked or not.
Another format, called joint choices, occurs if a combination of two or more outcome variables is modelled jointly. Suppose the task is to give the two best alternatives out of a set of M, which is a pick 2 out of M format. This can be seen as a single choice with M(M-1)/2 joint alternatives. The attribute values of these joint alternatives are obtained by summing the attribute values of the original pair of alternatives. Other examples of joint choices are non-sequential models f
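The construction of joint alternatives for a pick 2 out of M task can be sketched as follows. The alternative labels and attribute values are hypothetical; the point is only that M alternatives yield M(M-1)/2 unordered pairs, each with summed attribute values.

```python
from itertools import combinations

def joint_alternatives(attributes):
    """Build all unordered pairs of alternatives and sum their attribute values.

    attributes maps an alternative label to a list of attribute values.
    """
    joint = {}
    for first, second in combinations(sorted(attributes), 2):
        joint[(first, second)] = [a + b for a, b in
                                  zip(attributes[first], attributes[second])]
    return joint

# Four illustrative alternatives described by two attributes each.
attributes = {
    "A": [1, 0],
    "B": [0, 1],
    "C": [1, 1],
    "D": [0, 0],
}
pairs = joint_alternatives(attributes)
print(len(pairs))          # M(M-1)/2 joint alternatives for M = 4
print(pairs[("A", "C")])   # summed attribute values of A and C
```

The resulting joint alternatives can then be analyzed as a single first choice among the pairs.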
46. when df is much larger than the total sample size N.
The program reports the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), Akaike Information Criterion 3 (AIC3), and the Consistent Akaike Information Criterion (CAIC) based on the $L^2$ and df, which is the more common formulation in the analysis of frequency tables. They are defined as
$$BIC_{L^2} = L^2 - \log(N) \cdot df,$$
$$AIC_{L^2} = L^2 - 2 \cdot df,$$
$$AIC3_{L^2} = L^2 - 3 \cdot df,$$
$$CAIC_{L^2} = L^2 - [\log(N) + 1] \cdot df.$$
These information criteria weight the fit and the parsimony of a model: the lower BIC, AIC, AIC3, or CAIC, the better the model. Use of information criteria based on $L^2$ or $\log L$ (see below) should yield the same result. The differences between BIC, AIC, AIC3, and CAIC values across models are the same with both methods. However, with extremely large df, the $L^2$-based information measures may become more highly negative than the maximum precision can indicate, which makes their rounded values meaningless. In such cases, one has to use the equivalent $\log L$-based measures.
The last statistic that is provided in the chi-squared statistics section is the Dissimilarity Index (DI), which is a
Footnote 12: In order to get meaningful chi-squared statistics in models with a known-class indicator, we in addition divide by $\sum_{x=1}^{K} \tau_{ix} \hat{P}(x|z_i)$.
Footnote 13: Note that we are using a somewhat unconventional formula for $X^2$. The reason for this is that the sum $\sum_{i^*}$ is over the nonzero observed cells only.
47. 12 records.
Names for each of the models appear in the left-hand, or Outline, Pane. Once a model is estimated, its results appear in the right-hand, or Contents, Pane. The Outline Pane is used to select different model views and summaries, while the Contents Pane contains the actual views and summary output.
The 3-file format consists of:
• an alternatives file, which defines each alternative in terms of one or more attributes,
• a choice sets file, containing one or more choice sets, each presenting a set of alternatives among which the choice(s) are made,
• a response file, indicating the choices made by each respondent to one or more choice sets.
Response file (required)
The response file contains the choices made by each respondent to one or more choice sets defined in an associated sets file. Optionally, the response file may also contain respondent characteristics (age, gender) or choice set characteristics for inclusion in the model as covariates or predictors. Generally, the Response file will also contain a case ID and a set ID variable.
In the special case that no attributes are included in the model, alternative-specific constants (alphas) must be included to estimate a minimal model, which may also contain predictors and/or covariates. In this special case of no attributes, for a dependent variable defined as choice, this minimal model reduces to the traditional latent class MNL model, and the results of estimation will be equivalent to a correspo
48. [Figure 36: New Parameters Output]
Notice that the within-class estimates for PRICE are close to each other. The formal test of equality is given by the Wald(=) statistic, for which the p-value (0.17) is not significant. Thus, we will restrict the effect of PRICE to be equal across all 3 segments. To view whether any of the parameter estimates are non-significant:
> Right click on the Contents Pane and select Z Statistic from the pop-up menu.
The Z statistics show that for class 1 the effect of higher quality, and for class 2 the effect of modern fashion, are not significant. To implement these parameter restrictions:
> Double click on the last model (Model5 here) to re-open the model setup screen.
> Click on Model to open the Model tab.
> Right click under column 1 in the QUALITY row and select No Effect from the pop-up menu to restrict the class 1 effect for QUALITY to zero.
> Right click under column 2 in the FASHION row and select No Effect from the pop-up menu to restrict the class 2 effect for FASHION to zero.
> Right click under the Class Independent column in the PRICE row and select Yes to restrict the effect of PRICE to be class independent.
[Figure 37: Implementing Parameter Restrictions]
> Click Estimate to re-estima
49. $\tau_i$ be a vector of 0-1 variables containing the Known Class information for case $i$, where $\tau_{ix} = 0$ if it is known that case $i$ does not belong to class $x$, and $\tau_{ix} = 1$ otherwise. The vector $\tau_i$ modifies the model with covariates defined in equation 4 as follows:

$$P(y_i \mid z_i, \tau_i) = \sum_{x=1}^{K} \tau_{ix}\, P(x \mid z_i^{cov}) \prod_{t=1}^{T_i} P(y_{it} \mid x, z_{it}^{att}, z_{it}^{pred}).$$

As a result of this modification, the posterior probability of belonging to class $x$ will be equal to 0 if $\tau_{ix} = 0$.

The known-class option has three important applications.

1. It can be used to estimate models with training cases; that is, cases for which class membership has been determined using a gold standard method. Depending on how this training information is obtained, the missing data mechanism will be MCAR (Missing Completely At Random, where the known-class group is a random sample from all cases), MAR (Missing At Random, where the known-class group is a random sample given observed responses and covariate values), or NMAR (Not Missing At Random, where the known-class group is a non-random sample and thus may depend on class membership itself). MAR occurs, for example, in clinical applications in which cases with more than a certain number of symptoms are subjected to further examination to obtain a perfect classification (diagnosis). NMAR may, for example, occur if training cases that do not belong to the original sample under investigation are added to the data file. Both in the MAR and MCAR situat
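The effect of the known-class indicators on classification can be illustrated with a minimal sketch. The function name and the numbers are hypothetical and are not part of Latent GOLD Choice; the sketch only assumes that each class's prior probability and likelihood contribution are multiplied by the 0-1 indicator before normalizing.

```python
def posterior_with_known_class(prior, likelihood, tau):
    """Posterior class probabilities when some classes are ruled out.

    prior      -- P(x) for each class (hypothetical values)
    likelihood -- P(y | x) for each class (hypothetical values)
    tau        -- 0/1 indicators; 0 means the case is known NOT to be in that class
    """
    joint = [t * p * l for t, p, l in zip(tau, prior, likelihood)]
    total = sum(joint)
    if total == 0:
        raise ValueError("all classes are excluded or have zero likelihood")
    return [j / total for j in joint]


# Case known not to belong to class 2: its posterior for class 2 is exactly 0.
posterior = posterior_with_known_class(
    prior=[0.50, 0.25, 0.25],
    likelihood=[0.2, 0.4, 0.4],
    tau=[1, 0, 1],
)
print(posterior)
```

Setting all indicators to 1 recovers the usual posterior probabilities, so the indicator vector only removes classes and never adds information about the remaining ones.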
50. Figure 35: Parameters Output for 3-class Model

Using Restrictions to Refine the 3-Class Model

Under the 3-class model, the estimates for PRICESQ are no longer significant, which reflects the true structure of this model. The associated p-value is greater than 0.05 (p = .82).

> Double-click on 3-class to re-open the model setup window.
> Click Attributes to open the Attributes tab.
> Select PRICESQ and click the Attributes button to remove this effect from the model.
> Click Estimate to re-estimate the model.
> After the estimation has completed, click Parameters to view the new parameter estimates.

[Screenshot: Latent GOLD Parameters output with Class1, Class2 and Class3 estimates, Wald statistics and p-values]
51. Application of this method in the context of FM and LC models was proposed by Vermunt (2002b) and Wedel, Ter Hofstede, and Steenkamp (1998). The overall structure of $\hat{\Sigma}_{survey}(\hat{\theta})$ is similar to the robust or sandwich estimator $\hat{\Sigma}_{robust}(\hat{\theta})$ discussed earlier; that is,

$$\hat{\Sigma}_{survey}(\hat{\theta}) = \hat{H}^{-1}\, \hat{B}\, \hat{H}^{-1}.$$

As can be seen, a matrix $\hat{B}$ is sandwiched between the inverse of the Hessian matrix. For the computation of $\hat{B}$, one needs two components: the contribution of PSU $c$ in stratum $o$ to the gradient of parameter $k$, denoted by $g_{ock}$, and its sample mean in stratum $o$, denoted by $\bar{g}_{ok}$. These are obtained as follows:

$$g_{ock} = \sum_{i=1}^{I_{oc}} sw_{oci}\, \frac{\partial \log P(y_i \mid z_i, \theta)}{\partial \theta_k} \quad \text{and} \quad \bar{g}_{ok} = \frac{\sum_{c=1}^{C_o} g_{ock}}{C_o}.$$

Using these two components, element $\hat{B}_{kk'}$ of $\hat{B}$ can be defined as

$$\hat{B}_{kk'} = \sum_{o=1}^{O} \frac{C_o}{C_o - 1} \left(1 - \frac{C_o}{N_o}\right) \sum_{c=1}^{C_o} (g_{ock} - \bar{g}_{ok})(g_{ock'} - \bar{g}_{ok'}).$$

Note that if we neglect the finite population correction factor $(1 - C_o/N_o)$, $\hat{B}$ is the sample covariance matrix of the PSU-specific contributions to the gradient vector.

Various observations can be made from the formula for $\hat{B}_{kk'}$. The first is that without complex sampling features (one stratum, single case per PSU, no sampling weights, and $C_o/N_o \approx 0$), the above procedure yields $\hat{\Sigma}_{robust}(\hat{\theta})$, which shows that $\hat{\Sigma}_{survey}(\hat{\theta})$ not only takes into account the sampling design but is also a robust estimator of $\Sigma(\hat{\theta})$. Second, the fact that gradient contributions are aggregated for cases belonging to the same PSU shows that the PSUs are treated as the
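The middle matrix of the sandwich can be sketched in a few lines. This is an illustration only, not the program's implementation: it assumes the PSU-level gradient contributions have already been computed, that each stratum contributes the within-stratum sum of cross-products of deviations from the stratum mean, scaled by C/(C - 1) and by the finite population correction 1 - C/N, and the function and argument names are hypothetical.

```python
def survey_b_matrix(strata, population_psus=None):
    """Middle matrix B of a survey sandwich estimator (illustrative sketch).

    strata          -- list of strata; each stratum is a list of PSU gradient
                       vectors (one number per parameter)
    population_psus -- optional list with the population number of PSUs per
                       stratum; if omitted, no finite population correction
    """
    n_par = len(strata[0][0])
    b = [[0.0] * n_par for _ in range(n_par)]
    for o, psus in enumerate(strata):
        c_o = len(psus)
        if c_o < 2:
            raise ValueError("each stratum needs at least two PSUs")
        # Stratum mean of the PSU gradient contributions, per parameter.
        mean = [sum(g[k] for g in psus) / c_o for k in range(n_par)]
        fpc = 1.0
        if population_psus is not None:
            fpc = 1.0 - c_o / population_psus[o]
        factor = c_o / (c_o - 1) * fpc
        for g in psus:
            dev = [g[k] - mean[k] for k in range(n_par)]
            for k in range(n_par):
                for l in range(n_par):
                    b[k][l] += factor * dev[k] * dev[l]
    return b


# One stratum with two PSUs and two parameters (hypothetical gradients).
b_no_fpc = survey_b_matrix([[[1.0, 0.0], [-1.0, 0.0]]])
b_fpc = survey_b_matrix([[[1.0, 0.0], [-1.0, 0.0]]], population_psus=[4])
print(b_no_fpc, b_fpc)
```

Because deviations are taken around the stratum mean, a stratum whose PSUs all have identical gradient contributions adds nothing to the matrix.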
52. Bryman, and T.F. Liao (eds.), Encyclopedia of Research Methods for the Social Sciences. Newbury Park: Sage Publications, Inc.
Vermunt, J.K., and Magidson, J. (2005). Hierarchical mixture models for nested data structures. C. Weihs and W. Gaul (eds.), Classification: The Ubiquitous Challenge, in press. Heidelberg: Springer.
Vermunt, J.K., and Van Dijk, L. (2001). A nonparametric random-coefficients approach: the latent class regression model. Multilevel Modelling Newsletter, 13, 6-13.
Wedel, M., and DeSarbo, W.S. (1994). A review of recent developments in latent class regression models. R.P. Bagozzi (ed.), Advanced Methods of Marketing Research, 352-388. Cambridge: Blackwell Publishers.
Wedel, M., and DeSarbo, W.S. (2002). J.A. Hagenaars and A.L. McCutcheon (eds.), Applied Latent Class Analysis, 366-382. Cambridge: Cambridge University Press.
Wedel, M., Ter Hofstede, F., and Steenkamp, J.-B.E.M. (1998). Mixture model analysis of complex samples. Journal of Classification, 15, 225-244.
Westers, P., and H. Kelderman (1991). Examining differential item functioning due to item difficulty and alternative attractiveness. Psychometrika, 57, 107-118.
Westers, P., and H. Kelderman (1993). Generalizations of the Solution-error Response-error Model. Research Report 93-1, Faculty of Educational Science and Technology, University of Twente.

11 Notation
11.1 Basic Models
P a I t T Yit m Ym i I probability case in
53. CHOICE 4.0 USER'S GUIDE
Jeroen K. Vermunt
Jay Magidson
Thinking outside the brackets

For more information about Statistical Innovations Inc., please visit our website at http://www.statisticalinnovations.com or contact us at:
Statistical Innovations Inc., 375 Concord Avenue, Suite 007, Belmont, MA 02478
e-mail: will@statisticalinnovations.com

Latent GOLD Choice is a trademark of Statistical Innovations Inc. Windows is a trademark of Microsoft Corporation. SPSS is a trademark of SPSS Inc. Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective companies.

Latent GOLD Choice 4.0 User's Manual. Copyright 2005 by Statistical Innovations Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission from Statistical Innovations Inc. 1/30/06

TABLE OF CONTENTS
Manual for Latent GOLD Choice
Structure of this manual
Part 1
Latent GOLD Choice 4.0 Advanced
Optional Add-ons to Latent GOLD Choice 4.0
Ackno
54. GClasses play a role similar to the one of the Classes in the discrete choice model: effects can be GClass independent (check off) or GClass dependent (check on). GCFactors play a role similar to CFactors in a random-effects choice model: effects can be assumed to be fixed (check off) or random (check on). See Section 7 of the Technical Guide for details and application types of multilevel models.

Model Tab

Figure 8: Model Tab

Use this tab to specify different kinds of restrictions. These restrictions are available for alternative-specific constants, attribute effects, and predictor effects. These terms can be set equal to zero for certain classes (No Effect), equated across two or more classes (Merge Effects), and equated across all classes (Class Independent). The effect of an attribute or a predictor can be restricted to be ordered in each class (Ascending or Descending).

These constraints are activated by means of a pop-up menu that appears following a right-click on the desired cells in the grid. For example, a right-click on one or more selected cells in the first set of columns (corresponding to the classes) brings up the menu: Merge Effects, Separate Effects, Offset. Right-clicking on one or more cells under the column marked Class Independence brings up the following menu: No. To impose the restriction of class independence, select one or more cells containing the No l
55. Inflated Models

When the zero-inflated option is used, the model is expanded with M latent classes that are assumed to respond with probability one to a certain category; that is, $P(y_{it} = m \mid x, z_{it}^{pred}) = 1$ for $x = K + m$. Such latent classes are sometimes referred to as stayer classes (in mover-stayer models) or brand-loyal classes (in brand-loyalty models).

2.11 Restrictions on the Regression Coefficients

Various types of restrictions can be imposed on the Class-specific regression coefficients: attribute and predictor effects can be fixed to zero, restricted to be equal across certain or all Classes, and constrained to be ordered. Moreover, the effects of numeric attributes can be fixed to one. These constraints can either be used as a priori restrictions derived from theory or as post hoc restrictions on estimated models.

Certain restrictions apply to parameters within each Class, while others apply across Classes. The within-Class restrictions are:
• No Effect: the specified effect(s) are set to zero;
• Offset: the selected effect(s) are set to one, thus serving as an offset. The offset effect applies to numeric attributes only.

Between-Class restrictions are:
• Merge Effects: the effects of a selected attribute/predictor are equated across 2 or more specified Classes;
• Class Independent: the effects of a selected attribute/predictor are equated across all Classes;
• Order (ascending or descending): in each Class
56. ML or PM estimates based on all available information. The assumption that is made is that the missing data are missing at random (MAR) or, equivalently, that the missing data mechanism is ignorable (Little and Rubin, 1987; Schafer, 1997; Skrondal and Rabe-Hesketh, 2004; Vermunt, 1997).

In the case of missing data, it is important to clarify the interpretation of the chi-squared goodness-of-fit statistics. Although parameter estimation with missing data is based on the MAR assumption, the chi-squared statistics not only test whether the model of interest holds, but also the much more restrictive MCAR (missing completely at random) assumption (see Vermunt, 1997). Thus, caution should be used when interpreting the overall goodness-of-fit tests in situations involving missing data.

3.2.2 Attributes, predictors, and covariates

Missing values on attributes will never lead to exclusion of cases or replications from the analysis. If the technical option for including missing values on covariates and predictors is off, cases with missing covariate values and replications with missing predictor values are excluded from the analysis. When this technical option is on, such cases and replications are retained by imputing the missing values using the method described below.

Missing values on numeric predictors and covariates are replaced by the sample mean. This is the mean over all cases without a missing value for covariates and the mean of all repl
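Mean imputation for a numeric covariate can be sketched as follows. This is a minimal illustration with a hypothetical function name and made-up values, in which missing entries are represented by None; it is not the program's own code.

```python
def impute_with_mean(values):
    """Replace missing entries (None) by the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical AGE covariate with one missing value.
age = [20.0, None, 40.0, 30.0]
age_imputed = impute_with_mean(age)
print(age_imputed)
```

The mean is computed over the observed values only, so the imputed records do not shift the sample mean of the variable.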
57. Maca FS Yro 5 Var Zi 2 Asa F 4 D S Aza F g ao eae r 1 d 1 r 1 Also when adopting a nonparametric random effects approach one may in clude covariates in the multilevel LC model that is R Cou Yrs z0 gt Vas er Egin gt r 1 Mala 19 This yields a model for the latent classes in which the intercept and the covariate effects may differ across GClasses In fact we have a kind of LC Regression structure in which the latent classes serve as a nominal dependent variable and the GClasses as latent classes An important extension of the above nonparametric multilevel LC models is the possibility to regress the GClasses on group level covariates This part of the model has the same form as the multinomial logistic regression model for the Classes in a standard LC or FM model 7 2 2 LC discrete choice models for three level data Another application type of the Latent GOLD Choice multilevel option is three level regression modeling Vermunt 2004 A three level LC condi tional logit model would be of the form T 9 pred y att g P y5 2 gt P x y NP oy le Yjie Zjit Z sigo Y x9 1 i l x 1 t 1 Suppose we have a model for first choices with alternative specific constants and P attributes The simplest linear predictor in a model that includes 70 GClasses would then be Rcon att gott BEng Nmlasie0 09 Pem Y gt zp mitp Om 29 gt which is a model in which only the constants are affecte
58. The PSU (Primary Sampling Unit) variable is used for two-stage cluster samples. It specifies the sampling cluster to which a case belongs. PSUs are assumed to be nested within strata. When no PSU variable is specified, it is assumed that each case forms a separate PSU.

Sampling Wgt
The Sampling Wgt variable contains a sampling weight.

Rescale (default) vs. No Rescale
Upon selecting a variable to be used as a Sampling Wgt, the symbol <R> appears in the Sampling Wgt box to the right of the variable name to indicate that the weights will be rescaled. Rescaling of the original weights is accomplished by multiplying them by a constant such that the sum of the weights equals the sample size.
> Right-click on the variable name and select No Rescale from the pop-up menu to maintain the weights without rescaling. Upon selection of No Rescale, the <R> symbol is removed.

Active (default) vs. Inactive
By default, the sampling weights are used in the estimation to compute pseudo-maximum-likelihood estimates, as indicated in Section 8.1 of the Technical Guide. If the sampling weight were instead specified as a Case Weight in the Variables Tab, the resulting parameter estimates would be the same as when the Active option is used here, but the standard errors would not be correct. The Inactive option for the sampling weight employs an alternative 2-step estimation algorithm developed by Vermunt and Magidson (2001).
> Right click on the v
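The rescaling rule described above amounts to a single multiplication. The sketch below uses a hypothetical function name and made-up weights; it only assumes that the sample size is the number of cases carrying a weight.

```python
def rescale_weights(weights):
    """Multiply the weights by a constant so that they sum to the sample size."""
    n = len(weights)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    factor = n / total
    return [w * factor for w in weights]


raw = [0.5, 1.5, 2.0, 4.0]   # hypothetical sampling weights for 4 cases
rescaled = rescale_weights(raw)
print(rescaled)
```

Rescaling leaves the relative sizes of the weights unchanged, so it affects only quantities that depend on the total weight.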
59. Figure 6: Attributes Tab

The alternative-specific constants (_Constants_) appear in the Variable List Box along with any other attributes.

Alternatives Button: click this to open an alternatives file (see Tutorial 1 for an example). The specified Alternative ID variable name is displayed in the Alternative ID box.

Choice Sets box: this box becomes activated after an Alternatives file has been opened. The specified Set ID variable name is displayed in the Set ID box.

Attributes box: to estimate an effect for an attribute, select it from the List box and move it to the Attributes box. Right-click on any attribute in the Attributes box to set the scale type (Nominal or Numeric). Note: the _Constants_ may not be set to Numeric.

Scan: upon clicking Scan, the Alternatives and Sets files are scanned, and values for the following are displayed: Total Alternatives Choice Sets Total Choice Sets Alternatives.

Prior to estimating the model, you may also wish to pre-specify certain restrictions that you wish to impose on the model parameters. Open the Model Tab to apply restrictions. You may also alter the technical settings in the Technical Tab.

Advanced Tab (Advanced Module only)
The Advanced Tab is divided into 3 sections according to the labels Survey, Continuous Factors, and Multilevel Model.
60. Settings for a Profile
To change the settings for a Profile Plot, right-click (or select Plot Control from the Model Menu) within the Contents pane when a Profile Plot is displayed to open the Plot Control dialog box. To change the font type/size for a plot, see Main Menu Options.

Profile Plot Settings
Legend: When this option is selected, a legend appears at the bottom of the Profile Plot.
Classes: A line will be drawn for each class selected. Those classes with a checkmark in the checkbox are included in the plot.
Variables: Select which variables (Constants, Attributes, Covariates) to include in the plot. Those with a checkmark in the checkbox are included in the plot.
Categories: Select which category of a variable to include in the plot. The category currently being plotted is listed in the plot beneath the variable name. To change the category that is plotted, highlight the variable name in the Variables box (the category currently being plotted will appear in the Category box), click the drop-down list to the right of the Categories box, and select the category you wish to have plotted.
Groups: Click Update once you have specified a new number of groups.

ProbMeans View
• The table contains class membership probabilities, which are displayed in the Uni-Plot and Tri-Plot.
To view the Probability Means table for a selected model, click ProbMeans in the Outline pane. To view a plot, click on the expand icon to the left of
61. a bootstrap estimate of the p-value corresponding to the difference in log-likelihood value between two nested models, such as two models with different numbers of latent classes. The $-2LL$-difference statistic is defined as $-2 \cdot (LL_{H_0} - LL_{H_1})$, where $H_0$ refers to the more restricted hypothesized model (say a $K$-class model) and $H_1$ to the more general model (say a model with $K + 1$ classes). Replication samples are generated from the probability distribution defined by the ML estimates under $H_0$. The estimated bootstrap p-value, $\hat{p}_{boot}$, is defined as the proportion of bootstrap samples with a larger $-2LL$-difference value than the original sample.

The bootstrap of the $-2LL$-difference statistic comparing models with different numbers of latent classes was used by McLachlan and Peel (2000) in the context of mixtures of normals. Vermunt (2001) used bootstrap p-values for both the $L^2$ and the $-2LL$-difference statistic in the context of order-restricted latent class models, where the $L^2$ measured the goodness-of-fit for an ordinal latent class model and the $-2LL$ difference concerned the difference between an order-restricted and an unrestricted latent class model.

The other parameter is Seed, which can be used to replicate a bootstrap. The seed used by the bootstrap to generate the data sets is reported in the output.

Two technical details about the implementation of the bootstrap should be mentioned. For each bootstrap replication, the maximum likelihoo
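The definition of the bootstrap p-value can be written down directly. The sketch below assumes the log-likelihoods of the two models have already been obtained for the original sample and for each replication sample; the function names and all numbers are hypothetical, and model estimation itself is not shown.

```python
def minus2ll_difference(ll_restricted, ll_general):
    """-2 times the log-likelihood difference between two nested models."""
    return -2.0 * (ll_restricted - ll_general)


def bootstrap_p_value(observed_stat, bootstrap_stats):
    """Proportion of bootstrap replications with a larger statistic than observed."""
    if not bootstrap_stats:
        raise ValueError("need at least one bootstrap replication")
    larger = sum(1 for s in bootstrap_stats if s > observed_stat)
    return larger / len(bootstrap_stats)


# Hypothetical log-likelihoods for a K-class and a (K+1)-class model.
observed = minus2ll_difference(ll_restricted=-705.2, ll_general=-700.0)
# Hypothetical statistics from five replication samples generated under H0.
replications = [2.1, 11.0, 5.5, 9.9, 12.3]
p_boot = bootstrap_p_value(observed, replications)
print(observed, p_boot)
```

In practice many more than five replications would be used, since the resolution of the estimated p-value is one over the number of replications.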
62. abel, right-click, and select Yes from the pop-up menu. The No changes to Yes in the selected cells, and the indices in the selected rows all change to 1 to indicate that the effects for classes 2, 3, etc. are all restricted to be equal to the corresponding effects for class 1.

Alternatively, the restriction of class independence can be imposed as follows: 1) select all the cells containing the indices for a chosen row, 2) right-click to bring up the pop-up menu, and 3) select Merge Effect. To undo these restrictions, reselect the cells, right-click, and select Separate Effect.

For rating models, the class independence restriction has the additional option No-Simple for the _Constants_. This yields a more parsimonious model for describing the manner in which constants differ across classes.

Right-clicking on one or more cells under the column marked Order brings up a menu that allows the imposition of an order restriction: Ascending, Descending, or None.

Click the Reset button to reset the restrictions to the original default settings of no restrictions. By default, no restrictions are imposed; i.e., Separate Effects are estimated for each parameter. For further details, see Section 2.11 in the Technical Guide.

ClassPred Tab
The ClassPred Tab contains various output-to-file options associated with prediction and classification, and also contains a Known Class indicator option, which allows more control ove
63. ak identification can be detected from the occurrence of large asymptotic standard errors. Local solutions may also result from weak identification.

Other identification issues are related to the order of the Classes and the uniqueness of parameters for nominal variables. For unrestricted Choice models, the Classes are reordered according to their sizes: the first Class is always the largest Class. Parameters ($\gamma$'s and $\beta$'s) involving nominal variables are identified by using either effect or dummy coding, which means that parameters sum to zero over the relevant indices or that parameters corresponding to the first or last category are fixed to zero. Note that the Parameters output also contains the redundant $\gamma$ and $\beta$ parameters, and in the case of effect coding also their standard errors.

3.9 Selecting and Holding out Choices or Cases

The replication and case weights can be used to omit certain choices or cases (records with a common case ID) from the analysis. With a weight equal to zero, one can remove a choice/case from the analysis, and no output is provided for this choice/case. Alternatively, a very small weight (1.0e-100) can be used to exclude choices/cases for parameter estimation, while retaining the relevant prediction and classification output.

3.9.1 Replication and case weights equal to zero

Setting case weights equal to zero will eliminate the corresponding cases from the analysis. This feature can be used to selec
64. al Guide.

Coding Nominal

Effect (default): By default, the Parameter Output contains effect coding for nominal variables. As far as the dependent variable is concerned, the coding affects the alternative-specific constants and, when specifying choice and rank models, it also affects the predictor effects. In addition, the coding affects the effects of nominal attributes and nominal predictors, and in the model for the classes it affects the parameters for nominal covariates and for the classes. Use this option to change to dummy coding.

Dummy Last: Selection of this option causes dummy coding to be used with the last category serving as the reference category.

Dummy First: Selection of this option causes dummy coding to be used with the first category serving as the reference category.

Variance-Covariance Matrix
When the input data file is either an ASCII text file or an SPSS .sav file, this option outputs the variance-covariance matrix of the parameter estimates to an external file.

Output Filename: Upon selection of this option, a default filename appears in the box directly below the check box. Use the browse button to change the filename and/or its save location. The format of the output file will be the same as that of the input file (ASCII or .sav).

The body of the output file contains the variances and covariances of all model parameters. Each row in this output file corresponds to a parameter. The first variable (column) on this file i
65. any combination of these.

The purpose of a discrete choice analysis is to predict stated or revealed preferences from characteristics of alternatives, choice situations, and respondents. The regression model that is used for this purpose is the conditional logit model developed by McFadden (1974). This is an extended multinomial logit model that allows the inclusion of characteristics of the alternatives (attributes such as price) as explanatory variables. Although the conditional logit model was originally developed for analyzing first choices, each of the other response formats can also be handled by adapting the basic model to the format concerned. For example, a ranking task is treated as a sequence of first choices, where the alternatives selected previously are eliminated; a rating task is modelled by an adjacent-category ordinal logit model, which is a special type of conditional logit model for ordinal outcome variables.

Latent GOLD Choice is not only a program for modeling choices or preferences, but also a program for latent class (LC) analysis. A latent class or finite mixture structure is used to capture preference heterogeneity in the population of interest. More precisely, each latent class corresponds to a population segment that differs with respect to the importance (or weight) given to the attributes of the alternatives when expressing that segment's preferences. Such a discrete characterization of unobserved heterogeneity
66. ariable name and select Inactive from the pop-up menu to select this option. Upon selection of Inactive, the <I> symbol appears in the Sampling Wgt box to the right of the variable name.

If the sampling weight variable were instead not used at all in the estimation (specified as neither a Case Weight nor a Sampling Wgt), the parameter estimates obtained would be the same as when the Inactive option is used here, but the sizes of the latent classes would be biased. The advantage of this method over the Active option is that the unweighted estimates may be more stable. See Section 8.2 of the Technical Guide for further information about these options.

Population Size
The Population Size variable can be used to specify either the size of the population of PSUs in the Stratum concerned or the population fraction. The variable is assumed to be a population fraction when it is smaller than or equal to 1. This option can be used for finite population corrections.

Continuous Factors
This advanced option can be used to include up to 3 continuous latent variables (CFactors) in a model, yielding standard random-coefficients discrete choice models and LC discrete choice models with random coefficients. CFactors are assumed to affect the alternative-specific constants as well as the attribute and predictor effects, which implies that the corresponding parameters are random coefficients whose values vary across cases. On the Model tab, one can
67. ass selected (a checkmark in the checkbox), a Uni-Plot will be displayed. By default, all classes are selected.

Axis Flip: To flip (reverse) the axis for a Uni-Plot, select the corresponding class/factor name. By default, the class probabilities/factor mean range is from 0 to 1 (increasing). Selecting Axis Flip for a class/factor will reverse the axis to range from 1 to 0 (decreasing).

Variables: Select which indicators/covariates to include in the Uni-Plots. Selected variables are indicated by a checkmark in the checkbox. By default, the Uni-Plots contain all the indicators/covariates included in the model.

Set: Can be used to indicate which choice sets should appear in the plot.

Groups: Use the grouping option to reduce the number of categories for a variable; click Update once you have specified a new number of groups for further details on the grouping option.

Tri-Plot
The probabilities in the body of the ProbMeans output table are plotted to form a Tri-Plot. To view the Tri-Plot, click on the expand/contract icon to list the ProbMeans plots and highlight Tri-Plot. Note: No Tri-Plot is produced for a 1-class model; for a 2-class model, the Tri-Plot reduces to the Uni-Plot.
• By default, Vertex A (left-most base vertex) is labeled Class 1, Vertex B (right-most base vertex) Class 2, and the third Vertex (the top point of the triangle) represents the aggregate of all other classes. For a 3-class model, by default the third v
68. assification information, as well as Covariate Classification, can be viewed as Tabular output and/or can also be output to an external file. Selection of these from the Output Tab produces the Tabular output. Selecting Standard Classification and/or Covariate Classification from the Output to File section of the ClassPred Tab produces the external files, which contain the classification information as new variables appended to a copy of the input file used for estimation. See below.

Set Profile: Shows/hides Set Profile in Output Window.
Set ProbMeans: Shows/hides Set ProbMeans in Output Window.
The Set Profile and Set ProbMeans output sections contain information on the estimated choice probabilities per choice set. For rankings, these are based on the first choice replications only. For choices and ratings, all replications are used. For more information, see Section 4.5 of the Technical Guide.

Importance: Shows/hides Importance in Output Window.
The Importance output reports the maximum effect for each of the attributes, including the constants, as well as re-scaled maximum effects that add up to one within latent classes. For more information, see Section 4.3 of the Technical Guide.

Iteration Detail: Shows/hides Iteration Detail in Output Window.
If this output is not selected, it still will appear if any problems are encountered during model estimation.

Standard Errors and Wald
Choose one of four options. The first three options
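The re-scaling step in the Importance output can be sketched as follows. The sketch takes the maximum effects of one latent class as given (how the program computes them is described in the Technical Guide and is not reproduced here); the function name and the values are hypothetical.

```python
def rescale_importance(max_effects):
    """Re-scale the maximum effects of one latent class so they add up to one."""
    total = sum(max_effects.values())
    if total <= 0:
        raise ValueError("maximum effects must have a positive sum")
    return {name: value / total for name, value in max_effects.items()}


# Hypothetical maximum effects for one class.
class1_max_effects = {"FASHION": 3.0, "QUALITY": 0.5, "PRICE": 1.5}
class1_importance = rescale_importance(class1_max_effects)
print(class1_importance)
```

Because the re-scaling is done separately within each class, the resulting shares are comparable across attributes within a class but not directly across classes.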
69. assumed to be Class independent.

There is a special variant of the Class-independent option called No-Simple that can be used in conjunction with the constants in a rating model. With this option, the constants are modeled as $\beta^{con}_{xm} = \beta^{con}_{\cdot m} + \beta^{con}_{x \cdot} \cdot y^{*}_{m}$, where $\beta^{con}_{x \cdot}$ is subjected to an effect or dummy coding constraint. This specification of Class-specific constants is much more parsimonious and is, in fact, equivalent to how x-y relationships with ordinal y's are modeled in LC Cluster models. Rather than estimating $K \cdot M$ intercept terms, one now estimates only $M + K - 1$ coefficients; that is, one extra coefficient per extra latent class.

Order constraints are important if one has a priori knowledge about the sign of an effect. For example, the effect of price on persons' preferences is usually assumed to be negative (or better, non-positive) for each latent class (segment). If the price effect is specified to be Descending, the resulting parameter estimate(s) will be constrained to be in agreement with this assumption.

The No Effect option makes it possible to specify a different regression equation for each latent class. More specifically, each latent class may have different sets of attributes and predictors affecting the choices. Post hoc constraints can be based on the reported z value for each of the coefficients. An example of an a priori use of this constraint is the inclusion of a random-responder class, a
70. at smaller scale factor for the revealed preferences than for the stated preferences. Assuming that the scale factor for the stated preferences is 1.0, values between 0.5 and 1.0 could be tried out for the revealed preferences; for example,

$$P(y_{it} = m \mid x, z_{it}^{att}, z_{it}^{pred}, s_{it}) = \frac{\exp(0.75 \cdot \eta_{m|x,z_{it}})}{\sum_{m'=1}^{M} \exp(0.75 \cdot \eta_{m'|x,z_{it}})}.$$

A limitation of the scale factor implemented in Latent GOLD Choice is that it cannot vary across alternatives. However, a scale factor is nothing more than a number by which the attributes (and predictors) are multiplied, which is something that users can also do themselves when preparing the data files for the analysis. More precisely, the numeric attributes of the alternatives may be multiplied by the desired scale factor.

2.5 Replication Weight and Constant-sum Data

The replication weights $v_{it}$ modify the probability structure defined in equation 3 as follows:

$$P(y_i \mid z_i) = \sum_{x=1}^{K} P(x) \prod_{t=1}^{T_i} \left[ P(y_{it} \mid x, z_{it}^{att}, z_{it}^{pred}) \right]^{v_{it}}.$$

The interpretation of a weight is that choice $y_{it}$ is made $v_{it}$ times.

One of the applications of the replication weight is in the analysis of constant-sum or allocation data. Instead of choosing a single alternative out of a set, the choice task may be to attach a probability to each of the alternatives. These probabilities serve as replication weights. Note that with such a response format, the number of replications corresponding to a choice set will be equal to the number of alternatives. A si
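The role of the replication weights as exponents can be illustrated with a short sketch for a single case. It assumes class sizes without covariates and takes the class-specific probability of each recorded choice as given; the function name and all numbers are hypothetical.

```python
import math


def case_likelihood(class_sizes, probs_by_class, weights):
    """Mixture likelihood of one case with replication weights as exponents.

    class_sizes    -- P(x) for each latent class
    probs_by_class -- for each class, the probability of the choice recorded
                      in each replication
    weights        -- replication weight for each replication
    """
    total = 0.0
    for size, probs in zip(class_sizes, probs_by_class):
        # Weighted product over replications, computed on the log scale.
        log_prod = sum(v * math.log(p) for p, v in zip(probs, weights))
        total += size * math.exp(log_prod)
    return total


sizes = [0.5, 0.5]
# Constant-sum task with 3 alternatives: one replication per alternative.
probs = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]
lik_allocation = case_likelihood(sizes, probs, weights=[0.5, 0.3, 0.2])
lik_first_choice = case_likelihood(sizes, probs, weights=[1, 0, 0])
print(lik_allocation, lik_first_choice)
```

With weights of 1, 0, 0 the expression reduces to the ordinary first-choice likelihood for alternative 1, which is a convenient check on the weighting.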
71. at the slopes are the same for each pair of adjacent categories. In other words, the attribute and predictor effects are the same for the choice between ratings 2 and 1 and the choice between ratings 5 and 4.

2.4 Replication Scale and Best-Worst Choices

A component of the LC choice model implemented in Latent GOLD Choice that has not been introduced thus far is the replication-specific scale factor $s_{it}$. The scale factor allows the utilities to be scaled differently for certain replications. Specifically, the scale factor enters into the conditional logit model in the following manner:

$$P(y_{it} = m \mid x, z_{it}^{att}, z_{it}^{pred}, s_{it}) = \frac{\exp(s_{it} \cdot \eta_{m|x,z_{it}})}{\sum_{m'=1}^{M} \exp(s_{it} \cdot \eta_{m'|x,z_{it}})}.$$

Thus, it is seen that while the scale factor is assumed to be constant across alternatives within a replication, it can take on different values between replications. The form of the linear model for $\eta_{m|x,z_{it}}$ is not influenced by the scale factors and remains as described in equation 2. Thus, the scale factor allows for a different scaling of the utilities across replications. The default setting for the scale factor is $s_{it} = 1$, in which case it cancels from the model for the choice probabilities.

Two applications of this type of scale factor are of particular importance in LC Choice modeling. The first is in the analysis of best-worst choices or maximum-difference scales (Cohen, 2003). Similar to a partial ranking task, the selection of the best and worst alternatives
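A minimal sketch of how a replication-specific scale factor changes conditional logit probabilities is given below. The utilities are hypothetical values for one class and one choice set, and the function is an illustration rather than the program's code.

```python
import math


def choice_probabilities(utilities, scale=1.0):
    """Conditional logit probabilities with utilities multiplied by a scale factor."""
    scaled = [scale * u for u in utilities]
    shift = max(scaled)                      # subtract the max for numerical stability
    expo = [math.exp(v - shift) for v in scaled]
    total = sum(expo)
    return [e / total for e in expo]


utilities = [1.0, 0.0, -1.0]                 # hypothetical utilities of 3 alternatives
p_default = choice_probabilities(utilities)              # scale factor 1 (default)
p_damped = choice_probabilities(utilities, scale=0.75)   # smaller scale factor
p_flat = choice_probabilities(utilities, scale=0.0)      # all alternatives equally likely
print(p_default, p_damped, p_flat)
```

A scale factor below 1 pulls the probabilities toward each other, which corresponds to noisier choices in the replications concerned.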
72. ates and the latent variable after estimating a model without covariate effects. More detail on the latter method is given in the subsection explaining the Profile and ProbMeans output. Another approach that can be used to explore the relationship between covariates and the latent variable is through the use of the CHAID option. This option may be especially valuable when the goal is to profile the latent classes using many inactive covariates. This option requires the SI-CHAID 4.0 add-on program, which assesses the statistical significance of the relationship between each covariate and the latent variable. For further information about the CHAID option, see Section 4.9.
2.8 Coding of Nominal Variables
In the description of the LC choice models of interest, we assumed that attributes, predictors, and covariates were all numeric. This limitation is not necessary, however, as Latent GOLD Choice allows one or more of these explanatory variables to be specified to be nominal. For nominal variables, Latent GOLD Choice sets up the design vectors using either effect (ANOVA-type) coding or dummy coding with the first or last category as reference category for identification. Effect coding means that the parameters will sum to zero over the categories of the nominal variable concerned. In dummy coding, the parameters corresponding to the reference category are fixed to zero. Suppose we have a nominal attribute with 4 categories in a model for first choices. The effect coding co
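The two coding schemes can be made concrete with a short sketch that builds the design columns for a 4-category nominal variable. This is an illustration rather than the program's internal code: the function names are hypothetical, and treating the last category as the redundant one in effect coding is an assumption.

```python
import numpy as np

def effect_coding(n_categories):
    """Rows are categories, columns are free parameters.
    The last category is coded -1 on every column, so the implied
    category effects sum to zero."""
    design = np.eye(n_categories)[:, :-1]
    design[-1, :] = -1.0
    return design

def dummy_coding(n_categories, reference="first"):
    """Drop the column of the reference category, whose effect is fixed to zero."""
    design = np.eye(n_categories)
    drop = 0 if reference == "first" else n_categories - 1
    return np.delete(design, drop, axis=1)

effect = effect_coding(4)                 # 4 x 3 matrix
dummy_first = dummy_coding(4, "first")    # first category is the reference
dummy_last = dummy_coding(4, "last")      # last category is the reference

beta = np.array([0.4, -0.1, 0.3])         # hypothetical free parameters
category_effects = effect @ beta          # effect for each of the 4 categories
```

Under effect coding the effect of the last category is minus the sum of the free parameters; under dummy coding the reference category has effect zero and the other parameters are contrasts with it.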
73. ation may be accounted for by specifying group-level latent classes (GClasses) and/or group-level CFactors (GCFactors). In addition, when 2 or more GClasses are specified, group-level covariates (GCovariates) can be included in the model to describe or predict them.
• Survey options for dealing with complex sampling data. Two important survey sampling designs are stratified sampling (sampling cases within strata) and two-stage cluster sampling (sampling within primary sampling units, PSUs, and subsequent sampling of cases within the selected PSUs). Moreover, sampling weights may exist. The Survey option takes the sampling design and the sampling weights into account when computing standard errors and related statistics associated with the parameter estimates, and estimates the design effect. When this method is used, the parameter estimates are the same as when using the weight variable as a Case Weight. An alternative two-step approach ("unweighted"), proposed in Vermunt and Magidson (2001), is also available for situations where the weights may be somewhat unstable.
Additional Optional Add-ons to Latent GOLD Choice 4.0
The following optional add-on programs that link to Latent GOLD Choice 4.0 are also available:
Latent GOLD 4.0. A license to Latent GOLD 4.0 allows you to use a single, fully integrated latent class program that contains the Choice program as one module and includes 3 additional modules to allow estimation of LC Cluster, DFactor Discr
74. attern should also have observed values on the same set of replications. (Footnote: With $i \in u$ we mean all the cases with covariate pattern $u$, and with $i^* \in u$ all the data patterns with covariate pattern $u$.) corresponding to data pattern $i^*$. Using these definitions of $\hat{m}_{i^*}$, $n_{i^*}$, and $N$, the chi-squared statistics are calculated as follows:
$$L^2 = 2 \sum_{i^*} n_{i^*} \log \frac{n_{i^*}}{\hat{m}_{i^*}}, \qquad X^2 = \sum_{i^*} \frac{(n_{i^*})^2}{\hat{m}_{i^*}} - N, \qquad CR^2 = 1.8 \sum_{i^*} n_{i^*} \left[ \left( \frac{n_{i^*}}{\hat{m}_{i^*}} \right)^{2/3} - 1 \right].$$
The number of degrees of freedom is defined by
$$df = \min\left\{ \sum_{u=1}^{U} \left( \prod_{t=1}^{T^*_u} M^*_{ut} - 1 \right),\; N \right\} - npar.$$
Here, $T^*_u$ is the total number of replications in covariate pattern $u$, and $M^*_{ut}$ denotes the number of alternatives of the $t$th observed replication corresponding to covariate pattern $u$. The term $\min\{\cdot\}$ indicates that $df$ is based on the sample size $N$ when the number of independent cells in the hypothetical frequency table is larger than the sample size. The chi-squared values with the corresponding $df$ yield the asymptotic p-values, which can be used to determine whether the specified model fits the data. If the Bootstrap $L^2$ option is used, the program also provides the estimated bootstrap p-value corresponding to the $L^2$ statistic, as well as its standard error. This option is especially useful with sparse tables, in which case the asymptotic p-values cannot be trusted. Note that sparseness almost always is a problem in LC choice models. The best indication of sparseness is
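The following sketch shows how the three chi-squared statistics and the degrees of freedom could be computed from observed and model-expected frequencies of the unique data patterns. Function and argument names are hypothetical, and the frequencies are made-up numbers; the formulas are the likelihood-ratio, Pearson, and Cressie-Read (power 2/3) statistics in the form that assumes the expected frequencies sum to N.

```python
import math
import numpy as np

def fit_statistics(observed, expected):
    """Return (L2, X2, CR2) for observed counts n and expected counts m."""
    n = np.asarray(observed, dtype=float)
    m = np.asarray(expected, dtype=float)
    total = n.sum()
    pos = n > 0                      # patterns with n = 0 contribute nothing to L2 and CR2
    ratio = n[pos] / m[pos]
    l2 = 2.0 * np.sum(n[pos] * np.log(ratio))
    x2 = np.sum(n ** 2 / m) - total  # equals sum((n - m)^2 / m) when sum(m) = N
    cr2 = 1.8 * np.sum(n[pos] * (ratio ** (2.0 / 3.0) - 1.0))
    return l2, x2, cr2

def degrees_of_freedom(alternatives_per_pattern, sample_size, n_parameters):
    """df = min(sum over covariate patterns of (prod_t M_ut - 1), N) - npar."""
    cells = sum(math.prod(sizes) - 1 for sizes in alternatives_per_pattern)
    return min(cells, sample_size) - n_parameters

l2, x2, cr2 = fit_statistics([30, 20, 50], [25, 25, 50])
# Two covariate patterns: one with two 3-alternative replications, one with a single one.
df = degrees_of_freedom([[3, 3], [3]], sample_size=100, n_parameters=4)
```

With sparse tables the number of cells grows much faster than N, which is when the cap at the sample size becomes active and the asymptotic p-values stop being trustworthy.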
75. been specified as Continuous or Count. This output file will be listed as Freqs/Residuals in the Outline pane.
Classification Output (optional)
Standard Classification: Produces an output file listing containing posterior membership probabilities and other information used to classify cases into the appropriate latent class. This output file will be listed as Standard Classification in the Outline pane. Each row in the Standard Classification output corresponds to a distinct observed data pattern in the data file. Advanced: For each CFactor and GCFactor, this file also contains the factor means, and for GClasses the classification probabilities and the modal assignment.
Covariate Classification: Classification is usually performed based on all available information for a case (Standard Classification). However, it is also possible to compute the probability of being in a certain latent class, or a factor mean, given covariate values only. In fact, these are model probabilities, that is, P(x|z) (see Section 4.1.4 of the Technical Guide). These probabilities are useful for classifying new cases for which information on the dependent variable or indicators is not available. Each row in the Covariate Classification output corresponds to a distinct pattern of active covariates that is observed in the data file. Note: Inactive covariates do not influence the classification probabilities and hence have no effect on this output. Standard Cl
76. ber of alternatives per set or, in our terminology, that certain alternatives are impossible. More precisely, $M$ is still the maximum number of alternatives, but certain alternatives cannot be selected in some replications. In order to express this, we need to generalize our notation slightly. Let $A_{it}$ denote the set of possible alternatives at replication $t$ for case $i$. Thus, if $m \in A_{it}$, $P(y_{it}=m \mid x, z^{att}_{it}, z^{pre}_{it})$ is a function of the unknown regression coefficients, and if $m \notin A_{it}$, $P(y_{it}=m \mid x, z^{att}_{it}, z^{pre}_{it}) = 0$. An easy way to accomplish this without changing the model structure is by setting $\eta_{m|x,z_{it}} = -\infty$ for $m \notin A_{it}$. Since $\exp(-\infty) = 0$, the choice probability appearing in equation 1 becomes
$$P(y_{it}=m \mid x, z^{att}_{it}, z^{pre}_{it}) = \frac{\exp(\eta_{m|x,z_{it}})}{\sum_{m' \in A_{it}} \exp(\eta_{m'|x,z_{it}})}$$
if $m \in A_{it}$, and $P(y_{it}=m \mid x, z^{att}_{it}, z^{pre}_{it}) = 0$ if $m \notin A_{it}$. As can be seen, the sum in the denominator is over the possible alternatives only. When the dependent variable is specified to be a ranking variable, the specification of those alternatives previously selected as impossible alternatives is handled automatically by the program. The user can use a missing value in the sets file to specify alternatives as impossible. This makes it possible to analyze choice sets with different numbers of alternatives per set, as well as combinations of different choice formats. In the one-file data format, choice sets need not have the same numbers of alternatives. In this case t
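The device of giving impossible alternatives a utility of minus infinity can be sketched in a few lines. The function name and the example values are hypothetical; the sketch only reproduces the arithmetic described above, not the program's handling of the sets file.

```python
import numpy as np

def choice_probabilities(eta, possible):
    """Conditional logit over the possible alternatives only.
    Impossible alternatives get utility -inf, hence probability 0."""
    eta = np.asarray(eta, dtype=float)
    possible = np.asarray(possible, dtype=bool)
    masked = np.where(possible, eta, -np.inf)
    masked = masked - masked[possible].max()   # numerical stability
    expv = np.exp(masked)                      # exp(-inf) evaluates to 0.0
    return expv / expv.sum()

# Hypothetical replication: the second of three alternatives is not available.
p = choice_probabilities([0.5, 2.0, -1.0], [True, False, True])
```

Because the denominator only collects the possible alternatives, the same code covers choice sets of different sizes, as well as ranking tasks in which previously selected alternatives are removed step by step.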
77. ClassPred Tab; Technical Tab; Step 2: Specify Output Options (Output, ClassPred); Step 3: Estimate the Model; Step 4: Viewing Output (Output Options, Parameters Output, Importance, Profile, Profile Plot, Set Profile and Set ProbMeans); Main Menu Options (View Menu, Model Menu, Window Menu, Help); Tutorial 1: Using Latent GOLD Choice to Estimate Discrete Choice Models.
Manual for Latent GOLD Choice 4.0
Structure of this manual
This manual consists of 3 Parts. Part 1 gives the overall general introduction to the program and new features. Part 2, entitled the Technical Guide, documents all model options, technical features, and output sections. It consists of 4 sections followed by a list of technical references. Sect
78. by adapting the model for first choices to the response format concerned. This is done internally (automatically) by the program. It is, however, also possible to specify ranking or rating models as if they were models for first choices. In the ranking case, this involves specifying additional choice sets in which earlier selected alternatives are defined as impossible. A rating model can be specified as a first choice model by defining the categories of the dependent variable as alternatives with attribute values $z^{att}_{mitp}$ equal to $y^*_m \cdot z^{att}_{itp}$. Given the fact that each of the response formats can be treated as a first choice, it is possible to make any combination of the formats that were discussed. Of course, setting up the right alternatives and sets files may be quite complicated. An issue that should be taken into account when using combinations of response formats is the scaling of the replications. For example, it might be that the utilities should be scaled in a different manner for ratings than for first choices.
2.7 Covariates
In addition to the explanatory variables that we called attributes and predictors, it is also possible to include another type of explanatory variable, called covariates, in the LC model. While attributes and predictors enter in the regression model for the choices, covariates are used to predict class membership. In the context of LC analysis, covariates are sometimes referred to as c
79. c approaches. GClasses and GCFactors may affect the intercept and the covariate effects in the model for the Classes, and may affect the alternative-specific constants and the attribute and predictor effects in the model for the dependent variable (see Model Tab). GClasses may themselves be affected by group-level covariates (GCovariates). When GClasses or GCFactors are included in the model, they appear on the Model Tab for further model specification as described earlier, that is, to indicate whether alternative-specific constants, attribute effects, and predictor effects vary randomly across groups. To include GClass and/or GCFactor effects in the model for the Classes, use the Group specification box.
Group specification box
The Group specification box at the lower right of the Advanced Tab contains a column for GClasses and additional columns for each GCFactor specified. Click in the check boxes to allow estimation of the desired parameters. When GClasses and/or GCFactors are included, it is assumed that these affect the intercept in the Model for the Classes. This yields the standard multilevel latent choice model, in which class sizes are assumed to differ across groups by using a parametric or nonparametric random-intercept model for the latent classes. GClasses and GCFactors may also be allowed to affect the covariate effects in the Model for the Classes. This is accomplished by checking the corresponding terms on the Advanced Tab
80. cal solution. Hence, if you open an .lgf file that was created using an earlier version of Latent GOLD Choice, you should make sure to restore the default value of 0 and increase the value for Random Sets to the default value of 10 or some other desired quantity.
Tolerance: Indicates the convergence criterion to be used when running the model of interest with the various start sets. The definition of this tolerance is the same as the one that is used for the EM and Newton-Raphson iterations.
Bayes Constants
The Bayes options can be used to eliminate the possibility of obtaining boundary solutions. You may enter any non-negative real value. Separate Bayes constants can be specified for three different situations:
Latent Variables: The default is 1. Increase the value to increase the weight allocated to the Dirichlet prior, which is used to prevent the occurrence of boundary zeroes in estimating the latent distribution. The number can be interpreted as a total number of added cases that is equally distributed among the classes (and the covariate patterns). To change this option, double-click the value to highlight it, then type in a new value.
Categorical Variables: The default is 1. Increase the value to increase the weight allocated to the Dirichlet prior, which is used in the model for the dependent variable. The number can be interpreted as a total number of added cases to the cells in the models for the dependent variable to prevent the occurrenc
81. centages as opposed to column percentages, they become the ProbMeans output, which yields an informative display in the dimension of the segments.
> Click on ProbMeans to display the corresponding row percentages.
Figure 40: Final 3-class Model ProbMeans Output
The row percentages highlighted above state that persons choosing Traditional vs. Modern style shoes, where price and quality are the same, are most likely (posterior probability = .6185) to be in segment 2. These row percentages can be used to position each category in an informative 2-dimensional barycentric coordinate display. Choices can also be appended to this plot.
> Click on the expand icon next to ProbMeans.
> Click on Tri-Plot to display the Plot.
> Right-click in the Contents Pane to display the Plot Control Panel.
> Click on PRICE to remove PRICE from the plot.
> Click on the drop-down set list and selec
82. chnical settings. Similar to what was discussed in the context of CFactors, with GCFactors the marginal density $P(y_j \mid z_j)$ described in equation 25 is approximated using Gauss-Hermite quadrature. With three GCFactors and $B$ quadrature nodes per dimension, the approximate density equals
$$P(y_j \mid z_j) \approx \sum_{x^g=1}^{K^g} \sum_{b_1=1}^{B} \sum_{b_2=1}^{B} \sum_{b_3=1}^{B} P(x^g \mid z^g_j)\, P^g_{b_1} P^g_{b_2} P^g_{b_3} \prod_{i=1}^{I_j} P(y_{ji} \mid z_{ji}, x^g, F^g_{b_1}, F^g_{b_2}, F^g_{b_3}).$$
ML (PM) estimates are found by a combination of the upward-downward variant of the EM algorithm developed by Vermunt (2003, 2004) and Newton-Raphson with analytic first-order derivatives. (Footnote 20: In fact, the multilevel LC model implemented in Latent GOLD Choice is so general that many possibilities remain unexplored as of this date. It is up to Latent GOLD Choice Advanced users to further explore its possibilities.) The only new technical setting in multilevel LC Choice models is the same as in models with CFactors, that is, the number of quadrature nodes to be used in the numerical integration concerning the GCFactors. As explained earlier in the context of models with CFactors, the default value is 10, the minimum 2, and the maximum 50.
7.2 Application Types
7.2.1 Two-level LC Choice model
The original multilevel LC model described by Vermunt (2003) and Vermunt and Magidson (2005b) was meant as a tool for multiple-group LC analysis in situations in which the number of groups is large. The basic idea was to formulate a model in which the latent class distribution (class sizes)
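To give a feel for what the number of quadrature nodes controls, here is a one-dimensional sketch of Gauss-Hermite integration over a standard normal continuous factor. It is an assumption-laden illustration, not the program's algorithm: the function names, the utilities, and the loadings are hypothetical, and only a single factor is integrated out.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def normal_quadrature(n_nodes=10):
    """Nodes and weights such that sum(w * f(node)) approximates E[f(F)], F ~ N(0, 1).
    hermgauss targets the weight function exp(-x^2), hence the rescaling."""
    x, w = hermgauss(n_nodes)
    return np.sqrt(2.0) * x, w / np.sqrt(np.pi)

def marginal_probability(eta, loading, n_nodes=10):
    """Choice probabilities averaged over a normally distributed factor:
    utilities are eta_m + loading_m * F, integrated numerically over F."""
    nodes, weights = normal_quadrature(n_nodes)
    eta = np.asarray(eta, dtype=float)
    loading = np.asarray(loading, dtype=float)
    total = np.zeros_like(eta)
    for f, w in zip(nodes, weights):
        v = eta + loading * f
        p = np.exp(v - v.max())
        total += w * p / p.sum()
    return total

nodes, weights = normal_quadrature(10)                    # 10 nodes, the default quoted above
p_marginal = marginal_probability([1.0, 0.0, -0.5], [1.0, 0.0, 0.0])
```

With several factors the nodes form a grid, so the work grows as B to the power of the number of factors, which is why fewer nodes speed up estimation at some cost in precision.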
83. cient (GRC) formulation proposed by Skrondal and Rabe-Hesketh (2004, p. 101). In fact, it is assumed that the unobserved heterogeneity in the regression coefficients can be summarized by at most three underlying CFactors.
6.2.2 LC (FM) regression models with random effects
A unique feature of Latent GOLD Choice is that it allows you to combine random effects with latent classes. More specifically, it is possible to specify LC Choice models in which the intercept and/or some of the regression coefficients vary within latent classes. Lenk and DeSarbo (2000) proposed using random effects in FMs of generalized linear models, and Böckenholt (2001) proposed using random effects in LC models for ranking data. It has been observed that the solution of a LC Choice analysis may be strongly affected by heterogeneity in the constants. In choice-based conjoint studies, for example, it is almost always the case that respondents differ with respect to their brand preferences, irrespective of the attributes of the offered products. A LC Choice model captures this brand heterogeneity phenomenon via Classes with different constants. However, the analyst often likes to find a relatively small number of latent classes that differ in more meaningful ways with respect to attribute effects on the choices. By including random alternative-specific constants (intercepts) in the LC Choice model, for example, $\eta_{m|x,z_{it},F_{i1}} = \beta^{con}_{xm} +$
84. class model. For repeated-measure applications such as this, data sparseness exists, which implies that the L² statistic does not follow a chi-square distribution. In such cases, the reported p-value is not correct and the number of degrees of freedom is not informative. Hence, we will exclude the L², df, and associated p-value columns from our Model Summary output. In addition, notice that 2 different R² measures are displayed. For this model, we use the standard R². For other applications, as in Tutorial 3, it will be more appropriate to utilize R²(0) rather than R² to assess the goodness of model prediction. The baseline for R²(0) is a null model containing no predictors at all, not even the variable _Constants_. This latter null model predicts each alternative to be equally likely to be selected, while the former null model (the baseline for the standard R²) predicts each alternative to be selected with probability equal to the overall observed marginal distribution.
To remove the undesired columns from the Model Summary:
> Right-click in the Contents Pane to retrieve the Model Summary Display.
> Click in the boxes to the left of L², df, and p-value to remove the checkmarks from these items.
> Close the Model Summary Display by pressing Escape or clicking the X in the upper right of the Display.
Figure 27: Model Summary Display
The Mo
85. covariates.
Include all: Selection of this option includes all cases and replications in the analysis regardless of the presence of missing values. Cases or replications with missing values on the dependent variable are included in the analysis and handled directly in the likelihood function. Missing values on Predictors or active covariates are imputed using Latent GOLD's imputation procedure. Inclusion in a model of covariates designated as inactive has no effect on which cases are excluded. Therefore, these missing values options have no effect with respect to the presence or absence of missing values on covariates specified to be inactive.
Bootstrap Options (Bootstrap L², Bootstrap -2LL Diff)
The Technical Tab contains options for specifying the number of Replications and a Seed for both the Bootstrap L² and the conditional bootstrap (Bootstrap -2LL Diff) procedures. Either of these bootstrap procedures can be requested from the Model Menu for an estimated model, as described in Step 10.
Replications: The default for the number of replication samples is 500. In most applications, this number is large enough. The program also reports the Monte Carlo standard error of the p-value. By increasing this number, a more precise estimate of the p-value is obtained, since the Monte Carlo error is reduced. With large models, you may consider reducing the number of replications to speed up the estimation.
Seed: Seed can be used to specify the see
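The trade-off between the number of bootstrap replications and the precision of the p-value can be sketched as follows. This assumes the usual definitions, which are not spelled out in this excerpt: the p-value is the proportion of replicated statistics that exceed the original one, and its Monte Carlo standard error is the binomial standard error of that proportion. The function name and the numbers are hypothetical.

```python
import math

def bootstrap_p_value(original_stat, replicated_stats):
    """Proportion of replications with a larger statistic than the original,
    together with the binomial Monte Carlo standard error sqrt(p * (1 - p) / B)."""
    n_replications = len(replicated_stats)
    p = sum(stat > original_stat for stat in replicated_stats) / n_replications
    se = math.sqrt(p * (1.0 - p) / n_replications)
    return p, se

# Hypothetical L2 values from 4 replication samples (real runs use many more).
p_value, mc_error = bootstrap_p_value(10.0, [5.0, 12.0, 8.0, 15.0])

# Worst-case (p = 0.5) Monte Carlo error for the default of 500 replications.
worst_case_500 = math.sqrt(0.25 / 500)
```

Since the standard error shrinks with the square root of the number of replications, quadrupling the replications only halves the Monte Carlo error.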
86. cription of all the features and menus that are available for the estimation of discrete choice models. Section 3 presents a sample tutorial to get you up and running quickly (Analyzing data from a simulated choice experiment using the 3-file format) that describes the basic usage of the program. Other tutorials available as separate files include the following:
Choice_tutorial1A.pdf, Tutorial 1A: Using the SI-CHAID add-on to profile latent class segments
Note: Tutorials 2-7 are currently being updated for version 4.0.
Choice_tutorial2.pdf, Tutorial 2: Using the Results of the Model to Predict (Simulate) Future Choices
Choice_tutorial3.pdf, Tutorial 3: Estimating Brand and Price effects
Choice_tutorial4.pdf, Tutorial 4: Using the 1-file Format
Choice_tutorial5.pdf, Tutorial 5: Analyzing Ranking data
Choice_tutorial6.pdf, Tutorial 6: Using LG Choice to Estimate max-diff (best-worst) and Other Partial Ranking Models
Choice_tutorial7.pdf, Tutorial 7: LC Segmentation with Ratings-based Conjoint Data
Choice_tutorial7 2.pdf, Tutorial 7A: LC Segmentation with Ratings-based Conjoint Data
All of these tutorials are also available online at http://www.statisticalinnovations.com/products/choice.html#tutorialslink, along with other tutorials that are under development. For example, Tutorial 8 illustrates how to use the program to estimate allocation (constant-sum) models. For more information on allocation, see Section 2.5 of the Techn
87. d Long, L. D. (1993). Goodness-of-fit testing for latent class models. Multivariate Behavioral Research, 28, 375-389.
Croon, M. A. (1989). Latent class models for the analysis of rankings. In G. De Soete, H. Feger, and K. C. Klauer (eds.), New developments in psychological choice modeling, 99-121. Elsevier Science Publishers.
Dayton, C. M., and Macready, G. B. (1988). Concomitant-variable latent class models. Journal of the American Statistical Association, 83, 173-178.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Ser. B, 39, 1-38.
Dias, J. G. (2004). Finite Mixture Models: Review, Applications, and Computer-intensive Methods. PhD Dissertation. Research School Systems, Organisation and Management (SOM), University of Groningen, The Netherlands.
Dillon, W. R., and Kumar, A. (1994). Latent structure and other mixture models in marketing: An integrative survey and overview. In R. P. Bagozzi (ed.), Advanced Methods of Marketing Research, 352-388. Cambridge: Blackwell Publishers.
Galindo-Garre, F., Vermunt, J. K., and Croon, M. A. (2002). Likelihood-ratio tests for order-restricted log-linear models: A comparison of asymptotic and bootstrap methods. Metodología de las Ciencias del Comportamiento, 4, 325-337.
Galindo-Garre, F., Vermunt, J. K., and Bergsma, W. (2004). Bayesian posterior estimation of logit parameters with small sa
88. d by the GClasses A more extended model is obtained by assuming that also the attribute effects vary across GClasses that is acon att Coe att g gatt Minlzzi 1 29 m T 3 Pap Zmitp T Bm ao 2 B 9 mitp In practice it seems to be most natural to allow effects of attributes and pre dictors that change values across replications to be Class dependent and ef fects of predictors that change values across cases to depend on the GClasses The most extended specification is obtained if all the effects are assumed to be Class dependent which implies including Classes GClasses 1 19 in teractions Such a model is defined as Rcon att Bema att g gatt Nm 2jit 0 29 ote 2 Pap Zmitp e LM I 3 Dip zI Zmitp It should be noted that in each of the above three models identifying con straints have to be imposed on the parameters involving the GClasses The attribute effects for the GClasses for example are restricted by 4 ae 0 por 0 or Copies 0 for1 lt p lt P and 1 lt lt K In other words the parameters in the model for the dependent variable either sum to zero across GClasses are equal to zero for the first GClass or are equal to zero for the last GClass 7 2 3 Three level random coefficients conditional logit models Combining the GCFactors from the multilevel model with the CFactors op tion makes it possible to specify three level random coefficient conditional logit models These are similar to
89. d estimates serve as start values. Thus, no random sets are used for the replications. To gain efficiency in terms of computation time, the iterations within a bootstrap replication terminate when the replicated L² is smaller (the -2LL-diff value is larger) than the original one, even if the convergence criterion or the maximum number of iterations is not reached.
3.8 Identification Issues
Sometimes LC models are not identified; that is, it may not be possible to obtain unique estimates for some parameters. Non-identification implies that different parameter estimates yield the same log-posterior or log-likelihood value. When a model is not identified, the observed information matrix H is not of full rank, which is reported by the program. Another method to check whether a model is identified is to run the model again with different starting values. Certain model parameters are not identified if two sets of starting values yield the same log P or log L values with different parameter estimates. With respect to possible non-identification, it should be noted that the use of priors may make models identified that would otherwise not be identified. In such situations, the prior information is just enough to uniquely determine the parameter values. A related problem is weak identification, which means that even though the parameters are uniquely determined, sometimes the data is not informative enough to obtain stable parameter estimates. We
90. d introduce some notation. The data file contains information on $I$ cases or subjects, where a particular case is denoted by $i$. For each case there are $T_i$ replications, and a particular replication is denoted by $t$. Note that in the Latent GOLD Choice interface, the $T_i$ observations belonging to the same case are linked by means of a common Case ID. Let $y_{it}$ denote the value of the dependent (or response) variable for case $i$ at replication $t$, which can take on values $1 \le m \le M$. In other words, $M$ is the number of alternatives and $m$ a particular alternative. Three types of explanatory variables can be used in a LC choice model: attributes or characteristics of alternatives ($z^{att}_{mitp}$), predictors or characteristics of replications ($z^{pre}_{itq}$), and covariates or characteristics of individuals ($z^{cov}_{ir}$). Here, the indices $p$, $q$, and $r$ are used to refer to a particular attribute, predictor, and covariate. The total number of attributes, predictors, and covariates is denoted by $P$, $Q$, and $R$, respectively. Below, we will sometimes use vector notation $y_i$, $z_i$, and $z^{cov}_i$ to refer to all responses, all explanatory variables, and all covariate values of case $i$, and $z^{att}_{it}$ and $z^{pre}_{it}$ to refer to the attribute and predictor values corresponding to replication $t$ for case $i$. Another variable that plays an important role in the models discussed below is the latent class variable denoted by $x$, which can take on values $1 \le x \le K$. In other words, the total number of latent classes is deno
91. d that is used to generate the replication data sets (the default value 0 means random seed) for either the Bootstrap of L² or the Conditional Bootstrap (Bootstrap -2LL Diff) procedures. Because of Monte Carlo simulation error, the bootstrap procedure yields a slightly different p-value each time that it is repeated, along with an estimate of the standard error. Specifying a particular seed guarantees the same result each time. By specifying the seed to be equal to the bootstrap seed reported in the Model Summary Output, one can replicate a previous run. In most bootstrap applications, one will only use the Replications option. If the Save Definition option in the File Menu is used to save a .lgf definition file for a model resulting from the Bootstrap, the Bootstrap Seed is saved. For the Conditional Bootstrap, only the Bootstrap Seed associated with the source model is utilized.
Advanced
Continuous Factors
Number of Nodes: If 1 or more (group-level) continuous factors are specified in the Advanced Tab, this option determines the number of nodes used to approximate their normal distribution. By default, 10 nodes are used. Decreasing this number (the minimum is 2 nodes) will speed up estimation but reduce the precision with which the multivariate normal distribution of the CFactors/GCFactors is approximated. For further details, see subsections 6.1 and 7.1 of the Technical Guide.
Default Options
Click Save as Default to save the current technical settings as the new default
92. dard errors can be used as a diagnostic tool to assess the significance of a single parameter estimate. As a rule of thumb, parameter estimates larger in absolute value than twice their standard error are significant at the .05 level.
• To replace the standard errors by the Wald statistic(s), again from the menus choose View, Wald.
Importance
The Importance output reports the maximum effect for each of the attributes, including the constants, as well as re-scaled maximum effects that add up to one within latent classes (relative importances). The latter are plotted in the importance plot (Imp-Plot).
Profile View
• To view the profile table for a selected model, click Profile in the Outline pane.
For Attributes and the Constants, this output section contains the class-specific part-worths, which are transformed by an inverse logit transformation and thus sum to 1 across attribute levels within classes. For covariates, these are rescaled ProbMeans probabilities. For a detailed interpretation of this output, see Section 4.4 of the Technical Guide.
Profile Plot
• To view the corresponding Profile Plot, click the icon to expand the Profile output and highlight Prf-Plot.
• Click on any variable symbol in the Profile Plot and the status bar describes it (variable name, class/factor level number, value).
• Click on any class name or symbol in the legend and Latent GOLD highlights all the symbols that refer to that class/factor level.
To Change
93. del Summary in the Contents Pane now looks like this:
Figure 28: Updated Model Summary Display
Notice that the BIC criterion correctly identifies the 3-class model as best (lowest BIC). Notice also that the R² increases from .0454 for the standard aggregate (1-class) model to .2157 for the 3-class model. This R² statistic assesses the percentage of variance explained in the dependent variable. In this example, the nominal dependent variable consists of 4 alternatives in each set (alt1, alt2, alt3, and alt4), variance is defined in terms of qualitative variance (Magidson, 1981), and the baseline or null model is the model that includes the alternative-specific constants (_Constants_) as the only predictor.
> In the Outline Pane, click once on Model 3 to select it and click again to enter Edit mode, and rename it "3-class" for easier identification.
> Click once on each of the other models and rename each of them as well.
As a formal assessment of whether the R² obtained from the 4-class model provides a signif
94. dex; number of cases; replication index; number of replications for case i; response of case i at replication t; category of the response variable; score assigned to category m of a rating variable; nominal latent variable, a particular latent class; number of latent classes; covariate index; number of covariates; attribute index; number of attributes; predictor index; number of predictors; covariate; attribute; predictor; linear predictor; parameter in model for y_it; parameter in model for x; known-class indicator; case weight; replication weight; covariate pattern index; number of covariate patterns; unique data pattern index; number of unique data patterns; total sample size (after weighting).
11.2 Advanced Models
d: CFactor index; D: number of CFactors; F_id: score of case i on CFactor d; λ_d: an effect of CFactor d; j: group index; J: number of groups; I_j: number of cases in group j; y_jit: response of case i of group j at replication t; y_j: vector of responses of group j. Further entries: group-level quantity; group-level nominal latent variable, a particular group GClass; group-level covariate; score of group j on group-level continuous factor (GCFactor) d; group-level parameters; stratum; number of strata; PSU; number of PSUs in stratum o; sampling weight of cases in PSU c of stratum o; total number of PSUs in population in stratum o.
Part 3: Using Latent GOLD Choice
1.0 Overview
This part of this manual describes and illustrates the use of the program. Section 2 contains a general des
95. e Pip Zmitp Azm Fii p 1

it is much more likely that one will succeed in finding such meaningful Classes/segments. The random intercept term, which may have a different effect in each latent class, will filter out most of the artificial variation in the constants.

7 Multilevel LC Choice Model

7.1 Model Components and Estimation Issues

To explain the multilevel LC model implemented in Latent GOLD Choice, we need to introduce some new terminology. Higher-level observations will be referred to as groups and lower-level observations as cases. The records of cases belonging to the same group are connected by the Group ID variable. It should be noted that higher-level observations can also be individuals, for example, in longitudinal applications. Cases would then be the multiple time points within individuals, and replications the multiple choices of an individual at the various time points.

The index j is used to refer to a particular group and I_j to denote the number of cases in group j. With y_jit we denote the response at replication t of case i belonging to group j, with y_ji the full vector of responses of case i in group j, and with y_j the responses of all cases in group j.

Rather than expanding the notation with new symbols, group-level quantities will be referred to using a superscript g. Group-level classes (GClasses), group-level continuous factors (GCFactors), and group-level covariates (GCovariates) are denoted by x^g
96. e is that now the predictions and computations are based on the model probabilities $\hat{P}(x \mid z_i)$ instead of the posterior probabilities $\hat{P}(x \mid z_i, y_i)$. Whereas the total error can still be denoted as Error(x), the model prediction error in equation 15 should now be denoted as Error(x|z) instead of Error(x|z, y).

[Footnote: There may be a very small difference, which is caused by the Bayes constant for the latent classes.]

4.1.5 Prediction statistics

Prediction statistics indicate how well the observed choices, rankings, or ratings are predicted by the specified model. For rankings, the prediction statistics are based on first choices only. For choice and rating variables, all replications are used for obtaining the prediction measures.

The predicted values used in the computation of the prediction statistics are based on the estimated individual-specific response probabilities, which are denoted by $\hat{P}_{m|it}$. For ratings, we also make use of the estimated expected values

    $\hat{y}_{it} = \sum_{m=1}^{M} y^{*}_{m} \hat{P}_{m|it}$,

where $y^{*}_{m}$ is the score for response category m. As is shown in detail below, $\hat{P}_{m|it}$ is computed by weighting Class-specific estimates by the posterior membership probabilities $\hat{P}(x \mid z_i, y_i)$. This means that our procedure can be called posterior mean/mode or expected/modal a posteriori prediction.

The individual-specific response probabilities $\hat{P}_{m|it}$ can be obtained as follows:

    $\hat{P}_{m|it} = \sum_{x=1}^{K} \hat{P}(x \mid z_i, y_i) \, \hat{P}(y_{it} = m \mid x, z^{att}_{it}, z^{pre}_{it})$    (17)

As can be seen, these are weighted averages
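The weighting in equation 17 and the expected rating can be illustrated with a short sketch. The function names and all numbers below are hypothetical; the code only shows the arithmetic of averaging Class-specific response probabilities with posterior membership probabilities as weights.

```python
import numpy as np

def individual_response_probs(posterior, class_probs):
    """Weighted average of class-specific response probabilities.

    posterior:   shape (K,), posterior membership probabilities of one case
    class_probs: shape (K, M), P(y_it = m | x, ...) for each class x
    returns:     shape (M,), individual-specific response probabilities
    """
    return np.asarray(posterior) @ np.asarray(class_probs)

def expected_rating(probs, scores):
    """Expected value of a rating: sum over categories of score * probability."""
    return float(np.dot(scores, probs))

# Hypothetical 3-class example with M = 3 response categories.
posterior = np.array([0.7, 0.2, 0.1])
class_probs = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
])
p_hat = individual_response_probs(posterior, class_probs)
y_hat = expected_rating(p_hat, scores=np.array([1.0, 2.0, 3.0]))
print(p_hat, y_hat)
```

Because the posterior weights sum to one and each row of class probabilities sums to one, the resulting individual-specific probabilities also sum to one.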
97. e of boundary zeroes. These pseudo-cases are divided equally across classes and attribute/predictor/covariate patterns, and in accordance with the observed marginal distribution across categories of the dependent variable. To change this option, double-click the value to highlight it, then type in a new value.

Missing Values

The Missing Values option allows for the inclusion of cases containing missing values on covariates and sets containing missing values on predictors, as well as sets containing missing values on the dependent variable. Including cases with missing values on covariates and sets with missing predictors causes the mean to be imputed for the scale type numeric, and the effect of the missing-value category to be equated to zero for the scale type nominal. Missing values on the dependent variable are handled directly in the likelihood function. In fact, choice sets with a missing value on the dependent variable do not contribute to the log-likelihood function. Note that missing values on attributes will never cause deletion of choice sets from the analysis.

Exclude cases. Selection of this option excludes all replications having missing values on the dependent variable or any of the predictors, and all cases having missing values on any of the active covariates.

Include indicators/dependent. Selection of this option excludes replications having missing values on any of the predictors and cases having missing values on any of the active
98. e specified as missing for the NONE alternative. In the case of a non-numeric (string/character) attribute, the character(s) used to denote a missing value (in the example below) need to be formally defined as a missing value in an SPSS .sav file. For ASCII files, the character sequence open quote followed by a closed quote is recognized as missing by Latent GOLD Choice.

Attributes for which main effects are estimated are specified in the Attributes tab by selecting them from the variable list, moving them to the Attributes box, and setting the desired scale types (see Tutorial 1). Various interaction effects can also be included in the model by defining appropriate interaction variables and selecting them in addition to the included main effects. Tutorial 2 shows how to include BRAND by PRICE interactions in the model by constructing the interaction variables PRICE A, PRICE B.

[Screenshot: SPSS Data Editor showing the alternatives file]

Figure 2: Example of an SPSS-formatted alternatives file defining 7 alternatives

Sets file

Each record in this file defines a choice set by listing the corresponding identification labels for each choice alternative included in the set (format: setid 1 alternative
99. e utility of alternative m at replication t given that case i belongs to latent class x. The linear model for $\eta_{m|x,z_{it}}$ is

    $\eta_{m|x,z_{it}} = \beta^{con}_{xm} + \sum_{p} \beta^{att}_{xp} z^{att}_{itmp} + \sum_{q} \beta^{pre}_{xmq} z^{pre}_{itq}$    (2)

As can be seen, the only difference with the aggregate model is that the logit regression coefficients are allowed to be Class-specific.

In the LC choice model, the probability density associated with the responses of case i has the form

    $P(y_i \mid z_i) = \sum_{x=1}^{K} P(x) \, P(y_i \mid x, z_i) = \sum_{x=1}^{K} P(x) \prod_{t=1}^{T_i} P(y_{it} \mid x, z^{att}_{it}, z^{pre}_{it})$    (3)

Here, P(x) is the unconditional probability of belonging to Class x or, equivalently, the size of latent class x. Below it will be shown that this probability can be allowed to depend on an individual's covariate values z_i, in which case P(x) is replaced by P(x | z_i).

As can be seen from the probability structure described in equation 3, the T_i repeated choices of case i are assumed to be independent of each other given class membership. This is equivalent to the assumption of local independence that is common in latent variable models, including in the traditional latent class model (Bartholomew and Knott, 1999; Goodman, 1974a, 1974b; Magidson and Vermunt, 2004). Also in random-coefficients models it is common to assume responses to be independent conditional on the value of the random coefficients.

2.2 Rankings and Other Situations with Impossible Alternatives

In addition to models for repeated first choices, it is possible to specify models for rankings. One differ
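The mixture structure of equation 3 can be made concrete with a small sketch. To keep it short, it assumes utilities that contain attribute effects only (no constants or predictors), so it is a simplified illustration under that stated assumption and not the program's estimation code. All function and variable names are hypothetical.

```python
import numpy as np

def choice_probs(utilities):
    """Conditional logit probabilities over the alternatives of one choice set."""
    u = utilities - utilities.max()  # subtract the max for numerical stability
    e = np.exp(u)
    return e / e.sum()

def case_likelihood(class_sizes, betas, attrs, choices):
    """P(y_i | z_i) as a sum over classes of P(x) times a product over replications.

    class_sizes: shape (K,), class sizes P(x)
    betas:       shape (K, P), class-specific attribute effects
    attrs:       shape (T, M, P), attribute values per replication and alternative
    choices:     shape (T,), index of the chosen alternative per replication
    """
    total = 0.0
    for size, beta in zip(class_sizes, betas):
        lik = 1.0
        for z_t, y_t in zip(attrs, choices):
            # local independence: replications multiply within a class
            lik *= choice_probs(z_t @ beta)[y_t]
        total += size * lik
    return total

# Hypothetical data: K = 2 classes, T = 2 replications, M = 3 alternatives, P = 2 attributes.
rng = np.random.default_rng(0)
attrs = rng.normal(size=(2, 3, 2))
choices = np.array([0, 2])
flat = case_likelihood(np.array([0.5, 0.5]), np.zeros((2, 2)), attrs, choices)
print(flat)
```

With all effects set to zero every alternative is equally likely, so the likelihood of two choices out of three alternatives reduces to (1/3)², which is a convenient sanity check.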
100. ed 5 sw log 5 P zx Z Vadjusted P yilzx Zi Out i l t 76

where ϑ_adjusted are the unknown parameters to be estimated. The rationale of this procedure is that an unweighted analysis may yield more stable (more efficient) estimates for the parameters defining the latent classes, but yields biased class sizes and covariate effects. The latter are corrected in the second step of the procedure.

9 Latent GOLD Choice's Advanced Output

This section describes the changes and additional items in the Latent GOLD Choice output sections when the Advanced options are used.

9.1 Model Summary

For multilevel models, the first part of the Model Summary output reports the number of groups (J) in addition to the number of cases and replications. When the Survey option is used, the program reports the generalized design effect (deff), which is an overall measure indicating how many times larger the design-corrected variances are compared to the asymptotic variances.

For multilevel models, Chi-squared Statistics are not reported and the bootstrap L² and -2LL difference options are not available. When the Survey option is used, the bootstrap-based L² and -2LL difference tests are corrected for the complex sampling design by multiplying the bootstrap replications' L² and -2LL difference values by the generalized design effect deff. Note that the bootstrap replication samples themselves are obtained by simple random sampling.

In multilevel
101. ed

DFactor. Specifies a DFactor model to be estimated.
Regression. Specifies a Regression model to be estimated.
Choice. Specifies a Choice model to be estimated.

Model names appear in the Outline pane for one or more models that have already been estimated. If you click on a model name associated with a previously estimated model, the Model Menu contains a checkmark next to the type of model estimated. If you click on the name for a new model (one that has not yet been estimated), there will be a checkmark next to the last type of model estimated.

Model options appearing in the second section of the Model Menu are:

Estimate. Estimate the model. Select this option once your new model is fully specified.

Estimate All. This option may be used when the Outline Pane contains names for 1 or more models associated with a data file that have not yet been estimated. Upon selecting the data file name, the Estimate All menu entry becomes active in the Model Menu. Selection of this option causes all of the associated models that have not yet been estimated to be estimated sequentially, beginning with the first such model.

Note: Multiple model names associated with models that have not yet been estimated can occur for a data file only if a previously saved definition (.lgf) file containing the setup for 2 or more models is opened. For details on saving .lgf files, see SAVE DEFINITION in the FILE menu options.

Bootstrap L² and Bootstrap -2LL Diff. The Boot
102. els

• First choice models. An extended multinomial logit model (MNL) is used to estimate the probability of making a specific choice among a set of alternatives as a function of choice attributes and individual characteristics (predictors).
• Ranking models. The sequential logit model is used for situations where a 1st and 2nd choice, 1st and last choice (best-worst), other partial rankings, or choices from a complete ranking of all alternatives are obtained.
• Conjoint rating models. An ordinal logit model is used for situations where ratings of various alternatives, which may be viewed as a special kind of choice, are obtained.

For each of these situations, response data are obtained for one or more replications known as choice sets. Latent class (LC) choice models account for heterogeneity in the data by allowing for the fact that different population segments (latent classes) express different preferences in making their choices. For any application, separate models may be estimated that specify different numbers of classes. Various model fit statistics and other output are provided to compare these models to assist in determining the actual number of classes. Covariates may also be included in the model for improved description/prediction of the segments.

Advanced program features include the use of various weights (case weights, replication weights, scale weights) and options to restrict the part-worth utility estimates in various wa
103. ence between first choice and ranking data is that in the former there is a one-to-one correspondence between replications and choice sets, while this is no longer the case with ranking data. In a ranking task, the number of replications generated by a choice set equals the number of choices that are provided. A full ranking of a choice set consisting of five alternatives yields four replications, that is, the first, second, third, and fourth choice. Thus, a set consisting of M alternatives generates M - 1 replications. This is also the manner in which the information appears in the response data file. With partial rankings, such as first and second choice, the number of replications per set will be smaller.

The LC model for ranking data implemented in Latent GOLD Choice treats the ranking task as a sequential choice process (Böckenholt, 2002; Croon, 1989; Kamakura et al., 1994). More precisely, each subsequent choice is treated as if it were a first choice out of a set from which alternatives that were already selected are eliminated. For example, if a person's first choice out of a set of 5 alternatives is alternative 2, the second choice is equivalent to a first choice from the 4 remaining alternatives 1, 3, 4, and 5. Say that the second choice is 4. The third choice will then be equivalent to a first choice from alternatives 1, 3, and 5.

The only adaptation that is needed for rankings is that it should be possible to have a different num
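The sequential-choice treatment of rankings described above can be sketched as a small helper that expands one ranking into first-choice replications. The function name and the tuple layout are hypothetical and are not the layout of the program's response file.

```python
def ranking_to_replications(alternatives, ranking):
    """Expand a (full or partial) ranking into sequential first-choice replications.

    alternatives: labels of the alternatives in the choice set
    ranking:      chosen labels in order of preference
    returns:      list of (remaining_alternatives, chosen) pairs
    """
    remaining = list(alternatives)
    replications = []
    for choice in ranking:
        if len(remaining) < 2:
            break  # the last remaining alternative is not a real choice
        if choice not in remaining:
            raise ValueError(f"alternative {choice!r} is not available")
        replications.append((tuple(remaining), choice))
        remaining.remove(choice)  # eliminate the alternative already selected
    return replications

# Full ranking of 5 alternatives: first choice 2, second choice 4, and so on.
full = ranking_to_replications([1, 2, 3, 4, 5], [2, 4, 1, 3, 5])
# Partial ranking: first and second choice only.
partial = ranking_to_replications([1, 2, 3, 4, 5], [2, 4])
print(len(full), len(partial))
```

A full ranking of M = 5 alternatives yields M - 1 = 4 replications, and the third replication is a choice among alternatives 1, 3, and 5, as in the example in the text.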
104. equencies are not reported in multilevel LC models.

9.6 Classification

The Standard Classification output provides information on the CFactor and GCFactor posterior means, $\hat{E}(F_{di} \mid z_i, y_i)$ and $\hat{E}(F^{g}_{dj} \mid z_j, y_j)$, the GClass posterior probabilities $\hat{P}(x^{g} \mid z_j, y_j)$, and the modal GClass for each data pattern. The posterior means are obtained using Gauss-Hermite quadrature; for example,

    ES Fai P yilZi Fui d Fai SE P yilZi Fai d Fai Dba Fo PlyilZi Foz Pos a Veale Pe E Fuilzi yi

In multilevel models with covariates, the Covariate Classification output section reports the GClass membership probabilities given group-level covariates, $\hat{P}(x^{g} \mid z^{g}_{j})$.

9.7 Output-to-file Options

The Standard Classification option can be used to write the CFactors and GCFactors posterior means, the GClasses posterior probabilities, and the modal GClass to an output file. In models with GClasses, Covariate Classification saves the classification of groups into GClasses based on group covariates to the output file.

The Individual Coefficients corresponding to CFactor effects are computed in a special way:

    $\hat{\lambda}_{iqd} = \sum_{x=1}^{K} \hat{P}(x \mid z_i, y_i) \, \hat{E}(F_{di} \mid z_i, y_i, x) \, \hat{\lambda}_{xqd}$    (79)

where $\hat{E}(F_{di} \mid z_i, y_i, x)$ is the posterior mean of $F_{di}$ given that i belongs to latent class x. The $\hat{\lambda}_{iqd}$ can be used together with the $\hat{\beta}_{iq}$ to obtain HB-like predicted values for case i. The posterior standard deviation of $\hat{\lambda}_{iqd}$ equals

    A A 42 OSa SE Pl x z yi E Faizi Yi x Azqd i
105. ertex will represent Class 3 and is labeled "Class 3". For a 4-or-more-class model, the third vertex is labeled "Others". For a 2-class model, the class 3 membership probability is 0 and the Tri-Plot reduces to the Uni-Plot.

• The triangle symbol marks the overall probabilities for the 3 classes associated with the vertices. It represents the centroid of the triangle.
• Click on any variable symbol in the Tri-Plot and 1) the status bar will contain a description of the point (variable name and category, class probabilities), 2) the category label will appear next to that point on the plot, and 3) lines emanate from that point to each side of the triangle, intersecting the side at the corresponding class probability value.
• Click on any variable symbol or name in the legend and all the symbols for that variable will be highlighted and their category labels listed in the Tri-Plot.

To Change Settings for a Tri-Plot

To change the settings for a Tri-Plot, right-click (or select Plot Control from the Model Menu) within the Contents pane when a Tri-Plot is displayed to open the Plot Control dialog box. To change the font for a plot, see Main Menu Options. This is a graphical presentation of the data presented in the ProbMeans view.

Tri-Plot Settings

Legend. When this option is selected, a Legend appears to the right of the Tri-Plot.

Point Labels. When this option is selected, category labels for each variable are listed on t
106. estimation method is no longer ML but PM (Posterior Mode). Denoting the assumed priors for ϑ by p(ϑ) and the posterior by 𝒫, PM estimation involves finding the estimates for ϑ that maximize the log-posterior function

    $\log \mathcal{P} = \log L + \log p(\vartheta) = \sum_{i=1}^{I} w_i \log P(y_i \mid z_i, \vartheta) + \log p(\vartheta)$,

or, in other words, finding the point where $\partial \log \mathcal{P} / \partial \vartheta = 0$. Algorithms that are used to solve this problem (EM and Newton-Raphson) are described below.

The user-defined parameters in the priors p(ϑ) can be chosen in such a way that log p(ϑ) = 0, which makes PM estimation turn into ML estimation. PM estimation can also be seen as a form of penalized ML estimation, in which p(ϑ) serves as a function penalizing solutions that are too near to the boundary of the parameter space and, therefore, smoothing the estimates away from the boundary.

[Footnote: In order to simplify the discussion, in this section we discuss only the situation without known-class indicators.]
[Footnote: In Latent GOLD Choice Advanced, there is a more elegant option for dealing with sampling weights, as well as with other complex survey sampling features.]

3.2 Missing Data

3.2.1 Dependent variable

If the value of the dependent variable is missing for one or more of the replications of case i, the replications concerned are omitted from the analysis. The remaining replications will, however, be used in the analysis. Thus, instead of using list-wise deletion of cases, Latent GOLD Choice provides
107. ete Factors and LC Regression models. Numerous tutorials and articles illustrate the use of these 3 kinds of models at http://www.statisticalinnovations.com/products/latentgold_v4.html. In addition, the complete LG 4.0 User's Guide and a separate Technical Guide may also be downloaded.

SI-CHAID 4.0

With this option, a CHAID (CHi-squared Automatic Interaction Detector) analysis may be performed following the estimation of any LC Choice model, to profile the resulting LC segments based on demographics and/or other exogenous variables (Covariates). By selecting CHAID as one of the output options, a CHAID input file is constructed upon completion of the model estimation, which can then be used as input to SI-CHAID 4.0.

This option provides an alternative treatment to the use of active and/or inactive covariates in Latent GOLD Choice 4.0. In addition to standard Latent GOLD output to examine the relationship between the covariates and classes/DFactors, SI-CHAID provides a tree-structured profile of selected classes/DFactors based on the selected Covariates. In addition, chi-square measures of statistical significance are provided for all covariates (Latent GOLD Choice does not provide these for inactive covariates).

Whenever covariates are available to describe latent classes obtained from Latent GOLD Choice 4.0, SI-CHAID 4.0 can be an especially valuable add-on tool under any of the following conditions:

• when many covariates are ava
108. gal gt

HB-like individual coefficients for a full intercept or predictor term may also be obtained by summing the various individual coefficient components for that term. For example, for a random-intercept model such as given in equation 23, the HB-like individual coefficient for a full alternative-specific constant is computed by summing the corresponding $\beta^{con}$ and $\lambda^{con}$ individual coefficient components.

In multilevel models, the Cook's D value is computed per group rather than per case. Thus, rather than for detecting influential cases, it can be used for detecting influential groups.

10 Bibliography

Agresti, A. (2002). Categorical data analysis. Second Edition. New York: Wiley.

Aitkin (1999). A general maximum likelihood analysis of variance components in generalized linear models. Biometrics, 55, 218-234.

Andrews, R.L., Ainslie, A., and Currim, I.S. (2002). An empirical comparison of logit choice models with discrete versus continuous representations of heterogeneity. Journal of Marketing Research, 39, 479-487.

Andrews, R.L., and Currim, I.S. (2003). A comparison of segment retention criteria for finite mixture logit models. Journal of Marketing Research, 40, 235-243.

Banfield, J.D., and Raftery, A.E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49, 803-821.

Bartholomew, D.J., and Knott, M. (1999). Latent variable models and factor analysis. London: Arnold.

Bechger, T.M., Maris, G., Verstralen, H.H.F.M., and Verhelst, N.D. (2005). The
109. Figure 31: 4-classBoot Model Summary

Since the estimated p-value is larger than .05 (p = .178, with an estimated standard error of .017, reported in the above Figure), the improvement by going to a 4-class model is not statistically significant. The estimated p-value that you obtain should be close to this estimate.

Examining the Model Output files

> Click on the expand icon next to the 1-class model and several output files appear.
> Click on Parameters to view the part-worth utility estimates in the Contents Pane.

[Screenshot: Parameters output for the 1-class model, listing the Class1 estimate, Wald statistic and p-value for FASHION, QUALITY, PRICE, PRICESQ and NONE]

Figure 32: Parameters Output for 1-class Model

Notice that the effect of PRICESQ is highly significant (p = 4.7 × 10^-7) in this model. Since the effect of PRICESQ is 0 in the true model, this example shows that the effects and predictions obtained from the aggregate model cannot be trusted. When choices are based on different utilities in different population segments, estimates obtained under the aggregate model will typically be biased.

> Click on 1-class and scrol
110. he program treats unused alternatives as impossible alternatives.

2.3 Ratings

A third type of dependent variable that can be dealt with are preferences in the form of ratings. Contrary to a first choice or ranking task, a rating task concerns the evaluation of a single alternative instead of the comparison of a set of alternatives. Attributes will, therefore, have the same value across the categories of the response variable. Thus, for rating data, it is no longer necessary to make a distinction between attributes and predictors.

Another important difference with first choices and rankings is that ratings outcome variables should be treated as ordinal instead of nominal. For this reason, we use an adjacent-category ordinal logit model as the regression model for ratings (Agresti, 2002; Goodman, 1979; Magidson, 1996). This is a restricted multinomial/conditional logit model in which the category scores for the dependent variable play an important role. Let $y^{*}_{m}$ be the score for category m. In most cases, this will be equally spaced scores with mutual distances of one (e.g., 1, 2, 3, ..., M, or 0, 1, 2, ..., M - 1), but it is also possible to use scores that are not equally spaced or non-integers. Note that M is no longer the number of alternatives in a set but the number of categories of the response variable.

Using the same notation as above, the adjacent-category ordinal logit model can be formulated as follows: P Q
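The role of the category scores can be illustrated with a short sketch. The manual's own equation is cut off in this excerpt, so the code assumes the textbook form of the adjacent-category logit model, in which the linear predictor of category m is a category constant plus the category score times a single linear effect. The names and numbers are hypothetical.

```python
import numpy as np

def adjacent_category_probs(constants, scores, linear_effect):
    """Rating probabilities under an adjacent-category logit model (textbook form).

    constants:     shape (M,), category-specific constants
    scores:        shape (M,), category scores y*_m
    linear_effect: scalar, sum of attribute/predictor effects times their values
    """
    eta = np.asarray(constants) + np.asarray(scores) * linear_effect
    eta = eta - eta.max()  # numerical stability
    e = np.exp(eta)
    return e / e.sum()

# Hypothetical 5-point rating with equally spaced scores 1..5 and zero constants.
scores = np.arange(1.0, 6.0)
probs = adjacent_category_probs(np.zeros(5), scores, linear_effect=0.4)
adjacent_log_odds = np.log(probs[1:] / probs[:-1])
print(probs, adjacent_log_odds)
```

With equally spaced scores at distance one and zero constants, every adjacent-category log-odds equals the linear effect, which is what makes this a restricted version of the multinomial logit model.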
111. he Tri-Plot next to the variable symbol.

Vertexes. Latent GOLD Choice allows you to select the base vertices in the Tri-Plot. The top vertex corresponds to the aggregate of the remaining classes.

• A vertex. The class currently used as the A vertex is listed in the drop-down box. To select a different class, click on the down arrow to the right of the vertex box. A drop list containing all classes will appear. Select the class to use as the A vertex.
• B vertex. The class currently used as the B vertex is listed in the drop-down box. To select a different class, click on the down arrow to the right of the vertex box. A drop list containing all classes will appear. Select the class to use as the B vertex.

Variables. Select which variables to include in the Tri-Plot. Those with a checkmark in the checkbox are included in the plot. By default, the Tri-Plot contains all the indicators/covariates that were input as part of the model.

Groups. Click Update once you have specified a new number of groups.

Set Profile and Set ProbMeans

The Set Profile and Set ProbMeans output sections contain information on the estimated choice probabilities per choice set. For rankings, these are based on the first-choice replications only. With choices and ratings, all replications are used. Set Profile also contains information on the observed choice probabilities, as well as residuals per alternative and per set that compare observed with overall estimated choice probabili
112. he covariates are item dummies. The constraint that P(y | x = 1) = 1.00 can be imposed by using the offset option, and P(y = m | x = 2) is left unrestricted.

7.2.6 Non-multilevel models

The final use of the multilevel option we describe here does not yield a multilevel model, but is a trick for estimating models that cannot be estimated any other way. The trick consists of using a Group ID variable that is identical to the Case ID or, equivalently, having groups that consist of no more than one case each. GCFactors can then be used as CFactors. This makes it possible to define models in which CFactors affect the latent classes. Another possibility is to use the GClasses as an additional case-level nominal latent variable, yielding a model in which one nominal latent variable may affect another nominal latent variable.

8 Complex Survey Sampling

The Survey option makes it possible to obtain consistent parameter estimates and correct standard errors with complex sampling designs. This option can be used in combination with any model that can be estimated with Latent GOLD Choice. Parameter estimation is based on the so-called pseudo-ML estimator, which uses the sampling weights as if they were case weights. Correct statistical tests with stratified and clustered samples, as well as with sampling weights and samples from finite populations, are obtained using the linearization variance estimator. Latent GOLD Choice also implements an alternative
113. he model parameters will be estimated. The status bar displays messages regarding the status of the estimation.

Upon beginning a model estimation, the stop button on the toolbar becomes red (this may take several seconds or longer), which indicates that it is now active. You can stop a model estimation once it has begun to accomplish either 1) canceling the estimation, or 2) pausing the estimation to view preliminary results and/or make changes to the requested output options, or change the iteration or convergence limits prior to resuming estimation.

Once the stop button becomes active:

> To stop the estimation procedure, select Stop from the Model Menu, or
> click on the stop button in the toolbar,

and a popup menu appears.

[Screenshot: Model Estimation dialog with the options Abandon, Pause and Continue]

Abandon (Cancel) Estimation of a Model

> Select Abandon to cancel the model estimation.

This option returns the program to its state prior to beginning the Estimation. Output is produced only for models that completed the estimation process without being terminated.

Pause Model Estimation

[Screenshot: Model Estimation dialog with the option Continue]

The Pause option allows you to pause a model after the model estimation process has begun, but prior to the estimation being completed, to review preliminary Model Summary Output as well as any of the following Model Output Sections that were requested in the Output Tab: Parameters, Profile, ProbMeans, Bivariate Res
114. hich implies that it is equated to 0.

It is also possible to work with user-specified coding schemes. An example is

    category 1    0  0  0
    category 2    1  0  0
    category 3    1  1  0
    category 4    1  1  1

which yields parameters that can be interpreted as differences between adjacent categories. More precisely, β_1 is the difference between categories 2 and 1, β_2 between categories 3 and 2, and β_3 between categories 4 and 3.

As explained in the previous sections, the effect and dummy coding constraints are not only imposed on the attribute effects, but also on the constants and the predictor effects in the regression model for first choices and rankings, on the constants in the regression model for ratings, and on the intercepts and covariate effects in the regression model for the latent classes.

2.9 Known-Class Indicator

Sometimes, one has a priori information (for instance, from an external source) on the class membership of some individuals. For example, in a four-class situation, one may know that case 5 belongs to latent class 2 and case 11 to latent class 3. Similarly, one may have a priori information on which class cases do not belong to. For example, again in a four-class situation, one may know that case 19 does not belong to latent class 2 and that case 41 does not belong to latent classes 3 or 4. In Latent GOLD, there is an option called Known Class for indicating to which latent classes cases do not belong. Let
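The interpretation of this coding scheme can be checked numerically. The sketch below multiplies the coding matrix from the example by a vector of parameters and shows that the differences between the resulting adjacent category effects equal the parameters themselves. The parameter values are hypothetical.

```python
import numpy as np

# Coding matrix from the example: one row per category, one column per parameter.
coding = np.array([
    [0, 0, 0],  # category 1
    [1, 0, 0],  # category 2
    [1, 1, 0],  # category 3
    [1, 1, 1],  # category 4
], dtype=float)

beta = np.array([0.5, -0.2, 0.8])  # HYPOTHETICAL parameter values

effects = coding @ beta                  # effect of each category
adjacent_differences = np.diff(effects)  # category m+1 minus category m
print(effects, adjacent_differences)
```

Category 1 serves as the reference with effect 0, and each later category accumulates one more parameter, so each parameter is exactly the step between two neighboring categories.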
115. his value. To change this option, double-click the value to highlight it, then type in a new value. You may enter any non-negative integer.

Newton-Raphson

Maximum number of NR iterations. The default is 50. If the model does not converge after 50 iterations, this value should be increased. To change this option, double-click the value to highlight it, then type in a new value. You may enter any non-negative integer. A value of 0 is entered to direct Latent GOLD to use only EM, which may produce faster convergence in models with many

Start Values

To reduce the likelihood of obtaining a local solution, the following options can be used to increase the number of start sets, the number of iterations per set, or both.

Random Sets. The default is 10 for the number of random sets of starting values to be used to start the iterative estimation algorithm. Increasing the number of sets of random starting values for the model parameters reduces the likelihood of converging to a local (rather than global) solution. To change this option, double-click the value to highlight it, then type in a new value. You may enter any non-negative integer. Using either the value 0 or 1 results in the use of a single set of starting values.

Iterations. This option allows specification of the number of iterations to be performed per set of start values. Latent GOLD Choice first performs this number of iterations within each set and subsequently twice th
116. how well the observed y and z values predict the latent class or, in other words, how well the latent classes are separated. Classification is based on the latent classification or posterior class membership probabilities. For response pattern i, these are calculated as follows:

    $\hat{P}(x \mid z_i, y_i) = \dfrac{\hat{P}(x \mid z_i) \, \hat{P}(y_i \mid x, z_i)}{\hat{P}(y_i \mid z_i)}$

These quantities are used to compute the estimated proportion of classification errors (E), as well as three R²-type measures for nominal variables: the proportional reduction of classification errors $R^2_{x,errors}$, a measure based on entropy labelled $R^2_{x,entropy}$, and a measure based on qualitative variance labelled $R^2_{x,variance}$. The latter is similar to the Goodman and Kruskal tau-b association coefficient for nominal dependent variables (Magidson, 1981).

The proportion of classification errors is defined as

    $E = \dfrac{\sum_{i=1}^{I} w_i \left[ 1 - \max_x \hat{P}(x \mid z_i, y_i) \right]}{N}$

Each of the three R²-type measures is based on the same type of reduction-of-error structure, namely

    $R^2_x = \dfrac{Error(x) - Error(x \mid z, y)}{Error(x)}$    (15)

where Error(x) is the total error when predicting x without using information on z and y, and Error(x|z, y) is the prediction error if we use all observed information from the cases. Error(x|z, y) is defined as the weigh

[Footnote 14: New results by Andrews and Currim (2003) and Dias (2004) suggest that AIC3 is a better criterion than BIC and AIC in determining the number of latent classes in choice models.]
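The classification quantities defined above can be sketched as follows. The posterior probabilities follow from Bayes' rule and the proportion of classification errors from the formula in the text, with N taken as the sum of the case weights. For the illustrative R² call, the total error is assumed to be one minus the largest marginal class proportion, which is an assumption because the definition is cut off in this excerpt. All names and numbers are hypothetical.

```python
import numpy as np

def posterior_probs(prior, lik):
    """Posterior class membership probabilities via Bayes' rule.

    prior: shape (I, K), P(x | z_i)
    lik:   shape (I, K), P(y_i | x, z_i)
    """
    joint = np.asarray(prior) * np.asarray(lik)
    return joint / joint.sum(axis=1, keepdims=True)  # divide by P(y_i | z_i)

def classification_error(posterior, weights=None):
    """Weighted proportion of classification errors under modal assignment."""
    n_cases = posterior.shape[0]
    w = np.ones(n_cases) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * (1.0 - posterior.max(axis=1))) / w.sum())

def r_squared(total_error, model_error):
    """Proportional reduction of error: (Error(x) - Error(x|z,y)) / Error(x)."""
    return (total_error - model_error) / total_error

# Hypothetical 2-class example with two cases and equal class sizes.
prior = np.array([[0.5, 0.5], [0.5, 0.5]])
lik = np.array([[0.8, 0.2], [0.3, 0.9]])
post = posterior_probs(prior, lik)
err = classification_error(post)
r2_errors = r_squared(total_error=0.5, model_error=err)  # Error(x) = 1 - max P(x), ASSUMED
print(post, err, r2_errors)
```

An R² close to 1 would mean the modal class can be predicted almost without error from the observed responses, while a value near 0 means the posterior barely improves on guessing the largest class.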
117. iances of the random intercept terms and for the random slope of att 2 _ con y2 2 att 2 mier equal Oyeon Azm and Tyan Af respectively.

[Footnote 18: Random-effects models are also referred to as multilevel, hierarchical, mixed-effects, mixed, and random-coefficients models.]
[Footnote 19: Note that in Choice the intercept terms are referred to as constants.]

The same model, but now with correlated random effects, can be defined as follows:

    con att at con att att Nmj zi Fi bm 3 6 Zitmp T FAm F Att a Paz i ll A12 Paz 5 tdo

As can be seen, here F does not only affect the constants but also the effect of z. The variance-covariance matrix of the random effects, Σ, can be obtained by Σ = ΛΛ′, where Λ is a matrix collecting the λ parameters. More specifically, in our example,

    con nene eu AOS Tygon yeon AMT Ami ANA Tyeon wort Amt Not

Whereas the random-effects models presented thus far contained as many CFactors as random terms, this is not necessary in general. In fact, with three CFactors (the Latent GOLD Choice maximum), one can define models with any number of random effects. This is accomplished with the following factor-analytic specification:

    Nmjzi Fi bm g Zien Y Nang Fai y y ee Hae ae 24 d 1 p 1

where again Σ = ΛΛ′. This factor-analytic specification, in which each CFactor may be associated with multiple random effects, is equivalent to the generalized random coeffi
118. ical Guide

2.0 Introduction

Using Latent GOLD Choice is easy. Typically, using Latent GOLD Choice to estimate a new model involves the following steps:

Model Setup
Specifying Output Options
Model Estimation
Viewing Output

Step 1: Model Setup

A typical Latent GOLD Choice session begins by opening your data file(s) and selecting your model settings. Your data file(s) may be SPSS (.sav) files, ASCII text (.txt, .dat) files, a Sawtooth Software .cho file or, if you have the optional DBMS/Copy interface, SAS, Excel, and several other file formats. While you may prefer to use the single data file format, Latent GOLD Choice also allows a more flexible 3-file format. When using the 3-file format, you begin by opening the response file that contains the choices made in each of the choice sets.

Alternatively, you may begin by retrieving previously used setups. Opening a previously saved .lgf file will retrieve the setups for one or more models that have previously been saved. Examples of the use of File > Open to open a response file and to open an .lgf file are given in Tutorial 1.

Upon opening any file, your initial screen consists of two parts: the Outline Pane and the Contents Pane.

Figure 1: Outline Pane and Contents Pane
119. icant increase over the 3-class model, we can estimate a p-value for this improvement using the Conditional Bootstrap. By default, the Bootstrap procedure utilizes 500 samples and thus may take a few minutes to complete on your computer. Feel free to skip this step. If it is taking too long, you may cancel the estimation procedure by clicking the red Stop button to pause the model and then selecting Abandon.

> Select the model '4-class' in the Outline Pane
> Select Bootstrap -2LL Diff from the Model menu

Figure 29: Selecting Bootstrap -2LL Diff

A list of models appears that contains the eligible nested base models.

Figure 30: List of Eligible Models (1-class, 2-class, 3-class)

> Select '3-class' from this list
> Select OK

When the procedure has completed, 2 additional model names appear in the Outline Pane: '3-classBoot' and '4-classBoot'.

Screen capture of the 4-classBoot output (L² = 3896.1556): Bootstrap p-value = 0.1780, s.e. = 0.0171
120. ications without a missing value for predictors. Missing values on numeric attributes are not imputed with a mean but with a 0, which implies that a missing value in the alternatives file is in fact equivalent to using a 0.

Missing values on nominal attributes, predictors, and covariates are dealt with via the design matrix. In fact, the effect is equated to zero for the missing value category. Recall the effect and dummy coding schemes illustrated in subsection 2.8 for the case of a nominal attribute with 4 categories. Suppose there is also a missing category. In the case of effects coding, the design matrix that is set up for the 3 non-redundant terms is then

category 1     1    0    0
category 2     0    1    0
category 3     0    0    1
category 4    -1   -1   -1
missing        0    0    0

As can be seen, the entries corresponding to the missing category are all equal to 0, which amounts to setting its coefficient equal to zero. Since in effect coding the unweighted mean of the coefficients equals zero, equating the effect of the missing value category to zero implies that it is equated to the unweighted average of the effects of the other four categories. This imputation method for nominal variables is therefore similar to mean imputation with numeric variables.

In the case of dummy coding with the first category as the reference category, the design matrix that is set up for the 3 non-redundant terms is

category 1     0    0    0
category 2     1    0    0
category 3     0    1    0
category 4     0    0
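The coding schemes in snippet 120 can be reproduced with a few lines of code. The sketch below builds the effects-coded and dummy-coded design matrices for a 4-category nominal variable, appends an all-zero row for the missing category, and checks that under effects coding a zero effect equals the unweighted mean of the four category effects. The function names and the coefficient values are hypothetical, chosen only for illustration.

```python
import numpy as np

def effects_coding(n_categories):
    """Rows are categories, columns are the non-redundant terms; the last category is coded -1."""
    return np.vstack([np.eye(n_categories - 1), -np.ones(n_categories - 1)])

def dummy_coding_first_reference(n_categories):
    """Rows are categories, columns are the non-redundant terms; the first category is the reference."""
    return np.vstack([np.zeros(n_categories - 1), np.eye(n_categories - 1)])

def with_missing_row(design):
    """Append an all-zero row so that the missing category gets an effect of zero."""
    return np.vstack([design, np.zeros(design.shape[1])])

effects_design = with_missing_row(effects_coding(4))
dummy_design = with_missing_row(dummy_coding_first_reference(4))

# Hypothetical coefficients for the 3 non-redundant terms.
beta = np.array([0.7, -0.2, 0.1])
effects = effects_design @ beta          # effects for categories 1-4 and for 'missing'
category_mean = effects[:4].mean()       # unweighted mean over the four real categories
missing_effect = effects[4]
print(effects_design)
print(category_mean, missing_effect)
```

Under dummy coding the same zero row equates the missing category with the reference category instead, which is why the two schemes imply different imputations.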
121. ick and select Numeric
> Double click on FASHION to retrieve the score box
> Double click on the second category
> Replace '2' with '0' in the Replace box
> Click Replace
> Click OK

Figure 24: FASHION Score window

Repeat the process for QUALITY:

> Double click on QUALITY to retrieve the score box
> Double click on the second category
> Replace '2' with '0' in the Replace box
> Click Replace
> Click OK

Alternatively, dummy coding could be accomplished by maintaining the Nominal scale types for these 2 attributes and selecting 'Dummy Last' in the Coding Nominal section of the Output tab. Both approaches change to dummy coding; the only difference is in the display of the effect estimates in the Parameters Output. Only a single effect estimate is shown for each attribute using the former method, while the latter associates this effect estimate with the '1' category and displays a 0 effect for the '2' category.

Estimating Models

Now that we have specified the models, we are ready to estimate them.

> Click Estimate (located at the bottom right of the analysis dialog box)

The setup window now closes and estimation begins. When Latent GOLD Choice completes the estimation, the model L², which assesses how well the model fits the data, and a list of various
122. iduals
Iteration Detail

The Pause option is not available during estimation involving a bootstrap procedure. If you are estimating a range of models, the Pause option will pause the estimation process for the model currently being estimated and will cancel any further models. Depending upon how large the model being estimated is, it may take anywhere from one second to several minutes or longer to generate the preliminary output listings. The Pause option does not cause a preliminary version of the output data file to be created, even if one was requested using the ClassPred Tab. If such output has been requested, it will be produced only if the estimation is resumed and allowed to complete. See 'Resuming Estimation of a Paused Model' below.

Prior to resuming model estimation, you may modify the Output options that were requested in the Output Tab or change the Iteration or Convergence Limits in the Technical Tab. After a model is paused, the model name associated with that model appears in the Outline Pane with the characters 'Paused' appended to it. To change the Output options or the Iteration or Convergence limits, double click the model name to open the analysis dialog box and make the desired changes in the Output and/or Technical Tab. Note that the label Resume replaces Estimate on the Estimate button.

Continue Model Estimation

If Stop was selected in error, click Continue to continue estimating the model.

Resu
123. ification
> Click on Predicted Values

To append individual HB-like coefficients to the response file:

> Click on Individual Coefficients
> Replace the default output filename if desired

Figure 42: ClassPred Tab

Note that we also requested a CHAID input file to be created. We will illustrate this option later in Tutorial 1A.

> Click Estimate

To open the standard classification output:

> Click Standard Classification in the Outline Pane

Figure 43: Standard Classification Output

This output shows that case 1 is very likely (posterior probability = .9861) a member of segment 3. The covariate classification information is useful for classifying new cases, which did not participate in the conjoint experiment.

> Click Covariate Classification

This table shows that, based solely on this person's demographics, he would still be classified into segment 3, although with somewhat less certainty (probability = .8348). We will explore the relationship between the covariates and the classes more extensively in Tutorial 1A usi
124. Figure 3: Model Analysis Dialog Box

The Model Analysis Dialog Box contains several tabs that are visible on top. The tabs are:

Variables Tab
Attributes Tab
Advanced Tab
Model Tab
ClassPred Tab
Output Tab
Technical Tab

By default, the Model Analysis Dialog Box opens to the Variables Tab.

Variables Tab

File Format Radio Button (3-File vs. 1-File)

By default, a 3-file format is assumed. To change the setting, click the circle to the left of the appropriate file format. If the 1-file format is selected, the attributes are assumed to be included in the file: the Attributes Tab lists all variables in the file and the Alternatives button is deactivated. For choice and ranking models, the 1-file format contains 1 record per alternative, whereas the 3-file format contains 1 record per set. For rating models, there is one record per set in both file formats.

All variables that may be included in the analysis are listed in the Variables box. Variables may be designated as one of these types:

Dependent Variable
Case ID
Choice Set
Predictors
Covariates
Replication Scale
Replication Weight
Case Weight Variable

A dependent variable must be specified in order to begin an analysis. To select a variable, highlight the variable name, then click on the appropriate arrow key to move the variable into the corresponding box.

Dependent Variable: assign one
125. ilable and you wish to know which ones are most important
• when you do not wish to specify certain covariates as active because you do not wish them to affect the model parameters, but you still desire to assess their statistical significance with respect to the classes (or a specified subset of the classes)
• when you wish to develop a separate profile for each latent class (see Tutorial 1A)
• when you wish to explore differences between 2 or more selected latent classes using a tree modeling structure
• when the relationship between the covariates and classes is nonlinear or includes interaction effects, or
• when you wish to profile order-restricted latent classes

For an example of the use of CHAID with Latent GOLD Choice 4.0, see Latent GOLD Choice Tutorial 1A on our website.

This option is especially useful in the development of simulators, as simulators can be easily extended to predict shares not only for each latent class segment but also for CHAID segments defined using relevant exogenous variables.

For further information on the CHAID add-on option, see http://www.statisticalinnovations.com/products/chaid_v4.html

DBMS/Copy interface

Latent GOLD Choice 4.0 reads SPSS, .cho, and ASCII text files for data input. The DBMS/Copy interface allows Latent GOLD Choice 4.0 to directly open over 80 additional file formats, including Excel, SAS, and HTML files. The full list of file formats is available at http www statisticalin
126. iles in which one or more cases have multiple choice sets, assign one variable as a Case ID variable that uniquely identifies a case.

Choice Set: assign one variable to be used to identify each Choice Set. The choice set variable may be numeric or character. In the 3-file format, this variable is used to link the response file to the set file, whereas in the 1-file format it indicates which alternatives (records) belong to the same set.

Predictors: assign one or more variables to be used as predictors. Predictors may be either Numeric or Nominal. By default, variables containing numeric values are treated as Numeric (labeled 'Num-Fixed') and character variables are treated as Nominal. To change the scale type, right click on a predictor in the predictor box and select either Numeric or Nominal. Predictors are explanatory variables that are constant across alternatives but may vary across sets.

Covariates: assign one or more variables to be used as covariates. Covariates represent variables that are descriptive or predictive of the latent variable, not of the dependent variable. Use of these variables is desirable for describing differences between classes and for reducing classification error.

Covariate Types

Right click on any variable included in the covariate box and the following menu pops up, listing the different Covariate Types: Numeric, Nominal, Inactive, Group.

Figure 5: Covariate Types Menu

Numeric: set the covariate(s) to
127. ility density associated with case i given parameter values $\vartheta$, and $w_i$ is the Case Weight corresponding to case i. This case weight $w_i$ can be used to group identical response patterns or to specify (complex survey) sampling weights. In the former case, $w_i$ will serve as a frequency count, and in the latter case, Latent GOLD Choice will provide pseudo-ML estimates (Patterson, Dayton, and Graubard, 2002). The other type of weight, the Replication Weight $v_{it}$ that was introduced in the previous section, modifies the definition of the relevant probability density $P(y_i|z_i, \vartheta)$. The exact form of $P(y_i|z_i, \vartheta)$ is described in equation (6).

In order to prevent boundary solutions or, equivalently, to circumvent the problem of non-existence of ML estimates, we implemented some ideas from Bayesian statistics in Latent GOLD Choice. The boundary problem that may occur is that the (multinomial) probabilities of the model for the latent classes or the model for the choices, rankings, or ratings may converge to zero. This occurs if a β or γ parameter becomes very extreme (tends to go to (minus) infinity). The boundary problem is circumvented by using Dirichlet priors for the latent and the response probabilities (Clogg et al., 1991; Galindo-Garre, Vermunt, and Bergsma, 2004; Gelman et al., 1996; Schafer, 1997). These are so-called conjugate priors since they have the same form as the corresponding multinomial probability densities. The implication of using priors is that the
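To make the frequency-count use of case weights in snippet 127 concrete, the sketch below compares two ways of computing a log-likelihood: one record per case with unit weights, and one record per distinct response pattern with the pattern frequency as case weight. It assumes the usual weighted form, the sum over cases of the case weight times the log of the case's probability density, which is implied by the snippet but not shown in it. The pattern labels and probabilities are invented.

```python
import math
from collections import Counter

def weighted_log_likelihood(probabilities, weights):
    """Assumed form: sum_i w_i * log P(y_i | z_i, theta)."""
    return sum(w * math.log(p) for p, w in zip(probabilities, weights))

# Hypothetical model-implied probabilities of three response patterns.
pattern_prob = {"A": 0.5, "B": 0.3, "C": 0.2}

# Six cases, some with identical response patterns.
patterns = ["A", "B", "A", "A", "B", "C"]

# One record per case, every case weight equal to 1.
ungrouped = weighted_log_likelihood(
    [pattern_prob[p] for p in patterns], [1.0] * len(patterns))

# One record per distinct pattern, case weight equal to its frequency count.
counts = Counter(patterns)
grouped = weighted_log_likelihood(
    [pattern_prob[p] for p in counts], [counts[p] for p in counts])

print(ungrouped, grouped)
```

Both computations give the same value, which is why grouping identical response patterns changes the size of the data file but not the estimates.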
128. imation issues and application types. The last section discusses the output obtained with the Latent GOLD Choice Advanced options.

6 Continuous Factors

6.1 Model Components and Estimation Issues

Let $F_{di}$ denote the score of case i on continuous latent variable, factor, or random effect number d. The total number of CFactors is denoted by D (thus, $1 \le d \le D$) and the full vector of CFactor scores by $F_i$. The maximum number of CFactors that can be included in a Latent GOLD Choice model is three, thus $0 \le D \le 3$.

Recall that without CFactors the most general Latent GOLD Choice structure for $P(y_i|z_i)$ equals

$$P(y_i|z_i) = \sum_{x=1}^{K} P(x|z_i)\, P(y_i|x, z_i),$$

where

$$P(y_i|x, z_i) = \prod_{t=1}^{T_i} P(y_{it}|x, z^{att}_{it}, z^{pre}_{it}).$$

If we include CFactors in a model, the assumed structure for $P(y_i|z_i)$ becomes

$$P(y_i|z_i) = \sum_{x=1}^{K} \int_{F_i} f(F_i)\, P(x|z_i)\, P(y_i|x, z_i, F_i)\, dF_i, \qquad (22)$$

where

$$P(y_i|x, z_i, F_i) = \prod_{t=1}^{T_i} P(y_{it}|x, z^{att}_{it}, z^{pre}_{it}, F_i).$$

The $F_{di}$ are assumed to be standard normally distributed and mutually independent. In other words, $f(F_i) = N(0, I)$, where I is the identity matrix. As will be shown below, this specification is much less restrictive than one may initially think.

It is also possible to define models (standard random effects conditional logit models) containing CFactors but no latent classes x. That simplifies the structure for $P(y_i|z_i)$ to

$$P(y_i|z_i) = \int_{F_i} f(F_i)\, P(y_i|z_i, F_i)\, dF_i,$$

with

$$P(y_i|z_i, F_i) = \prod_{t=1}^{T_i} P(y_{it}|z^{att}_{it}, z^{pre}_{it}, F_i).$$

Equation 22 shows that the $F_{di}$ may appear in the
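The integral over a standard normal CFactor in snippet 128 can be approximated numerically. The sketch below does this for the simplest case, one CFactor and no latent classes, using Gauss-Hermite quadrature with 10 nodes. The quadrature rule, the function names, and all parameter values are assumptions chosen for illustration; the snippet does not say how the program evaluates the integral.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def choice_probs(utilities):
    """Conditional logit probabilities for one choice set."""
    expu = np.exp(utilities - np.max(utilities))
    return expu / expu.sum()

def marginal_probability(choices, constants, loadings, n_nodes=10):
    """Approximate P(y_i) = integral of f(F) * prod_t P(y_it | F) dF with F ~ N(0, 1).

    choices   : index of the chosen alternative in each replication
    constants : alternative-specific constants
    loadings  : hypothetical CFactor effects on the constants
    """
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)   # rescale so the weights sum to 1
    total = 0.0
    for node, weight in zip(nodes, weights):
        probs = choice_probs(constants + loadings * node)
        total += weight * np.prod(probs[choices])
    return float(total)

constants = np.array([0.5, 0.0, -0.5])
choices = [0, 0, 1]                      # three replications by the same case

# Without a random effect the integral collapses to the ordinary product.
fixed = choice_probs(constants)
no_random_effect = marginal_probability(choices, constants, np.zeros(3))

# With a random effect on the first constant the replications become dependent.
with_random_effect = marginal_probability(choices, constants, np.array([1.0, 0.0, 0.0]))
print(no_random_effect, with_random_effect)
```

With more than one CFactor the same idea applies on a grid of nodes, so the number of evaluation points grows as the number of nodes raised to the power D.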
129. independent observational units, which is exactly what we want. Third, the term involved is only defined if each stratum contains at least two PSUs. Latent GOLD Choice solves this problem by skipping strata for which C = 1 and by giving a warning that this happens. A common solution to this problem is to merge strata.

The design effect corresponding to a single parameter equals the ratio of its design-corrected variance and its variance assuming simple random sampling. A multivariate generalization is obtained as follows (Skinner, Holt, and Smith, 1989):

$$deff = \frac{tr\left[\hat\Sigma_{standard}(\hat\vartheta)^{-1}\, \hat\Sigma_{survey}(\hat\vartheta)\right]}{npar} = \frac{tr\left[(-H)(-H)^{-1} B (-H)^{-1}\right]}{npar} = \frac{tr\left[B(-H)^{-1}\right]}{npar},$$

where tr is the trace operator. The generalized design effect is thus the average of the diagonal elements of $B(-H)^{-1}$. Note that this number equals the average of the eigenvalues of this matrix.

8.2 A Two-step Method

Latent GOLD Choice also implements an alternative two-step method for dealing with sampling weights in LC analysis, which was described in Vermunt (2002b) and Vermunt and Magidson (2001). The procedure involves performing an unweighted analysis followed by a weighted analysis in which the parameters in the model part for the response variables are fixed to their unweighted ML (PM) estimates. More specifically, in step two, the class sizes and the covariate effects are adjusted for the sampling weights. The adjusted log-likelihood function that is maximized equals

log Ladjust
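The generalized design effect in snippet 129 is simple to compute once the two variance-covariance matrices are available. The sketch below takes the trace of the inverse of the standard matrix times the survey-corrected matrix, divides by the number of parameters, and confirms that this equals the mean of the eigenvalues. The matrices are invented and the function name is hypothetical.

```python
import numpy as np

def generalized_design_effect(cov_standard, cov_survey):
    """tr(inv(cov_standard) @ cov_survey) / npar."""
    cov_standard = np.asarray(cov_standard, dtype=float)
    cov_survey = np.asarray(cov_survey, dtype=float)
    ratio = np.linalg.solve(cov_standard, cov_survey)   # inv(cov_standard) @ cov_survey
    return float(np.trace(ratio) / ratio.shape[0])

# Hypothetical 2-parameter example: the design doubles the variance of the
# first parameter and leaves the second one unchanged.
cov_standard = np.array([[0.04, 0.00],
                         [0.00, 0.01]])
cov_survey = np.array([[0.08, 0.00],
                       [0.00, 0.01]])
deff = generalized_design_effect(cov_standard, cov_survey)

# The same number is the average eigenvalue of the ratio matrix.
eigen_mean = float(np.linalg.eigvals(np.linalg.solve(cov_standard, cov_survey)).real.mean())

# A uniform inflation of a non-diagonal matrix gives that inflation factor back.
cov_full = np.array([[0.04, 0.01],
                     [0.01, 0.09]])
deff_uniform = generalized_design_effect(cov_full, 1.5 * cov_full)
print(deff, eigen_mean, deff_uniform)
```

For a single parameter the function reduces to the ratio of the two variances, matching the univariate definition given in the snippet.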
130. inear predictors, the estimates for error variances and covariances σ, as well as the corresponding estimated asymptotic standard errors, se(β), se(γ), and se(σ). These standard errors are the square roots of the diagonal elements of the estimated variance-covariance matrix $\hat\Sigma(\hat\vartheta)$. As described earlier, one of three methods can be used to obtain $\hat\Sigma(\hat\vartheta)$, yielding either standard, outer-product based, or robust standard errors and Wald statistics.

The significance of sets of parameters can be tested by means of the reported Wald statistic labeled Wald. We also report a Wald statistic labeled Wald(=), which tests whether regression coefficients are equal between Classes (Class Independent). The general formula for a Wald statistic ($W^2$) is

$$W^2 = (C'\hat\vartheta)' \left(C'\, \hat\Sigma(\hat\vartheta)\, C\right)^{-1} (C'\hat\vartheta),$$

where the tested set of linear constraints is $C'\vartheta = 0$. The Wald test is a chi-squared test. Its number of degrees of freedom equals the number of constraints. Computation of standard errors and Wald statistics can be suppressed, which may be useful in models with many parameters.

The Parameters output also contains the means and standard deviations of the conditional logit coefficients (last two columns in the model for choices/rankings/ratings). These are the typical fixed and random effects in multilevel, mixed, or random-coefficient logit models. Let $\hat\beta_{xp}$ denote the estimated value of one of the conditional logit parameters, which can be a constant, an attribute effect, or a p
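A small numerical example may help with the Wald statistic in snippet 130. The sketch below evaluates the quadratic form for a set of linear constraints written as the columns of a matrix C, so that the tested hypothesis is that C transposed times the parameter vector equals zero. The parameter estimates and covariance matrix are invented, and the p-value lookup is left out to keep the example dependency-free.

```python
import numpy as np

def wald_statistic(theta, cov, constraints):
    """Return (W2, df) for the constraints C' theta = 0.

    theta       : estimated parameters (length npar)
    cov         : estimated variance-covariance matrix (npar x npar)
    constraints : matrix C with one column per constraint (npar x ncon)
    """
    theta = np.asarray(theta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    c = np.asarray(constraints, dtype=float).reshape(len(theta), -1)
    diff = c.T @ theta                       # C' theta
    middle = c.T @ cov @ c                   # C' Sigma C
    w2 = float(diff @ np.linalg.solve(middle, diff))
    return w2, c.shape[1]                    # df = number of constraints

# Hypothetical estimates: the same attribute effect in Class 1 and Class 2,
# plus one further parameter that is not involved in the test.
theta = [1.0, 0.4, -0.2]
cov = np.diag([0.04, 0.01, 0.01])

# Wald(=) style test: are the first two coefficients equal?
equality = [[1.0], [-1.0], [0.0]]
w2_equal, df_equal = wald_statistic(theta, cov, equality)

# Ordinary Wald test: are the first two coefficients jointly zero?
joint_zero = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
w2_joint, df_joint = wald_statistic(theta, cov, joint_zero)
print(w2_equal, df_equal, w2_joint, df_joint)
```

Each statistic is then compared with a chi-squared distribution whose degrees of freedom equal the number of columns of C.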
131. ion parameter estimates will be unbiased. In the NMAR situation, however, unbiased estimation requires that separate class sizes are estimated for training and non-training cases (McLachlan and Peel, 2000). This can easily be accomplished by expanding the model of interest with a dichotomous covariate that takes on the value 0 for training cases and 1 for non-training cases.

Another application is specifying models with a partially missing discrete variable that affects one or more response variables. An important example is the complier average causal effect (CACE) model proposed by Imbens and Rubin (1997), which can be used to determine the effect of a treatment conditional on compliance with the treatment. Compliance is, however, only observed in the treatment group and is missing in the control group. This CACE model can be specified as a model in which class membership (compliance) is known for the treatment group, and in which a treatment effect is specified only for the compliance class.

The known-class indicator can also be used to specify multiple-group LC models. Suppose we have a three-class model and two groups, say males and females. A multiple-group LC model is obtained by indicating that there are six latent classes, where males may belong to classes 1-3 and females to classes 4-6. To get the correct output, the grouping variable should not only be used as the known-class indicator but also as a nominal covariate.

2.10 Zero
132. ion 1 presents a general introduction to the program. Section 2 contains several subsections which describe the various components of the model in formal mathematical terms and provide examples of the various codings for the attributes. The last subsection (2.10) shows how all these components fit together in terms of the general latent class choice model. Section 3 describes the estimation, handling of missing data, and other technical features. Section 4 provides the technical details for all of the output produced by the program.

Part 3 of the manual is entitled "Using Latent GOLD Choice". It lists all menus and contains a detailed tutorial which takes you through the use of the program with actual applications. The tutorial also illustrates the use of the different data formats.

In addition to this manual, users may wish to refer to the Latent GOLD 4.0 User's Guide, since many of the details about basic operation also apply to this program. That is, in addition to applying to the Cluster, DFactor, and Regression modules of the Latent GOLD 4.0 program, the operation details given there also apply to Latent GOLD Choice 4.0.

Part 1: Overview

Latent GOLD Choice is available as a stand-alone program or as an optional add-on module for Latent GOLD 4.0. An optional Advanced Module is also available, as well as an optional link to the SI-CHAID profiling package. Latent GOLD Choice supports the following kinds of latent class choice mod
133. is allowed to differ between groups by using a random effects approach rather than by estimating a separate set of class sizes for each group as is done in a traditional multiple group analysis When adopting a nonparametric random effects approach using GClasses one obtains the following multilevel LC Choice model LK T P y5 2 2 Plz i Ss Pale JI Pyle e x9 1 i 1x 1 t 1 in which the linear predictor in the logistic model for P x x equals Najas Yx9 00 Here we are in fact assuming that the intercept of the model for the latent classes differs across GClasses When adopting a parametric random effects approach GCFactors one obtains T I K a red II PIP TI Pyle 250 ae i la 1 t 1 Piyaz f FER dF Ly where the linear term in the model for P x F equals Ne Fe Yoo t Age FY Note that this specification is the same as in a random intercept model for a nominal dependent variable 21Numeric second order derivatives are computed using the analytical first order deriva tives 69 Vermunt 2005 expanded the above parametric approach with covariates and random slopes yielding a standard random effects multinomial logistic regression model but now for a latent categorical outcome variable With covariates and multiple random effects we obtain I K qe cou red Plyslas f 09 IDO POER FA TI Pju zgi 23500 a j i 1 e 1 t 1 where the linear predictor for x equals a 0 0
134. is number within the best 10% of the start sets. For some models, many more than 50 iterations per set may need to be performed to avoid local solutions.

Seed: The default value of 0 means that the Seed is obtained during estimation using a pseudo-random number generator based on clock time. Specifying a non-negative integer different from 0 yields the same result each time.

If the current model setup was obtained by opening an .lgf file associated with a previously estimated model, (1) the Seed will not be 0 but will be the Best Start Seed for that model as specified in the .lgf file, and (2) the Random Sets parameter will be set equal to 0. This procedure assures that the model estimated is exactly the same model obtained when originally estimated, as long as the .lgf file was created using Latent GOLD 4.0 (see Warning below).

To specify a particular numeric seed, such as the Best Start Seed reported in the Model Summary Output for a previously estimated model, double click the value to highlight it, then type in (or copy and paste) a non-negative integer. When using the Best Start Seed, be sure to deactivate the Random Sets option (using Random Sets = 0). For further details, see section 3.6 of the Technical Guide.

Warning: Due to improvements in this option in Latent GOLD Choice 4.0, the random seed obtained from earlier versions of Latent GOLD Choice will not necessarily reproduce the original model and has an increased chance of resulting in a lo
135. l down to the Prediction Statistics section.

Figure 33: Prediction Statistics for 1-class Model

Prediction Statistics
Error Type             Baseline(0)  Baseline  Model    R²(0)    R²
Squared Error          0.7500       0.7432    0.7095   0.0540   0.0454
Minus Log-likelihood   1.3863       1.3730    1.2954   0.0656   0.0566
Absolute Error         1.5000       1.4881    1.4155   0.0564   0.0488
Prediction Error       0.7500       0.6974    0.6234   0.1687   0.1061

Prediction Table (observed row totals: 646, 712, 999, 843; estimated column totals: 400, 800, 1600, 400; overall total: 3200)

The prediction table shows that this 1-class (aggregate) model correctly predicts only 161 of the 646 alt1 responses, only 292 of the 712 alt2 responses, 603 of the 999 alt3 responses, and only 149 of the 843 alt4 responses. Overall, only 1,205 (161 + 292 + 603 + 149) of the total 3,200 observed choices are predicted correctly, a 'hit rate' of only 37.66%. This represents a Prediction Error of 1 - .3766 = .6234, as reported in the row of the output labeled 'Prediction Error'.

For comparison, we will look at the corresponding statistics under the 3-class model.

> Click on '3-class' and sc
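The hit-rate arithmetic in snippet 135 can be retraced directly from the counts quoted in the text. The sketch below uses only those counts; the variable names are chosen for illustration.

```python
# Correctly predicted and observed choice counts per alternative, as quoted in the text.
correct = {"alt1": 161, "alt2": 292, "alt3": 603, "alt4": 149}
observed = {"alt1": 646, "alt2": 712, "alt3": 999, "alt4": 843}

total_correct = sum(correct.values())
total_observed = sum(observed.values())

hit_rate = total_correct / total_observed
prediction_error = 1.0 - hit_rate

# Share of correctly predicted responses within each alternative.
per_alternative = {alt: correct[alt] / observed[alt] for alt in correct}

print(total_correct, total_observed)
print(round(hit_rate, 4), round(prediction_error, 4))
print({alt: round(share, 3) for alt, share in per_alternative.items()})
```

The per-alternative shares make visible what the overall hit rate hides: the aggregate model does best on alt3 and worst on alt4.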
136. lar term (the intercept or a covariate effect), one obtains a separate set of coefficients for each GClass. GCFactors enter as random effects in the regression model for the discrete latent variable(s).

In models with GClasses, the parameters output contains the coefficients of the multinomial logistic regression Model for GClasses.

The reported Class-specific $R^2_{y|x}$ measures are obtained by averaging the predicted values over the other latent variables included in the model. This is the reason that in a one-Class model the Class-specific $R^2_{y|x}$ may be lower than the overall $R^2_y$.

When the Survey option is used, one obtains design-corrected standard errors and Wald statistics.

In models with CFactors, one obtains an output subsection called Random Effects. This subsection provides the CFactor effects Λ and the variance-covariance matrix of the random effects, $\Sigma_\Psi = \Lambda\Lambda'$.

9.3 GProfile

The first part of this output section reports the sizes of the GClasses [$P(x^g)$] and the probability of being in a certain latent class for each GClass [$P(x|x^g)$]. The second part of the GProfile section reports the GClass-specific probabilities for the choice variable. The computation of this part of the GProfile output is similar to the computation of the same kinds of numbers in the Profile output.

9.4 ProbMeans

In models with CFactors, the ProbMeans output reports the average CFactor posterior mean for each covariate category.

9.5 Frequencies

Fr
137. le, Set Probmeans, Iteration Detail, Frequencies, Standard Classification, and Covariate Classification), as well as on the output that can be written to files (Standard Classification, Covariate Classification, Predicted values, Individual Coefficients, Cook's D, and Variance-Covariance Matrix).

4.1 Model Summary

This first part of the output section reports the number of cases ($N = \sum_{i=1}^{I} w_i$), the total number of replications ($N_{rep} = \sum_{i=1}^{I} w_i \sum_{t=1}^{T_i} v_{it}$), the number of estimated parameters (npar), the number of activated constraints (in models with order restrictions), the seed used by the pseudo-random number generator, the seed of the best start set, and the seed used by the bootstrap procedure.

The last part, Variable Detail, contains information on the variables that are used in the analysis. The other five parts (Chi-squared Statistics, Log-likelihood Statistics, Classification Statistics, Covariate Classification Statistics, and Prediction Statistics) are described in more detail below.

4.1.1 Chi-squared statistics

The program reports chi-squared and related statistics, except when the data file contains replication weights other than 0 or 1. The three reported chi-squared measures are the likelihood-ratio chi-squared statistic L², the Pearson chi-squared statistic X², and the Cressie-Read chi-squared statistic CR². Before giving the definitions of the chi-squared statistics, we need to explain two types of groupings that ha
138. Figure 10: Technical Tab

Convergence Limits

EM Tolerance: EM Tolerance is the sum of absolute relative changes of parameter values in a single iteration. It determines when the program switches from EM to Newton-Raphson (if the NR iteration limit has been set to > 0). Increasing the EM Tolerance will switch faster from EM to NR. To change this option, double click the value to highlight it, then type in a new value. You may enter any non-negative real number. The default is 0.01. Values between 0.01 and 0.1 (1% and 10%) are reasonable.

Tolerance: Tolerance is the sum of absolute relative changes of parameter values in a single iteration. It determines when the program stops its iteration. The default is 1.0x10^-8, which specifies a tight convergence criterion. To change this option, double click the value to highlight it, then type in a new value. You may enter any non-negative real number.

Iteration Limits

EM Iterations: Maximum number of EM iterations before switching to Newton-Raphson (if NR iteration is not equal to 0). The default is 250. If the model is estimated using EM only (that is, if you set NR iterations = 0) and it does not converge after 250 iterations, this value should be increased. You also may want to increase t
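The two convergence limits in snippet 138 use the same quantity, the sum of absolute relative parameter changes in one iteration, with different thresholds. The sketch below computes that quantity and shows how the two defaults would partition an estimation run. The strict "less than" comparisons, the handling of the iteration limits, and the assumption that all parameters are nonzero are simplifications made here, not documented program behaviour.

```python
EM_TOLERANCE = 0.01      # default EM Tolerance quoted in the text
TOLERANCE = 1.0e-8       # default Tolerance quoted in the text

def sum_abs_relative_change(old, new):
    """Sum over parameters of |(new - old) / old|; assumes no parameter is exactly zero."""
    return sum(abs((n - o) / o) for o, n in zip(old, new))

def next_step(old, new, nr_iterations_allowed=True):
    """Decide what to do after an iteration (illustrative logic only)."""
    change = sum_abs_relative_change(old, new)
    if change < TOLERANCE:
        return "stop"
    if nr_iterations_allowed and change < EM_TOLERANCE:
        return "switch to Newton-Raphson"
    return "continue"

old = [1.0, 2.0]
early = next_step(old, [1.1, 2.0])        # change = 0.1: still far from converged
middle = next_step(old, [1.001, 2.002])   # change = 0.002: below the EM Tolerance
final = next_step(old, [1.0, 2.0])        # change = 0: below the Tolerance
em_only = next_step(old, [1.001, 2.002], nr_iterations_allowed=False)
print(early, middle, final, em_only)
```

Raising the EM tolerance moves the switch point earlier, which matches the remark that a larger value switches faster from EM to Newton-Raphson.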
139. log posterior is negligible i e smaller than 107 The program reports the iteration process in Iteration Detail Thus it can easily be checked whether the maximum number of iterations is reached without convergence In addition a warning is given if one of the elements of the gradient is larger than 107 It should be noted that sometimes it is more efficient to use only the EM algorithm which is accomplished by setting Iteration Limits Newton Raphson 0 in the Technical Tab This is for instance the case in models with many parameters With very large models one may also consider sup pressing the computation of standard errors and Wald statistics or to Pause the model estimation to examine preliminary output 38 3 6 Start Values Latent GOLD Choice generates random start values So long as the technical option Seed equals 0 the default option these differ every time that a model is estimated because the seed of the random number generator is then obtained from the system time The seed used by the program is reported in the output A run can be replicated by specifying the reported best start seed as Seed in the Technical Tab and setting the number of Random Sets to zero Since the EM algorithm is extremely stable the use of random starting values is generally good enough to obtain a converged solution However there is no guarantee that such a solution is also the global PM or ML solution A well known problem in LC analy
140. mall number

In some situations, one may wish to remove certain cases from the analysis but nevertheless obtain classification and prediction output for all cases. This can be accomplished by using case weights equal to a very small number (for example, 1.0e-100) for the cases that should not be used for parameter estimation. The program treats such a weight as if it were a zero, which means that results are not influenced by the presence of these cases and that computation time is comparable to the analysis of a data set without these cases. An important difference with the zero case weight option is that this very small case weight option yields classification and prediction information for the cases concerned.

One possible application is the analysis of very large data sets. With this option, one can use a subset of cases for parameter estimation but obtain class membership information for all cases. Another application is predicting class membership for new cases based on parameter values obtained with another sample. By appending the new cases to the original data file and giving them a weight equal to 1.0e-100, one obtains the relevant output for these cases after restoring and re-estimating the original model.

4 The Latent GOLD Choice Output

Below we provide technical details on the quantities presented in the various Latent GOLD Choice output sections (Model Summary, Parameters, Importance, Profile, ProbMeans, Set Profi
141. milar task is the distribution of, say, 100 chips or coins among the alternatives, or a task with the instruction to indicate how many out of 10 visits to a store one would purchase each of several products presented.

Other applications of the replication weights include grouping and differential weighting of choices. Grouping may be relevant if the same choice sets are offered several times to each observational unit. Differential weighting may be desirable when analyzing ranking data. In this case, the first choice may be given a larger weight in the estimation of the utilities than subsequent choices. It is even possible to ask respondents to provide weights (say, between 0 and 1) to indicate how certain they are about their choices. In the simultaneous analysis of stated and revealed preference data, it is quite common that several stated preferences are combined with a single revealed preference. In such a case, one may consider assigning a higher weight to the single revealed preference replication to make sure that both preference types have similar effects on the parameter estimates.

2.6 Other Choice/Preference Formats

In the previous sections, we showed how to deal with most of the response formats mentioned in the introduction. To summarize, first choice is the basic format, rankings are dealt with as sequences of first choices with impossible alternatives, ratings are modelled by an ordinal logit model, best-worst choices can be
142. ming Estimation of a Paused Model (Model Menu)

After viewing the preliminary output and making changes to the options as described above, to Resume the estimation of a paused model:
> Select Resume from the Model Menu, or
> Click the Resume button at the bottom of the Analysis Dialog Box associated with the Paused model. To open the analysis dialog box for a Paused model, double-click the name of the paused model. In the Analysis Dialog Box for a Paused model, the word Resume replaces the word Estimate on the Estimate button. Or you may
> Click on the name of the Paused Model and select Resume from the Model Menu.

After constructing the various tables needed to initialize the estimation algorithm, the red Cancel light is illuminated and the estimation algorithm begins. To Cancel the estimation prior to completion, click the red Cancel button.

To estimate a new model, simply double-click on the last model name and click the Estimate button.

STEP 4: Viewing Output

Output Options
Once the model has been estimated, the left-hand column (Outline Pane) will display a number of output options. They are: Parameters, Importance, Profile, ProbMeans, Set Profile, Set ProbMeans.

Figu
143. model for the choices, but not in the model for the latent classes. Compared to models without CFactors, the linear predictor in the model for the choices is then expanded with the following additional term:

$$\sum_{d=1}^{D} \lambda^{con}_{xmd} F_{di} + \sum_{d=1}^{D}\sum_{p=1}^{P} \lambda^{att}_{xpd} F_{di}\, z^{att}_{itmp} + \sum_{d=1}^{D}\sum_{q=1}^{Q} \lambda^{pred}_{xmqd} F_{di}\, z^{pred}_{itq}$$

(Footnote 17: There is a trick for including CFactor effects in the model for the latent classes using the multilevel option.)

In the first term, the $F_{di}$ define random effects for the alternative-specific constants, and the $F_{di} \cdot z^{att}_{itmp}$ and $F_{di} \cdot z^{pred}_{itq}$ product terms define random coefficients for the attributes and predictors. An important difference from the more standard specification of random-effects models is that here each $F_{di}$ can serve as a random effect for each of the model effects, which, as will be shown below, yields parsimonious random-effects covariance structures. Another important difference is that the size of the parameters associated with the random effects may differ across latent classes.

Model restrictions. One can use the parameter constraints Class Independent, No Effect, and Merge Effects, which imply equal λ's among all Classes, zero λ's in selected Classes, and equal λ's in selected Classes, respectively.

ML (PM) estimation and technical settings. The main complication in the ML (PM) estimation of models with CFactors is that we have to deal with the multidimensional integral appearing in the definition of the marginal density P y z
144. mple to obtain standard errors of probabilities and redundant parameters.

Inequality restrictions (needed for ordered clusters, order-restricted predictor effects, and positive variances) are dealt with using an active-set variant of the Newton-Raphson method described above (Galindo-Garre, Vermunt, and Croon, 2001; Gill, Murray, and Wright, 1981). For that purpose, the effects involved in the order constraints are reparameterized so that they can be imposed using simple nonnegativity constraints of the form $\vartheta \geq 0$. In an active-set method, the equality constraint associated with an inequality constraint becomes active if it is violated (here, a parameter is equated to 0 if it would otherwise become negative), but remains inactive if its update yields an admissible value (here, a positive update).

3.5 Convergence

The exact algorithm implemented in Latent GOLD Choice works as follows. The program starts with EM until either the maximum number of EM iterations (Iteration Limits EM) or the EM convergence criterion (EM Tolerance) is reached. Then the program switches to NR iterations, which stop when the maximum number of NR iterations (Iteration Limits Newton-Raphson) or the overall convergence criterion (Tolerance) is reached. The convergence criterion that is used is

$$\sum_{u=1}^{npar} \left| \frac{\hat{\vartheta}_u^{(\nu)} - \hat{\vartheta}_u^{(\nu-1)}}{\hat{\vartheta}_u^{(\nu-1)}} \right|,$$

which is the sum of the absolute relative changes in the parameters. The program also stops its iterations when the change in the
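As an illustration of the convergence criterion described above (the sum of the absolute relative changes in the parameters), here is a minimal Python sketch. The function names and the tolerance value used in the example are hypothetical; the sketch also assumes no previous parameter value is exactly zero, since the criterion divides by it.

```python
def sum_abs_relative_change(new_params, old_params):
    """Sum over parameters of |(new - old) / old|.

    Assumes every entry of old_params is non-zero.
    """
    return sum(abs((n - o) / o) for n, o in zip(new_params, old_params))


def has_converged(new_params, old_params, tolerance):
    """True when the criterion has dropped below the chosen tolerance."""
    return sum_abs_relative_change(new_params, old_params) < tolerance


old = [1.0, 2.0]
new = [1.1, 1.9]
criterion = sum_abs_relative_change(new, old)   # about 0.10 + 0.05
done = has_converged(new, old, tolerance=1e-8)  # tolerance is illustrative
```

In a two-stage scheme like the one in the text, the same check would be applied first against the EM tolerance and then against the overall tolerance, each combined with its own iteration limit.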
145. mples. Sociological Methods and Research, 33, 88-117.
Gill, P.E., Murray, W., and Wright, M.H. (1981). Practical optimization. London: Academic Press.
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (1995). Bayesian data analysis. London: Chapman & Hall.
Goodman, L.A. (1974a). The analysis of systems of qualitative variables when some of the variables are unobservable: Part I. A modified latent structure approach. American Journal of Sociology, 79, 1179-1259.
Goodman, L.A. (1974b). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215-231.
Goodman, L.A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. Journal of the American Statistical Association, 74, 537-552.
Haberman, S.J. (1988). A stabilized Newton-Raphson algorithm for log-linear models for frequency tables derived by indirect observations. C. Clogg (ed.), Sociological Methodology 1988, 193-211. San Francisco: Jossey-Bass.
Hedeker, D. (2003). A mixed-effects multinomial logistic regression model. Statistics in Medicine, 22, 1433-1446.
Im, S., and Gionala, D. (1988). Mixed models for binomial data with an application to lamb mortality. Applied Statistics, 37, 196-204.
Imbens, G.W., and Rubin, D.B. (1997). Estimating outcome distributions for compliers in instrumental variable models. Review of Economic Studies, 64, 555-574.
Kamakura W A
146. n This option requires the SI-CHAID 4.0 program.

The CHAID (CHi-squared Automatic Interaction Detector) analysis option can be used to assess the statistical significance of each Covariate in its relationship to the latent classes, as well as to develop detailed profiles of these classes based on the relationships in 3- and higher-way tables. For example, in tutorial 6A a CHAID analysis is used to explore the relationship between an individual's banking usage during some period (number of checks written, ATM usage, average balance) and the latent classes obtained in tutorial 6. If this option is selected, a CHAID (.chd) file is created at the conclusion of the Latent GOLD Choice run, which can be used as input to the SI-CHAID 4.0 program.

Part II: Advanced Model Options, Technical Settings, and Output Sections

5 Introduction to Part II (Advanced Models)

This part of the manual describes the three Advanced options of Latent GOLD Choice 4.0. These are:
1. An option for specifying models with continuous latent variables, which are referred to as continuous factors (CFactors).
2. A multilevel extension of the LC Choice model, which is a model containing group-level continuous latent variables (GCFactors) and/or a group-level nominal latent variable (GClasses).
3. An option to deal with the sampling design, which yields correct statistical tests for complex survey sampling designs that deviate from simple random sampling
147. n the latent classes as A2 0 lt 1 lt R, where the superscript 0 refers to the model for the latent classes.

GClasses enter in the conditional logit model for the choices as Bis Dado et a Beas gt zh

Inclusion of GClasses in the model for the Classes implies that the γ parameters become GClass dependent, that is, Mola 09 Yas x0 y Yes r Zj

Note that this is similar to a LC Regression analysis, where $x^g$ now plays the role of x and x the role of a nominal y variable.

The remaining linear predictor is the one appearing in the multinomial regression model for the GClasses. It has the form 0129 Vaso D Wor 25

This linear predictor is similar to the one for the Classes in a standard LC model, showing that GCovariates may be allowed to affect GClasses in the same way that covariates may affect Classes.

Below we will describe the most relevant special cases of this very general latent variable model, most of which were described in Vermunt (2002b, 2003, 2004, and 2005) and Vermunt and Magidson (2005). We then provide some expressions for the exact forms of the various linear predictors in models with GClasses, GCFactors, and GCovariates.

Model restrictions. One can use the parameter constraints Class Independent, No Effect, and Merge Effects, implying equal λ's (β's) among all Classes, zero λ's (β's) in selected Classes, and equal λ's (β's) in selected Classes.

ML (PM) estimation and te
148. nding model estimated using the Latent GOLD Regression module where the dependent variable scale type is set to Nominal. For a rating, this yields a model equivalent to an ordinal regression model.

Alternative-specific constants are entered into the model by double-clicking on the special variable named _Constants_ that shows up automatically in the Attributes tab.

When the 3-file format is used, a response file is always required. The Alternatives and Choice Set files may be omitted if only the alternative-specific constants (_Constants_) are included in the model. Otherwise, effects for attributes defined in the Alternatives file may be included in the model in addition to, or in place of, the alternative-specific constants. After opening the Response File, the remaining files are opened in the Attributes Tab, where the attribute effects to be included in the model are defined. Several examples of Alternatives and Sets files are given in Tutorials 1 and 2. More specific information is given below.

Alternatives file
Each record (row) in this file defines a distinct choice alternative in terms of one or more attributes. A unique label identifies each of these alternatives. In figure 1 below, the unique identification label is named alt_id. The other 3 variables shown correspond to attributes describing the various alternatives: BRAND, PRICE, and NOBUY. Note that the value of the attributes BRAND and 66 99 PRICE ar
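To make the relationship between the three files concrete, here is a small Python sketch of how a response row could be resolved to the attributes of the chosen alternative. Only the variable names alt_id, BRAND, PRICE, and NOBUY come from the text; all values, the layout of the Sets and Response files, and the coding of the choice as a 1-based position within the set are assumptions made for illustration.

```python
# Alternatives file: one record per distinct alternative, keyed by alt_id.
# Attribute values below are made up for this sketch.
alternatives = {
    1: {"BRAND": "A", "PRICE": 2.0, "NOBUY": 0},
    2: {"BRAND": "A", "PRICE": 3.0, "NOBUY": 0},
    3: {"BRAND": "B", "PRICE": 2.5, "NOBUY": 0},
    4: {"BRAND": "B", "PRICE": 3.5, "NOBUY": 0},
    5: {"BRAND": "None", "PRICE": 0.0, "NOBUY": 1},
}

# Sets file (hypothetical layout): each choice set lists the alt_ids offered.
choice_sets = {
    1: [1, 3, 5],
    2: [2, 4, 5],
}

# Response file (hypothetical layout): one record per case and choice set.
responses = [
    {"case_id": 101, "set_id": 1, "choice": 2},
    {"case_id": 101, "set_id": 2, "choice": 3},
]


def chosen_alternative(response):
    """Return the attribute record of the alternative picked in a response."""
    offered = choice_sets[response["set_id"]]
    alt_id = offered[response["choice"] - 1]  # choice assumed 1-based
    return alternatives[alt_id]


first = chosen_alternative(responses[0])
second = chosen_alternative(responses[1])
```

The point of the split is that attribute values are stored once per alternative and reused across every set and respondent that refers to them.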
149. ng the SI-CHAID 4.0 option.

Figure 44: Covariate Classification Output

Latent GOLD Choice was found to be successful in uncovering the 3-segment structure in this example. In tutorial 2 we will use our final model to generate choice predictions.

To save the analysis file for future work:
> Click on the data file name cbcRESP.sav at the top of the outline pane
> From the File Menu, select Save Definitions
> Click Save
150. novations.com/products/latentgold_S0formats.html

Acknowledgments

We wish to thank the following people for supplying data: John Wurst (SDR Research), Wagner Kamakura (Duke University), and Bryan Orme (Sawtooth Software). We wish to thank the following people for their helpful comments: Tom Eagle (Eagle Analytics), Steve Cohen (Cohen Stratford), and Bengt Walerud (KW Partners). We also wish to thank Michael Denisenko for assistance on this manual and Alexander Ahlstrom for programming.

Technical Guide for Latent GOLD Choice 4.0: Basic and Advanced
Jeroen K. Vermunt and Jay Magidson
Statistical Innovations Inc.
(617) 489-4490
http://www.statisticalinnovations.com

This document should be cited as: J.K. Vermunt and J. Magidson (2005). Technical Guide for Latent GOLD Choice 4.0: Basic and Advanced. Belmont, Massachusetts: Statistical Innovations Inc.

Contents
1 Introduction to Part I (Basic Models)
2 The Latent Class Model for Choice Data
2.1
2.2 Rankings and Other Situations with Impossible Alternatives
2.3
2.4 Replication Scale and Best-Worst Choices
2.5 Replication Weight and Constant-sum Data
2.6 Other Choice/Preference Formats
2.7
2.8 Coding of Nominal Variables
2.9 Known-Class Indicator
2.10 Zero-Inflated Models
2.11 Restrictions on the Regressi
151. nstraint implies that the corresponding 4 effects should sum to 0. This is accomplished by defining a design matrix with 3 numeric attributes. The design matrix that is set up for the 3 non-redundant terms ($\beta^{att}_1$, $\beta^{att}_2$, $\beta^{att}_3$) is as follows:

category 1    1   0   0
category 2    0   1   0
category 3    0   0   1
category 4   -1  -1  -1

where each row corresponds to a category of the attribute concerned and each column to one of the three parameters. Although the parameter for the last category is omitted from the model, you do not notice this because it is computed by the program after the model is estimated. The parameter for the fourth category equals $-\sum_{p=1}^{3} \beta^{att}_p$; that is, minus the sum of the parameters of the three other categories. This guarantees that the parameters sum to zero, since $\sum_{p=1}^{3} \beta^{att}_p - \sum_{p=1}^{3} \beta^{att}_p = 0$.

Instead of using effect coding, it is also possible to use dummy coding. Depending on whether one uses the first or the last category as reference category, the design matrix will look like this:

category 1    0   0   0
category 2    1   0   0
category 3    0   1   0
category 4    0   0   1

or this:

category 1    1   0   0
category 2    0   1   0
category 3    0   0   1
category 4    0   0   0

Whereas in effect coding the category-specific effects should be interpreted in terms of deviation from the average, in dummy coding their interpretation is in terms of difference from the reference category. Note that the parameter for the reference category is omitted w
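The coding schemes above can be generated mechanically. The following Python sketch builds the effect-coded and dummy-coded design matrices for an attribute with any number of categories and recovers the omitted effect-coded parameter; the function names are illustrative and are not part of the program.

```python
def effect_coding(n_categories):
    """Rows = categories, columns = the n-1 non-redundant parameters.

    The last category gets -1 in every column, so the implied effects
    sum to zero over categories.
    """
    k = n_categories - 1
    rows = [[1 if col == row else 0 for col in range(k)] for row in range(k)]
    rows.append([-1] * k)
    return rows


def dummy_coding(n_categories, reference="last"):
    """Dummy coding with the first or the last category as reference."""
    k = n_categories - 1
    identity = [[1 if col == row else 0 for col in range(k)] for row in range(k)]
    zero_row = [0] * k
    if reference == "first":
        return [zero_row] + identity
    if reference == "last":
        return identity + [zero_row]
    raise ValueError("reference must be 'first' or 'last'")


def omitted_effect(estimated_effects):
    """Effect-coded parameter of the last category: minus the sum of the others."""
    return -sum(estimated_effects)


effects = [0.5, -0.25, 0.25]          # made-up estimates for categories 1-3
fourth = omitted_effect(effects)       # -0.5, so all four sum to zero
```

With four categories, `effect_coding(4)` and the two variants of `dummy_coding(4, ...)` reproduce the three matrices shown in the text.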
152. numeric
Nominal: set the covariate(s) to nominal
Active: the default setting
Inactive: set covariates to be inactive
Group: set covariates to apply to group-level GClasses

Active covariates will generally affect the definition of the classes and the part-worth and other parameters. Advantages of working with inactive instead of active covariates are that the estimation time is not increased and that the obtained solution is the same as without covariates (i.e., inactive covariates do not influence the parameter estimates).

The box labeled Classes is located beneath the Covariates button. Enter a number greater than 0. Separate models will be estimated for each segment. If a range is specified for the number of classes, such as 1-4, separate sets of models will be estimated, the first representing a 1-class model (the traditional aggregate model, which assumes a single homogeneous population).

Replication Scale: assign one variable to be used as Replication Scale.
Replication Weight: assign one variable to be used as Replication Weight.
Case Weight Variable: assign one variable to be used as Case Weight.

SCAN
Scans the data file(s) to obtain all values and labels for the variables. Once you have finished in the Variables Tab, you may wish to click the Scan button to check that the variables are being read correctly. After scanning the file, you may double-click on the Dependent or any Predictor or Covariate to view the categ
153. Figure 7: Advanced Tab

The variables displayed in the variable list (left-most box) of the Advanced Tab are those that have not previously been specified for use as an attribute, dependent variable, predictor, covariate, known class indicator, case ID, case weight, or replication weight. These variables are eligible for use with any of these 3 advanced options.

Survey
This advanced option can be used to specify information on the sampling design that was used to obtain your data. The program computes the design effect and reports sampling-design-corrected standard errors and Wald statistics. Four aspects of the sampling design can be taken into account: stratification (Stratum), clustering (PSU), weighting (Sampling Wgt), and finite population size (Population Size). For more details, see section 8 of the Technical Guide.

Stratum
The Stratum variable specifies the stratum to which a case belongs. When no Stratum variable is specified, it is assumed that all cases belong to the same stratum; that is, that there is only one stratum.

PSU
The
154. on Coefficients
2.12 General Latent Class Choice Model
3 Estimation and Other Technical Issues
3.1 Log-likelihood and Log-posterior Function
3.2 Missing Data
3.2.1 Dependent variable
3.2.2 Attributes, predictors, and covariates
3.3 Prior Distributions
3.4 Algorithms
3.5 Convergence
3.6
3.7 Bootstrapping the P-Value of L² or -2LL Difference
3.8 Identification Issues
3.9 Selecting and Holding out Choices or Cases
3.9.1 Replication and case weights equal to zero
3.9.2 Replication weights equal to a very small number
3.9.3 Case weights equal to a very small number
4 The Latent GOLD Choice Output
4.1 Model Summary
4.1.1 Chi-squared statistics
4.1.2 Log-likelihood statistics
4.1.3 Classification statistics
4.1.4 Covariate classification statistics
4.1.5 Prediction statistics
4.2 Parameters
4.3 Importance
4.4 Profile and ProbMeans
4.5 Set Profile and Set ProbMeans
4.6 Frequencies / Residuals
155. on such a modal assignment. For ratings, which are ordinal dependent variables, we make use of the mean $\hat{y}_{it}$ in some of the error measures. Error measures may also be based on the estimated probabilities instead of a single predicted value.

The error measures reported in prediction statistics are obtained as follows:

$$Error = \frac{\sum_{i} w_i \sum_{t} v_{it}\, Error_{it}}{\sum_{i} w_i \sum_{t} v_{it}}$$

As can be seen, Error is a weighted average of the replication-specific errors $Error_{it}$. Latent GOLD Choice uses four types of error measures (Squared Error, Absolute Error, Minus Log-likelihood, and Prediction Error), which differ in the definition of $Error_{it}$. For ratings, the $Error_{it}$ for Squared Error and Absolute Error equal $(y_{it} - \hat{y}_{it})^2$ and $|y_{it} - \hat{y}_{it}|$, respectively. For choices and rankings, these errors equal $\sum_{m=1}^{M} [I_m(y_{it}) - \hat{P}_{m|it}]^2$ and $\sum_{m=1}^{M} |I_m(y_{it}) - \hat{P}_{m|it}|$, where the indicator variable $I_m(y_{it})$ equals 1 if $y_{it} = m$ and 0 otherwise. The $Error_{it}$ for Minus Log-likelihood equals $-\sum_{m=1}^{M} I_m(y_{it}) \ln \hat{P}_{m|it}$. In the computation of Prediction Error, $Error_{it}$ equals 0 if the modal prediction is correct and 1 otherwise.

The general definition of the (pseudo) $R^2$ of an estimated model is the reduction of errors compared to the errors of a baseline model. More precisely,

$$R^2 = \frac{Error(baseline) - Error(model)}{Error(baseline)}$$

Latent GOLD Choice uses two different baseline models, called Baseline and Baseline(0), yielding two $R^2$ measures, called $R^2$ and $R^2(0)$. In Baseline, the Error i
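The error measures for choices can be written out in a few lines. The Python sketch below is an illustration under stated assumptions, not the program's implementation: it takes the predicted probabilities for one replication and the 0-based index of the chosen alternative, and all function names are hypothetical.

```python
import math


def replication_errors(probs, chosen):
    """Four error measures for one choice replication.

    probs  : predicted probabilities over the M alternatives
    chosen : 0-based index of the observed choice (indexing is an assumption)
    """
    indicator = [1.0 if m == chosen else 0.0 for m in range(len(probs))]
    modal = max(range(len(probs)), key=lambda m: probs[m])
    return {
        "squared": sum((i - p) ** 2 for i, p in zip(indicator, probs)),
        "absolute": sum(abs(i - p) for i, p in zip(indicator, probs)),
        "minus_loglik": -math.log(probs[chosen]),
        "prediction": 0.0 if modal == chosen else 1.0,
    }


def weighted_error(records):
    """Weighted average of replication errors.

    records: (case weight w_i, replication weight v_it, error_it) triples.
    """
    records = list(records)
    numerator = sum(w * v * e for w, v, e in records)
    denominator = sum(w * v for w, v, _ in records)
    return numerator / denominator


def pseudo_r2(error_baseline, error_model):
    """Proportional reduction of error relative to a baseline model."""
    return (error_baseline - error_model) / error_baseline


errors = replication_errors([0.7, 0.2, 0.1], chosen=0)
average = weighted_error([(1.0, 1.0, 0.2), (2.0, 1.0, 0.5)])
r2 = pseudo_r2(0.5, 0.4)
```

For the example replication, the squared error is 0.09 + 0.04 + 0.01 and the prediction error is 0 because the modal alternative was the one chosen.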
156. on the desired interpretation of the classes.

In addition, this option may also be used to specify multiple-group models by including the group variable as both a Known Class Indicator and an active covariate. For further details, see section 2.5 of the Latent GOLD Technical Guide.

To select known classes:
> Select one variable from those appearing in the Variables List Box located in the upper left-hand portion of the ClassPred Tab. Variables appearing here are those that have not been previously selected.
> Click Known Class to move that variable to the Known Class Box; the class assignment window beneath the Known Class Box becomes active.
> A separate row appears for each category (code/value) taken on by the known class indicator. A separate column appears for each class.
> Click on the appropriate boxes to select or deselect the possible assignment of the categories to certain classes. A cleared checkmark means that the posterior membership probability is restricted to zero for that class for cases in that category of the known class indicator.

By default, the checks are assigned as follows. For a K-class model, a category with a code of k (between 1 and K) on the Known Class Indicator is assigned to class k only. Categories coded less than 1, greater than K, or missing are assigned to all classes (i.e., no restrictions). Missing values are not shown in the table.

Note: For the example in Figure 9 above, all cases are coded either 1
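The default assignment rule described above, and its effect on posterior membership probabilities, can be sketched as follows in Python. The function names are hypothetical, missing codes are represented by `None`, and rescaling the remaining probabilities so they sum to 1 is an assumption about how the zero restriction plays out for fixed parameter values.

```python
def default_allowed_classes(code, n_classes):
    """Classes a category may belong to under the default checks.

    A code k with 1 <= k <= K is tied to class k only; any other code,
    or a missing code (None), leaves all classes allowed.
    """
    if isinstance(code, int) and 1 <= code <= n_classes:
        return {code}
    return set(range(1, n_classes + 1))


def restrict_posterior(posterior, allowed):
    """Zero out disallowed classes and rescale the rest to sum to 1.

    posterior: probabilities for classes 1..K (list position 0 = class 1).
    """
    kept = [p if (cls + 1) in allowed else 0.0 for cls, p in enumerate(posterior)]
    total = sum(kept)
    return [p / total for p in kept]


only_two = default_allowed_classes(2, 4)
unrestricted = default_allowed_classes(None, 4)
restricted = restrict_posterior([0.1, 0.2, 0.3, 0.4], allowed={3, 4})
```

In the 4-class example, a case limited to classes 3 and 4 ends up with probabilities 3/7 and 4/7 for those two classes and zero elsewhere.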
157. oncomitant or external variables (Clogg, 1981; Dayton and McReady, 1988; Kamakura et al., 1994; Van der Heijden et al., 1996). When covariates are included in the model, the probability structure changes slightly compared to equation (3). It becomes

$$P(\mathbf{y}_i \mid \mathbf{z}_i) = \sum_{x=1}^{K} P(x \mid \mathbf{z}_i^{cov})\, P(\mathbf{y}_i \mid x, \mathbf{z}_i^{att}, \mathbf{z}_i^{pred}). \qquad (4)$$

As can be seen, class membership of individual i is now assumed to depend on a set of covariates denoted by $\mathbf{z}_i^{cov}$. A multinomial logit is specified in which class membership is regressed on covariates; that is,

$$P(x \mid \mathbf{z}_i^{cov}) = \frac{\exp(\eta_{x|z_i})}{\sum_{x'=1}^{K} \exp(\eta_{x'|z_i})},$$

with linear term

$$\eta_{x|z_i} = \gamma_{0x} + \sum_{r=1}^{R} \gamma_{rx}\, z^{cov}_{ir}. \qquad (5)$$

Here, $\gamma_{0x}$ denotes the intercept or constant corresponding to latent class x, and $\gamma_{rx}$ is the effect of the rth covariate for Class x. Similarly to the model for the choices, for identification we either set $\sum_{x=1}^{K} \gamma_{rx} = 0$, $\gamma_{r1} = 0$, or $\gamma_{rK} = 0$ for $0 \leq r \leq R$, which amounts to using either effect or dummy coding. Although in equation (5) the covariates are assumed to be numeric, the program can also deal with nominal covariates (see subsection 2.8).

We call this procedure for including covariates in a model the "active covariates method": covariates are active in the sense that the LC choice solution with covariates can be somewhat different from the solution without covariates. An alternative method, called the "inactive covariates method", involves computing descriptive measures for the association between covari
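The multinomial logit for class membership in equation (5) is a softmax over class-specific linear terms. A minimal Python sketch, with hypothetical function and argument names and made-up parameter values, assuming numeric covariates:

```python
import math


def class_membership_probs(gamma0, gamma, z):
    """P(x | z) for each class under a multinomial logit.

    gamma0[x]   : intercept for class x
    gamma[r][x] : effect of covariate r for class x
    z[r]        : covariate values for one individual
    """
    n_classes = len(gamma0)
    eta = [
        gamma0[x] + sum(gamma[r][x] * z[r] for r in range(len(z)))
        for x in range(n_classes)
    ]
    shift = max(eta)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(e - shift) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


# Dummy coding with the last class as reference: its parameters are zero.
probs = class_membership_probs(
    gamma0=[math.log(2.0), 0.0],
    gamma=[[0.5, 0.0]],
    z=[0.0],
)
equal = class_membership_probs([0.0, 0.0, 0.0], [[0.0, 0.0, 0.0]], [1.5])
```

With the covariate at zero, the example reduces to the intercepts alone, giving class sizes of 2/3 and 1/3.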
158. or is equivalent to adding cases to each latent class. These cases are distributed evenly over the various covariate patterns. This prior makes the sizes of the latent classes slightly more equal and the covariate effects somewhat smaller.

For the dependent variable we use the following Dirichlet prior: att pred Tm Q2 log p Pty mlz Zu Li K att pred log P y x 20 2b, where $\hat{\pi}_m$ is the observed marginal distribution of the dependent variable $y_{it}$. This prior can be interpreted as adding observations to each latent class with preservation of the observed distribution of $y_{it}$, where $\alpha_2$ is a parameter to be specified by the user. These observations are distributed evenly over the observed attribute/predictor patterns. This prior makes the class-specific response probabilities slightly more similar to each other and smooths the β parameters somewhat towards zero.

The influence of the priors on the final parameter estimates depends on the values chosen for the α's, as well as on the sample size. The default settings are $\alpha_1 = \alpha_2 = 1.0$. This means that with moderate sample sizes the influence of the priors on the parameter estimates is negligible. Setting $\alpha_1 = \alpha_2 = 0$ yields ML estimates.

3.4 Algorithms

To find the ML or PM estimates for the model parameters $\vartheta$, Latent GOLD Choice uses both the EM and the Newton-Raphson algorithm. In practice, the estimation process starts with a number of EM iterations. When close enough to
159. or rankings (Böckenholt, 2002; Croon, 1989) and best-worst choices (Cohen, 2003). For example, a pair of best and worst choices can also be seen as a joint choice out of M(M - 1) joint alternatives. The attribute values of these joint alternatives are equal to the attribute values of the best minus the attribute values of the worst. What is clear from these examples is that setting up a model for a joint choice can be quite complicated.

Another example of a situation in which one has to set up a model for a joint response variable is in capture-recapture studies (Agresti, 2002). For the subjects that are captured at least once, one has information on capture at the various occasions. The total number of categories of the joint dependent variable is $2^T - 1$, where T is the number of time points or replications.

Note that these examples of joint choice models all share the fact that the number of possible joint alternatives is smaller than the product of the number of alternatives of the separate choices. That is, in each case certain combinations of choices are impossible, and hence the model of interest cannot be set up as a series of independent choices. Instead, these situations should be specified as a single joint choice.

The last choice format we would like to mention is the combination of different response formats. The most general model is the model for first choices. Models for rankings and ratings are special cases that are obtained
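The construction of joint alternatives for a best-worst pair can be shown directly. This Python sketch enumerates the M(M - 1) ordered (best, worst) pairs and computes each pair's attribute vector as best minus worst; the function names and the example attribute values are made up for illustration.

```python
from itertools import permutations


def best_worst_joint_alternatives(attributes):
    """Map each ordered (best, worst) pair to its difference vector.

    attributes: list of M attribute vectors, one per alternative.
    Pairs with best == worst are impossible and are never generated.
    """
    joint = {}
    for best, worst in permutations(range(len(attributes)), 2):
        joint[(best, worst)] = [
            b - w for b, w in zip(attributes[best], attributes[worst])
        ]
    return joint


def n_capture_histories(n_occasions):
    """Observable capture patterns: all 2^T patterns except 'never captured'."""
    return 2 ** n_occasions - 1


attrs = [[1, 0.5], [0, 1.0], [1, 2.0]]  # three alternatives, two attributes
joint = best_worst_joint_alternatives(attrs)
```

With M = 3 alternatives this gives 6 joint alternatives rather than the 3 x 3 = 9 of two independent choices, which is the reduction the text refers to.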
160. ories/values.

GROUPS
Groups adjacent categories/values for the Dependent or any Predictor or Covariate to reduce the total number of levels. For numeric variables, the value for a group is the mean of the values within that group. To implement the grouping, double-click on a variable to open the SCORE window. At the bottom right, next to the GROUPS label, enter the desired number of groups and click the Group button. The resulting number of groups will be the desired number or fewer.

USER SCORES (Uniform Scores)
Re-codes the scores assigned to numeric variables to other values. To implement, double-click on a variable to open the SCORE window. Double-click on any category value for that variable. In the bottom left box, replace the current score with the desired new score and click the Replace button. To change scores to equidistant integers, click the Uniform button. Click OK when finished.

After completing the variable assignments in the Variables tab, open the Attributes tab and continue the model setup. In the Attributes tab you can open Alternatives and Sets files and specify the attributes and scale types for these attributes.

Attributes Tab
161. other types of three-level GLM regression models with parametric random effects (Im and Gionala, 1988; Skrondal and Rabe-Hesketh, 2004; Rodriguez and Goldman, 2001; Vermunt, 2002c, 2004). In terms of probability structure, this yields

Piy ff Ij Ti red a I f 150 POl 2st Pie PAR dF

The simplest special case is obtained by assuming that the conditional logit model contains random intercepts at both the case and the group level. The corresponding linear predictor in a model with P attributes equals

P con att con o cong mI Nml zji Fiji Pl Dm 2 Pep Zmjitp Ami i Fiji ds Ami Fij p 1

Such a model, containing a single CFactor and a single GCFactor, will suffice in most three-level random-effects applications. However, similar to the random-effects models discussed in the context of the CFactors option, this model can be expanded with random slopes at both levels using the factor-analytic or generalized random-effects specification illustrated in equation (24).

7.2.4 LC growth models for multiple response

Suppose one has a longitudinal data set containing multiple responses for each time point. The multiple responses could be used to build a time-specific latent classification, while the pattern of latent change over time could be described using a LC growth model. Specification of such a model would involve using the index i for the time points and the index j for the cases (time points are nested within cases
162. output files appears in the Outline pane.

Figure 25: The two Panes (Outline Pane and Contents Pane)

The Outline pane contains the name of the data file and a list of any previously estimated models and their output. The Contents pane (currently empty) is where you will view the various types of output.

Viewing Output and Interpreting Results

> Highlight the data file name cbcRESP.sav, and a summary of all the models estimated on this file appears in the Contents pane.

Model    Description       LL           BIC(LL)     Npar   L²          df       p-value   Class.Err.   R²(0)    R²
Model1   1-Class Choice    -4145.1973   8320.3519    5     4905.0533   393205   1.00      0.0000       0.0540   0.0454
Model2   2-Class Choice    -3704.6949   7493.2703   14     4024.0485   393196   1.00      0.0303       0.2064   0.1988
Model3   3-Class Choice    -3648.6719   7435.1474   23     3912.0025   393187   1.00      0.0707       0.2231   0.2157
Model4   4-Class Choice    -3640.7484   7473.2238   32     3896.1556   393178   1.00      0.0810       0.2281   0.2207

Figure 26: Model Summary Output

Note: For the 4-class model, you may obtain a local solution (LL = -3644.3868) rather than the global solution reported above (LL = -3640.7484). If this occurs, re-estimate the 4
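The BIC(LL) column in the model summary can be checked by hand. The Python sketch below assumes the usual definition BIC = -2·LL + Npar·ln(N) with N = 400, the number of individuals mentioned in the description of the tutorial data; both the formula and the choice of N are assumptions here, although they reproduce the reported values to within rounding.

```python
import math

N_CASES = 400  # individuals in the tutorial data (assumed to be the N in BIC)

# (log-likelihood, number of parameters) as reported in the model summary
models = {
    "Model1": (-4145.1973, 5),
    "Model2": (-3704.6949, 14),
    "Model3": (-3648.6719, 23),
    "Model4": (-3640.7484, 32),
}


def bic_from_ll(log_likelihood, n_parameters, n_cases):
    """BIC based on the log-likelihood: -2 LL + Npar * ln(N)."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_cases)


bics = {name: bic_from_ll(ll, npar, N_CASES) for name, (ll, npar) in models.items()}
best_model = min(bics, key=bics.get)
```

Under this criterion the 3-class model has the lowest BIC, even though the 4-class model has the higher log-likelihood, because the nine extra parameters cost more than they gain in fit.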
163. pear which can be used to modify the appearance of the Plot in an interactive manner. A right-click in the Parameters Output retrieves a pop-up menu containing the Options from the View Menu, which allow you to change the appearance of the output in various ways, such as adding a column for standard errors.

Tutorial 1: Using Latent GOLD Choice to Estimate Discrete Choice Models

In this tutorial we analyze data from a simple choice-based conjoint (CBC) experiment designed to estimate market shares (choice shares) for shoes. In this tutorial you will:
• Set up an analysis
• Estimate choice models that specify different numbers of classes (segments)
• Explore which of these models provides the best fit to the data
• Utilize restrictions to refine the best-fitting model
• Interpret results using our final model
• Save results

In tutorial 2, the final model will be used to:
• Predict future choices
• Simulate choices among additional products of interest

The Data

Latent GOLD Choice accepts data from an optional 1-file or its default 3-file structure, from an SPSS .sav file, a Sawtooth .cho file, or ASCII rectangular file format. The current sample data utilizes 3 SPSS .sav files. An additional 80 file formats are available using the DBMS/Copy add-on.

The 10 pairs of shoes included in this choice experiment differ on 3 attributes: Fashion (0 = Traditional, 1 = Modern), Quality (0 = Standard, 1 = Higher), and Price (5 equidistant levels coded from
164. pies of the price variable in the model, say Price1 and Price2. The effect of Price1 is specified as ordered and is fixed to zero in Class 4. The effect of Price2 is fixed to zero in Classes 1-3.

Suppose your assumption is that the effect of a particular attribute is at least 2. This can be accomplished by combining a fixed-value constraint with an order constraint. More precisely, an additional attribute defined as $2 \cdot z^{att}_{itmp}$ is specified to be an offset, and the effect of the original attribute $z^{att}_{itmp}$ is defined to be ascending.

Our final example is an exploratory variant of the DFactor structure described above. Suppose you want a two-DFactor model without assumptions on which discrete factor influences which attribute effects. This can be accomplished by having 3 copies of all attributes in the attributes file. With two attributes, brand and price, the restriction table is of the form

          Class 1   Class 2   Class 3   Class 4
Brand1       1         1         1         1
Brand2       -         -         3         3
Brand3       -         2         -         2
Price1       1         1         1         1
Price2       -         -         3         3
Price3       -         2         -         2

The first copy (Brand1 and Price1) defines a main effect for each attribute. The second copy (Brand2 and Price2) is used to define the first DFactor, a contrast between Classes 3 and 4 and Classes 1 and 2. The third copy (Brand3 and Price3) specifies DFactor 2 by means of a contrast between Classes 2 and 4 and Classes 1 and 3.

2.12 General Latent Class Choice Model

In the previous subsections we described the various elements of the LC model implemen
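One way to read the three-copy restriction table is that the effect of an attribute in a given class is the sum of the coefficients of the copies that are active in that class. The Python sketch below illustrates this reading; the coefficient values and names are made up, and the pattern of active classes is taken from the contrasts described in the text (copy 2 acts in Classes 3 and 4, copy 3 in Classes 2 and 4).

```python
# Classes in which each copy of the attribute has a (merged) effect.
ACTIVE_CLASSES = {
    "copy1": {1, 2, 3, 4},  # main effect, equal across all classes
    "copy2": {3, 4},        # DFactor 1 contrast
    "copy3": {2, 4},        # DFactor 2 contrast
}


def class_specific_effects(main, dfactor1, dfactor2, n_classes=4):
    """Total attribute effect per class implied by the three copies."""
    coefficients = {"copy1": main, "copy2": dfactor1, "copy3": dfactor2}
    return {
        cls: sum(
            value
            for copy, value in coefficients.items()
            if cls in ACTIVE_CLASSES[copy]
        )
        for cls in range(1, n_classes + 1)
    }


brand_effects = class_specific_effects(main=1.0, dfactor1=0.5, dfactor2=-0.25)
```

The four classes then correspond to the four combinations of two dichotomous discrete factors: neither, DFactor 2 only, DFactor 1 only, and both.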
165. r Model with a Known Class Indicator

This option is useful if you have a priori class membership information for some cases (pre-assigned or pre-classified cases), or if membership of certain classes is very implausible for some combinations of observed scores.

Known Class (Class Indicator)
In applications where a subset of the cases are known with certainty not to belong to a particular class or particular classes, you can take advantage of this information to restrict their posterior membership probability to 0 for one or more classes, and hence classify these cases into one of the remaining class(es) with a total probability of 1. This feature allows more control over the segment definitions to ensure that the resulting classes are most meaningful. Common applications include:

1) using new data to refine old segmentation models while maintaining the segment classifications of the original sample

2) archetypal analysis: define class membership a priori based on extreme choice/response patterns that reflect theoretical archetypes

3) partial classification: high cost or other factors may preclude all but a small sample of cases from being classified with certainty. These cases can be assigned to their respective classes with 100% certainty, and the remaining cases would be classified by the LC model in the usual way

4) post-hoc refinement of class assignment, where modal assignment for certain cases is judged to be implausible based
166. r the segment definitions by pre-assigning selected cases not to be in a particular class or classes (Known Class option).

Restricting Cases Known Not to Belong to a Certain Class or Classes

With this option, one can specify that one or more specific cases can belong to a certain class or certain classes only. To use this feature, select a variable from the list box in the ClassPred Tab to be used as the Known Class Indicator and click Known Class. The variable moves to the Known Class Indicator box and the Assignment Table becomes active. For each category of the Known Class Indicator, you then specify to which classes the cases with that category code may or may not belong, using the Assignment Table.

For example, Figure 9 illustrates a 4-Cluster model (4 columns) where the variable classind is used as the Known Class Indicator. Cases for which classind = 1 are allowed to be in cluster 1 only; those for which classind = 2 are allowed to be in cluster 2 only; all other cases (classind = 3) may be assigned to any of the 4 clusters.

Figure 9: ClassPred Tab for Cluste
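The effect of the Assignment Table on a single case can be sketched as follows. This is a simplified, after-the-fact illustration: it assumes the restriction amounts to setting the membership probability of disallowed classes to 0 and renormalizing over the allowed ones, whereas the program applies the restriction during estimation. The function name, the `ASSIGNMENT` dictionary, and the example probabilities are hypothetical.

```python
def restrict_posteriors(posteriors, allowed):
    """Zero out classes a case may not belong to, then renormalize to sum to 1."""
    masked = [p if k in allowed else 0.0
              for k, p in enumerate(posteriors, start=1)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("no allowed class has positive probability")
    return [p / total for p in masked]


# Assignment table of the classind example: code -> clusters the case may be in.
ASSIGNMENT = {1: {1}, 2: {2}, 3: {1, 2, 3, 4}}

unrestricted = [0.10, 0.60, 0.20, 0.10]   # hypothetical membership probabilities
known_cluster_1 = restrict_posteriors(unrestricted, ASSIGNMENT[1])
unknown = restrict_posteriors(unrestricted, ASSIGNMENT[3])
```

A case with classind = 1 ends up in cluster 1 with probability 1, while a case with classind = 3 keeps its probabilities unchanged (up to rounding).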
167. rages of the class-specific effects, where the posterior membership probabilities of a case serve as weights. In the output file, the coefficients appear in the same order as in the Parameters Output and are labeled b1, b2, b3, etc. Both the individual estimates (est_b1, est_b2, etc.) and the individual standard deviations (std_b1, std_b2, etc.) are provided. For more information, see Section 4.1.5 of the Technical Guide.

Cook's D

The Cook's Distance measure may be output to an external file. This measure is used to identify cases that have a large influence on the parameter estimates. A recommended cut-off point for Cook's distance is four times the number of parameters divided by the number of observations.

CHAID (requires a license to SI-CHAID 4.0)

This option creates a CHAID settings file (.chd file) from your model that can then be opened with the SI-CHAID 4.0 program. With this option, a CHAID (CHi-squared Automatic Interaction Detector) analysis may be performed following the estimation of any LC model in Latent GOLD 4.0. By selecting CHAID as one of the output options, a CHAID input file will be constructed upon completion of the model estimation, which can then be used as input to SI-CHAID 4.0. For more information regarding CHAID, see page 2.

For additional information on the output-to-file options, see Sections 4.8 and 4.9 of the Technical Guide.

Step 3: Estimate the Model

Upon completing the model setup, click Estimate and t
168. re 13: Output Options

Parameters Output

For any estimated model, click Parameters and Latent GOLD displays a table containing parameter estimates and measures of significance for these estimates. For a detailed explanation of these parameters, see Section 4.2 of the Technical Guide.

Viewing Wald Statistics and Standard Errors

By default, Wald statistics are provided in the output to assess the statistical significance of the set of parameter estimates associated with a given variable across all classes. Specifically, for each variable, the Wald statistic tests the restriction that each of the parameter estimates in that set equals zero (for variables specified as Nominal, the set includes parameters for each category of the variable). Two Wald statistics (Wald, Wald(=)) are provided in the table when more than 1 class has been estimated. For each set of parameter estimates, the Wald(=) statistic considers the subset associated with each class and tests the restriction that each parameter in that subset equals the corresponding parameter in the subsets associated with each of the other classes. That is, the Wald(=) statistic tests the equality of each set of regression effects across classes.

• To view standard errors or Z statistics associated with parameter estimates, from the menus choose View, then Standard Errors or Z Statistic, and the column containing the Wald statistic(s) is replaced by the standard errors / Z statistics. Stan
169. redictor effect. The posterior mean or expected a posteriori estimate of a particular regression coefficient for case i is defined as follows:

$$\hat{\beta}_{ip} = \sum_{x=1}^{K} \hat{P}(x|z_i, y_i)\, \hat{\beta}_{xp}, \qquad (20)$$

that is, as a weighted average of the Class-specific coefficients. These estimates are similar to the individual coefficients obtained in multilevel, mixed, random-effects, or hierarchical Bayes (HB) models. The person-specific coefficients can be used to predict person i's new choices or responses. The posterior standard deviations are defined as

$$\hat{\sigma}_{\hat{\beta}_{ip}} = \sqrt{\sum_{x=1}^{K} \hat{P}(x|z_i, y_i)\left(\hat{\beta}_{xp} - \hat{\beta}_{ip}\right)^2}.$$

Another output-to-file item is Cook's D (Cook's Distance). It can be used to detect influential cases or, more precisely, cases with a large influence on the parameter estimates. The formula that is used is the following:

$$C_i = -2\, g_i'\, H^{-1} g_i, \qquad (21)$$

where H is the Hessian matrix and $g_i$ the vector with the gradient contributions of case i. A typical cut-point for Cook's D is four times the number of parameters divided by the number of cases (Skrondal and Rabe-Hesketh, 2004).

The last output-to-file item is the Variance-Covariance Matrix of the model parameters. Depending on the type of variance estimator that is requested, this will be $\hat{\Sigma}_{standard}(\hat{\vartheta})$, $\hat{\Sigma}_{outer}(\hat{\vartheta})$, or $\hat{\Sigma}_{robust}(\hat{\vartheta})$. Note that the variances and covariances involving the omitted categories of the effect-coded nominal variables are also reported.

4.9 The CHAID Output Optio
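As a sketch of the individual coefficients described in this excerpt, the snippet below computes the posterior mean of a coefficient as a posterior-weighted average of the class-specific coefficients, and the posterior standard deviation as the weighted spread around that mean. It also evaluates the Cook's D cut-point quoted in the text (four times the number of parameters divided by the number of cases). The function names and the example posterior probabilities are hypothetical; the class coefficients 3, 0, 1 mirror the Modern fashion part-worths of the simulated example.

```python
import math


def individual_coefficient(posteriors, class_betas):
    """Posterior mean and standard deviation of one coefficient for one case."""
    mean = sum(p * b for p, b in zip(posteriors, class_betas))
    var = sum(p * (b - mean) ** 2 for p, b in zip(posteriors, class_betas))
    return mean, math.sqrt(var)


def cooks_d_cutoff(n_parameters, n_cases):
    """Rule-of-thumb cut-point: 4 * parameters / cases."""
    return 4.0 * n_parameters / n_cases


# Hypothetical posterior membership probabilities for one respondent.
est_b1, std_b1 = individual_coefficient([0.7, 0.2, 0.1], [3.0, 0.0, 1.0])
cutoff = cooks_d_cutoff(n_parameters=18, n_cases=400)
```

A respondent classified with certainty gets exactly the coefficient of that class and a standard deviation of zero; the standard deviation grows as the posterior spreads over classes with different coefficients.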
170. redictor effect. Using basic statistics calculus, the Mean of $\hat{\beta}_{xp}$ can be defined as $\sum_{x=1}^{K} \hat{P}(x)\, \hat{\beta}_{xp}$ and the Std.Dev. of $\hat{\beta}_{xp}$ as

$$\sqrt{\sum_{x=1}^{K} \hat{P}(x)\left(\hat{\beta}_{xp} - \sum_{x'=1}^{K} \hat{P}(x')\, \hat{\beta}_{x'p}\right)^2}.$$

4.3 Importance

The Importance output reports the maximum effect for each of the attributes, including the constants, as well as re-scaled maximum effects that add up to one within latent classes. Let a denote a level of attribute p, $A_p$ its total number of levels, and $\hat{\eta}_{a|xp}$ the utility associated with level a for latent class x. For numeric attributes, $\hat{\eta}_{a|xp}$ equals the attribute effect times the numeric score of category a; for nominal attributes, it is simply the effect for category a. The maximum effect of attribute p for latent class x is defined as

$$maxeff_{xp} = \max_a(\hat{\eta}_{a|xp}) - \min_a(\hat{\eta}_{a|xp}).$$

These maximum effects can be compared both across attributes and across latent classes. Often it is relevant to compare the relative importances of the attributes across latent classes. These relative importances, or relative maximum effects, are obtained as follows:

$$releff_{xp} = \frac{maxeff_{xp}}{\sum_{p'} maxeff_{xp'}}.$$

As can be seen, $releff_{xp}$ is a maximum effect that is re-scaled to sum to 1 across attributes within a latent class. The relative importances are depicted in a plot. Attributes can be deleted from the Importance output using the plot control. The relative effects are then rescaled to sum to one for the remaining attributes. This feature can, for example, be useful if one is interested in
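The maximum effect and relative importance computations described in the Importance section can be sketched for a single latent class as below. The function name and the utility values are hypothetical; for the numeric price attribute the utilities are an assumed effect of -0.5 multiplied by the scores 1, 2, 3.

```python
def importance(utilities_by_attribute):
    """Maximum effects (range of utilities) and re-scaled relative effects.

    utilities_by_attribute maps an attribute name to the utilities of its
    levels within one latent class.
    """
    maxeff = {p: max(u) - min(u) for p, u in utilities_by_attribute.items()}
    total = sum(maxeff.values())
    releff = {p: m / total for p, m in maxeff.items()}
    return maxeff, releff


# Hypothetical class-specific utilities.
utilities = {
    "fashion": [-1.5, 1.5],          # effect-coded nominal attribute
    "quality": [-0.1, 0.1],
    "price": [-0.5, -1.0, -1.5],     # assumed effect -0.5 times scores 1, 2, 3
}
maxeff, releff = importance(utilities)

# Dropping an attribute (as with the plot control) rescales the remaining ones.
without_price = {p: u for p, u in utilities.items() if p != "price"}
_, releff_without_price = importance(without_price)
```

Because the relative effects are ratios of ranges, removing one attribute changes the relative importance of all remaining attributes but leaves their maximum effects untouched.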
171. relative effects without considering the constants or the effect corresponding to a none option.

4.4 Profile and ProbMeans

The content of the Profile and ProbMeans output will be explained together because these two output sections are strongly related. Both sections contain (1) marginal latent probabilities, (2) transformed Class-specific attribute effects, and (3) information on the relation between active and inactive covariates and class membership.

The first row of each output section contains the estimated marginal latent class probabilities $\hat{P}(x)$ (see equation 16). In Profile these are called Class Size, and in ProbMeans Overall Probability.

The Profile output contains transformed attribute effects ($\beta$ parameters), including the constants. As above, let a denote a level of attribute p, $A_p$ its total number of levels, and $\hat{\eta}_{a|xp}$ the utility associated with level a for latent class x. The reported choice probabilities for attribute p are obtained as follows:

$$\hat{P}_p(a|x) = \frac{\exp(\hat{\eta}_{a|xp})}{\sum_{a'=1}^{A_p} \exp(\hat{\eta}_{a'|xp})}.$$

The $\hat{P}_p(a|x)$ can be interpreted as the estimated choice probabilities in a set of $A_p$ alternatives that differ only with respect to the attribute concerned. For numeric attributes, we also report the means $\sum_{a=1}^{A_p} z^{*}_{ap}\, \hat{P}_p(a|x)$.

In ProbMeans, the choice probabilities $\hat{P}_p(a|x)$ are re-scaled to sum to one over latent classes. That is,

$$\hat{P}_p(x|a) = \frac{\hat{P}(x)\, \hat{P}_p(a|x)}{\sum_{x'=1}^{K} \hat{P}(x')\, \hat{P}_p(a|x')}.$$

This number can be interpreted as the probability of being in latent clas
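The two transformations described for Profile and ProbMeans can be sketched in a few lines: a logit transformation of the level utilities within each class, followed by a Bayes-rule re-scaling over classes. The function names, class sizes, and utilities below are hypothetical illustration values.

```python
import math


def profile_probs(utilities):
    """Choice probabilities among alternatives differing only in one attribute."""
    top = max(utilities)                       # subtract max for numerical safety
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def probmeans(class_sizes, profile_by_class):
    """Re-scale P(a|x) to P(x|a): rows are levels, columns are classes."""
    rows = []
    for a in range(len(profile_by_class[0])):
        joint = [px * prof[a] for px, prof in zip(class_sizes, profile_by_class)]
        total = sum(joint)
        rows.append([j / total for j in joint])
    return rows


class_sizes = [0.50, 0.25, 0.25]
# Hypothetical utilities of the two fashion levels (Traditional, Modern) per class.
fashion_utilities = [[-1.5, 1.5], [0.0, 0.0], [-0.5, 0.5]]
profile = [profile_probs(u) for u in fashion_utilities]
prob_means = probmeans(class_sizes, profile)
```

In this example the second class is indifferent between the two levels, so its Profile probabilities are 0.5 each, while the ProbMeans row for Modern puts most of its mass on the first class.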
172. ribute. This means that it is possible to refine the definition of any Class (segment) by enhancing or reducing the estimated part-worth utility of any numeric attribute for that Class. Recall that numeric attribute p enters as $\beta^{att}_{xp} z^{att}_{itmp}$ in the linear part of the conditional logit model. Suppose, after estimating the model, the estimate for $\beta^{att}_{xp}$ turned out to be 1.5 for Class 1. If $z^{att}_{itmp}$ is specified to be an offset, the importance of this attribute is reduced: the effect of 1.5 is reduced to 1 for this Class. But suppose that you wish to enhance the importance of this attribute for Class 1; say you wish to restrict $\beta^{att}_{xp}$ to be equal to 2. The trick is to recode the attribute, replacing each code by twice its value. Thus, the recoded attribute is defined as $2 \cdot z^{att}_{itmp}$. If we restrict the effect of this recoded attribute to 1, we obtain $1 \cdot 2 \cdot z^{att}_{itmp} = 2 \cdot z^{att}_{itmp}$, which shows that the effect of $z^{att}_{itmp}$ is equated to 2. Such recoding can be done easily within Latent GOLD Choice using the Replace option.

In addition to post-hoc refinements to customize the definition of the resulting latent classes, the offset restriction can also be used to make the Classes conform to various theoretical structures. Probably the most important a priori application of Offset is that of defining stayer or brand-loyal classes. A brand-loyal class selects one of the brands with a probability equal to 1 and is not affected by the other attributes. An example of a restrictions table corre
173. roll down to the Prediction Statistics section.

Prediction Statistics (values recoverable from the screenshot)

Error Type             Baseline(0)   Baseline
Squared Error            0.7500       0.7429
Minus Log-likelihood     1.3863       1.3723
Absolute Error           1.5000       1.4854
Prediction Error         0.7500       0.6866

Figure 34: Prediction Statistics for 3-class Model

By accounting for the heterogeneity among the 3 segments, the prediction error has been reduced to 0.4575 by the 3-class model.

> Click on the expand icon next to the 3-class model and select Parameters.

The part-worth utility estimates appear for each class.

Model for Choices (values recoverable from the screenshot)

           Class1    Class2    Class3    Overall
R²         0.1992    0.2966    0.0403    0.2157
R²(0)      0.2363    0.3079    0.0480    0.2231

Attributes   Class1    Class2    Class3    Wald        p-value
FASHION      3.0138    0.1638    1.2205    488.1263    1.8e-105
QUALITY      0.0751    2
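The Baseline(0) column of the Prediction Statistics can be checked by hand, because Baseline(0) is the equal-probability model. The sketch below assumes four alternatives per choice set and uses error definitions that are assumptions of this illustration (they are not spelled out in this excerpt): the squared error is the sum over alternatives of the squared difference between the 0/1 choice indicator and the predicted probability, the absolute error is the corresponding sum of absolute differences, the minus log-likelihood is minus the log of the probability of the chosen alternative, and the prediction error is the expected proportion of wrongly predicted choices.

```python
import math


def baseline0_errors(n_alternatives):
    """Per-choice errors when every alternative has probability 1/M."""
    m = n_alternatives
    p = 1.0 / m
    return {
        # chosen alternative contributes (1 - p)^2, the other m - 1 contribute p^2
        "squared_error": (1 - p) ** 2 + (m - 1) * p ** 2,
        "minus_log_likelihood": -math.log(p),
        "absolute_error": (1 - p) + (m - 1) * p,
        # a random guess among m alternatives is wrong with probability 1 - 1/m
        "prediction_error": 1 - p,
    }


errors = baseline0_errors(4)
```

Under these assumed definitions the four values agree with the Baseline(0) figures 0.7500, 1.3863, 1.5000, and 0.7500 shown in the Prediction Statistics output.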
174. s a string variable Location, which provides a unique name for the parameter, such as r0001c01. The 3 right-most columns in the output file, the variables called se, param, and Label, serve to define the parameters. For example, for this parameter row, the string variable Label might contain a label such as 'purpose 1 || Class1', which means that this is the parameter estimate for the 1st category of the attribute PURPOSE associated with class 1. The variables param and se correspond to the estimate and standard error for this parameter as reported in the Parameters Output. The remaining variables on the file reproduce the parameter names provided in Location and contain the variance-covariance matrix. For example, the entry in row 1 (Location = r0001c01) and column r0001c01 is the variance of this parameter estimate. The entry in row 1 (Location = r0001c01) and column r0001c02 is the covariance associated with parameter estimates r0001c01 and r0001c02.

Note: Most users will not need to use this option. These quantities are useful in computing the standard error of a particular function of the parameter estimates. For further details, see Sections 3.4 and 4.8 of the Technical Guide.

Default Options

Click Default to restore the Output options to their original program default values. Click Save as Default to save the current output settings as the new default
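One common use of the variance-covariance output is the standard error of a linear function of the estimates. As a minimal sketch, the standard error of the difference between two parameters follows from Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b). The nested dictionary below stands in for the matrix as it might be read from the output file; its layout and numeric values are hypothetical, and only the parameter names r0001c01 and r0001c02 come from the text.

```python
import math


def se_of_difference(vcov, name_a, name_b):
    """Standard error of (estimate a - estimate b) from a covariance matrix."""
    variance = (vcov[name_a][name_a] + vcov[name_b][name_b]
                - 2.0 * vcov[name_a][name_b])
    return math.sqrt(variance)


# Hypothetical 2 x 2 excerpt of the variance-covariance matrix.
vcov = {
    "r0001c01": {"r0001c01": 0.04, "r0001c02": 0.01},
    "r0001c02": {"r0001c01": 0.01, "r0001c02": 0.09},
}
se_diff = se_of_difference(vcov, "r0001c01", "r0001c02")
```

Ignoring the covariance term would give sqrt(0.13) instead of sqrt(0.11) here, which is why the full matrix rather than the se column alone is needed for such functions.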
175. s based on an ordered categorical response. Drug Information Journal, 30(1), 143-170.

Magidson, J., Eagle, T., and Vermunt, J.K. (2003). New developments in latent class choice modeling. Proceedings Sawtooth Software Conference 2003.

Magidson, J., and Vermunt, J.K. (2001). Latent class factor and cluster models, bi-plots and related graphical displays. Sociological Methodology, 31, 223-264.

Magidson, J., and Vermunt, J.K. (2004). Latent class analysis. D. Kaplan (ed.), The Sage Handbook of Quantitative Methodology for the Social Sciences, Chapter 10, 175-198. Thousand Oaks: Sage Publications.

McFadden, D. (1974). Conditional logit analysis of qualitative choice behaviour. P. Zarembka (ed.), Frontiers in Econometrics, 105-142. New York: Academic Press.

McFadden, D., and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15, 447-470.

McLachlan, G.J., and Krishnan, T. (1997). The EM Algorithm and Extensions. New York: John Wiley & Sons, Inc.

Natter, M., and Feurstein, M. (2002). Real world performance of choice-based conjoint models. European Journal of Operational Research, 137, 448-458.

Patterson, B.H., Dayton, C.M., and Graubard, B.I. (2002). Latent class analysis of complex sample survey data: application to dietary data. Journal of the American Statistical Association, 97, 721-728.

Rodriguez, G., and Goldman, N. (2001). Improved estimation procedures for multilevel models for binary response: a case study
176. s computed with response probabilities equal to the average

$$\bar{P}_m = \frac{\sum_{i=1}^{I} w_i \sum_{t=1}^{T_i} v_{it}\, \hat{P}_{m|it}}{\sum_{i=1}^{I} w_i \sum_{t=1}^{T_i} v_{it}}.$$

In models with an unrestricted set of constants, $\bar{P}_m$ equals the observed distribution of y. In that case, Baseline can be interpreted as the constants-only model. The response probabilities under Baseline(0) are $\hat{P}_m(0) = 1/M$, which means that Baseline(0) is the equal-probability model.

4.2 Parameters

The first part of the Parameters output contains Class-specific and overall $R^2$ and $R^2(0)$ values based on Squared Error. The overall measures are the same as the ones appearing in Prediction Statistics. The logic behind the computation of the Class-specific $R^2_{y|x}$ measures is the same as for the overall measures (see description of Prediction Statistics). The Class-specific errors are obtained by

$$Error_x = \frac{\sum_{i=1}^{I} \hat{w}_{xi} \sum_{t=1}^{T_i} v_{it}\, Error_{x|it}}{\sum_{i=1}^{I} \hat{w}_{xi} \sum_{t=1}^{T_i} v_{it}},$$

with $\hat{w}_{xi} = w_i\, \hat{P}(x|z_i, y_i)$, as in equation 9. The definition of $Error_{x|it}$ is based on the Class-specific response probabilities $\hat{P}(y_{it} = m|x, z_{it})$, or shortly $\hat{P}_{m|x,it}$. For ratings, the predicted value equals $\hat{y}_{x|it} = \sum_{m=1}^{M} y^{*}_m\, \hat{P}_{m|x,it}$ and the corresponding error is $Error_{x|it} = (y_{it} - \hat{y}_{x|it})^2$. For choice and ranking variables, $Error_{x|it}$ equals $\sum_{m=1}^{M} [I(y_{it} = m) - \hat{P}_{m|x,it}]^2$. Similar to the overall $R^2$ measures, the Baseline error is based on the average $\bar{P}_{m|x}$ and Baseline(0) on 1/M.

In the second part of the Parameters output, the program reports the estimates obtained for the $\beta$ and $\gamma$ parameters appearing in the l
177. s is much quicker.

Prediction Type

Latent GOLD Choice also reports Prediction Statistics. Prediction statistics indicate how well the observed choices, rankings, or ratings are predicted by the specified model. For rankings, the prediction statistics are based on first choices only. For choice and rating variables, all replications are used for obtaining the prediction measures. It is also possible to write predicted values to a file. Predicted values can be computed in three different ways:

Posterior Mean: Posterior Mean predicted values are defined as weighted averages of the class-specific predicted values, using an individual's posterior membership probabilities as weights.

HB-like: As in Hierarchical Bayes, the HB-like predicted values are based on the Individual Coefficients, which are weighted averages of the class-specific regression coefficients with the posterior membership probabilities as weights.

Marginal Mean: Marginal Mean uses the prior membership probabilities as weights, which means that the observed values on the dependent variable are not used to generate the predictions.

Posterior Mean and HB-like prediction yield similar results. These methods give a good indication of within-sample prediction performance. Marginal Mean prediction yields much lower R-squared values, but gives a better indication of out-of-sample prediction performance. For more information on Prediction Statistics, see Section 4.1.5 of the Technic
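The difference between the three prediction types can be illustrated with one respondent and one choice set. The sketch below assumes a single numeric attribute, three classes, and a conditional logit within each class; all names and numbers are hypothetical. Posterior Mean and Marginal Mean average the class-specific probabilities (with posterior and prior weights respectively), whereas HB-like first averages the coefficients and then computes one set of probabilities.

```python
import math


def softmax(etas):
    top = max(etas)
    expu = [math.exp(e - top) for e in etas]
    total = sum(expu)
    return [e / total for e in expu]


def class_probs(beta, attribute_values):
    """Conditional logit probabilities for one class and one choice set."""
    return softmax([beta * z for z in attribute_values])


def mix(weights, probs_by_class):
    """Weighted average of class-specific probabilities, per alternative."""
    n_alt = len(probs_by_class[0])
    return [sum(w * p[m] for w, p in zip(weights, probs_by_class))
            for m in range(n_alt)]


class_betas = [3.0, 0.0, 1.0]        # class-specific effects (hypothetical)
prior = [0.50, 0.25, 0.25]           # prior (marginal) class probabilities
posterior = [0.90, 0.02, 0.08]       # this respondent's posterior probabilities
z = [1.0, 0.0, 0.0]                  # attribute value of each alternative

by_class = [class_probs(b, z) for b in class_betas]
posterior_mean = mix(posterior, by_class)
marginal_mean = mix(prior, by_class)
beta_i = sum(w * b for w, b in zip(posterior, class_betas))  # individual coefficient
hb_like = class_probs(beta_i, z)
```

Posterior Mean and HB-like both use the respondent's observed choices through the posterior weights, which is why they track within-sample performance, while Marginal Mean uses no information on the dependent variable.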
178. s x given choice a on set p.

The third part of the Profile and ProbMeans output sections provides information for covariates. This is information obtained by aggregating and re-scaling posterior membership probabilities (Magidson and Vermunt, 2001). Let b denote a particular level of covariate r and $B_r$ the number of categories of the covariate concerned, and let the frequency count $\hat{n}_r(x, b)$ be defined as follows:

$$\hat{n}_r(x, b) = \sum_{i:\, z^{cov}_{ir} = b} w_i\, \hat{P}(x|z_i, y_i),$$

where $i: z^{cov}_{ir} = b$ denotes that the sum is over the cases with value b on the covariate concerned. In Profile, we report the probability of being in covariate level b given that one belongs to latent class x,

$$\hat{P}_r(b|x) = \frac{\hat{n}_r(x, b)}{\sum_{b'=1}^{B_r} \hat{n}_r(x, b')},$$

and for numeric covariates also the means $\sum_{b=1}^{B_r} z^{*}_{rb}\, \hat{P}_r(b|x)$, where $z^{*}_{rb}$ is the score of covariate category b. ProbMeans contains the probability of being in latent class x given covariate level b:

$$\hat{P}_r(x|b) = \frac{\hat{n}_r(x, b)}{\sum_{x'=1}^{K} \hat{n}_r(x', b)}.$$

For nominal attributes/covariates, the Profile plot depicts the choice probabilities $\hat{P}_p(a|x)$ and covariate probabilities $\hat{P}_r(b|x)$. For numeric attributes and covariates, the Profile plot contains 0-1 means, which are means that are re-scaled to be in the 0-1 interval. In ProbMeans, the quantities $\hat{P}_p(x|a)$ and $\hat{P}_r(x|b)$ are plotted in Uni- and Tri-plots (Magidson and Vermunt, 2001; Vermunt and Magidson, 2000). Similar plots have been proposed by Van der Ark and Van der Heijden (1998) and Van der Heijden, Gilula, and Van der Ark (1999) for
179. sis is the occurrence of local maxima, which also satisfy the likelihood equations given in (7). The best way to prevent ending up with a local solution is to use multiple sets of starting values, which may yield solutions with different log-posterior values. In Latent GOLD Choice, the use of such multiple sets of random starting values is automated. The user can specify how many sets of starting values the program should use by changing the Random Sets option in the Technical Tab. Another relevant parameter is Iterations, specifying the number of iterations to be performed per start set. More precisely, within each of the random sets, Latent GOLD Choice performs the specified number of EM iterations. Subsequently, within the best 10 percent in terms of log-posterior, the program performs an extra 2 times Iterations EM iterations. Finally, it continues with the best solution until convergence. It should be noted that while such a procedure considerably increases the probability of finding the global PM or ML solution, especially if both parameters are set large enough, there is no guarantee that it will be found in a single run.

When a model contains two or more latent classes or one or more DFactors, the starting values procedure will generate the specified number of starting sets and perform the specified number of iterations per set. In one-class models in which local maxima may occur, for example, in models with continuous factors (see Advanced
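The three-stage start-values procedure described here can be sketched generically. The code below is not the program's implementation: `init`, `em_step`, and `log_posterior` are hypothetical callables, and the toy objective with a gradient-ascent step merely stands in for an EM iteration on a function with one local and one global maximum.

```python
import random


def multi_start(init, em_step, log_posterior, random_sets, iterations,
                tol=1e-8, max_final=10000):
    # Stage 1: a fixed number of iterations from each random start set.
    runs = []
    for _ in range(random_sets):
        theta = init()
        for _ in range(iterations):
            theta = em_step(theta)
        runs.append(theta)
    # Stage 2: the best 10 percent (at least one) get 2 * iterations more steps.
    runs.sort(key=log_posterior, reverse=True)
    finalists = []
    for theta in runs[:max(1, random_sets // 10)]:
        for _ in range(2 * iterations):
            theta = em_step(theta)
        finalists.append(theta)
    # Stage 3: continue with the single best solution until convergence.
    best = max(finalists, key=log_posterior)
    for _ in range(max_final):
        new = em_step(best)
        if abs(log_posterior(new) - log_posterior(best)) < tol:
            return new
        best = new
    return best


# Toy stand-in objective: local maximum near -0.96, global maximum near 1.04.
def toy_log_posterior(t):
    return -(t * t - 1.0) ** 2 + 0.3 * t


def toy_step(t):
    gradient = -4.0 * t ** 3 + 4.0 * t + 0.3
    return t + 0.05 * gradient


rng = random.Random(0)
solution = multi_start(lambda: rng.uniform(-2.0, 2.0), toy_step,
                       toy_log_posterior, random_sets=10, iterations=5)
```

With a single start set the outcome depends entirely on which basin the start falls in; with several sets the short first stage is usually enough to rank the basins, which is the rationale for spending only a few iterations per set before committing to the best candidates.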
180. specify different types of information matrices to be used in the computation of standard errors and Wald statistics. The fourth option suppresses such computations.

Standard (Hessian): The Standard method makes use of the second-order derivatives of the log-likelihood function, called the Hessian matrix. This is the default option.

Robust (Sandwich): The Robust method sandwiches the inverse of the outer-product matrix by the Hessian matrix. Standard errors and Wald statistics obtained by the Robust method are less affected by distributional assumptions about the indicators and the dependent variable.

Fast (Outer Product): The Fast method approximates the information matrix using the outer product of the first-order derivatives of the log-likelihood function. The Fast method may be used in models in which the other two methods are computationally intensive. In such cases, one can also suppress the computation of standard errors and Wald statistics (option None).

None: This option suppresses the computation of standard errors and Wald statistics. It may be useful when estimating models containing an extremely large number of parameters, in which case computation of the second-order derivatives used in Newton-Raphson, standard error computations, and Wald statistics may take a lot of time. By setting the Newton-Raphson Iteration Limit to 0 and setting Standard Errors and Wald to None, the estimation process for such large model
181. sponding to such a structure is

              Class 1  Class 2  Class 3  Class 4
Brand1 100       *        -        -        -
Brand2 100       -        *        -        -
Brand3 100       -        -        *        -
Constants        -        -        -       free
Attribute1       -        -        -       free
Attribute2       -        -        -       free

Here, "-" means no effect, "*" means offset, and "free" marks an effect that is estimated without restriction. As can be seen, Classes 1, 2, and 3 are only affected by an offset, and Class 4, the brand-switching or mover class, is affected by the constants and the two attributes. The numeric attributes Brand1 100, Brand2 100, and Brand3 100 are brand dummies that take on the value 100 for the brand concerned and are 0 otherwise. As a result of the fixed effect of 100, the probability of selecting the corresponding brand will be equal to one. To illustrate this, suppose that a choice set consists of three alternatives and that only alternative 1 is associated with brand 1. The probability that someone belonging to Class 1 selects alternative 1 equals

$$\frac{\exp(100)}{\exp(100) + \exp(0) + \exp(0)} = 1.0000.$$

Although this model is similar to a zero-inflated model, the offset-based specification is much more flexible in the sense that the number of brand-loyal classes does not need to coincide with the number of alternatives per set. In the above example, the sets could consist of four instead of three alternatives, say three different brands and a none alternative.

Now we will discuss several more advanced applications of the restriction options. The first is a model for ratings with
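The brand-loyal probability in this excerpt is easy to verify numerically. The sketch below evaluates the conditional logit probabilities for a class whose only non-zero term is the offset of 100 on one alternative. Subtracting the largest utility before exponentiating is a standard precaution against overflow; it is not strictly needed for a value of 100, and the function name is hypothetical.

```python
import math


def choice_probs(etas):
    """Conditional logit probabilities for one choice set."""
    top = max(etas)
    expu = [math.exp(e - top) for e in etas]
    total = sum(expu)
    return [e / total for e in expu]


# Class 1: alternative 1 carries the offset 100, the others have utility 0.
three_alternatives = choice_probs([100.0, 0.0, 0.0])
# Same class when the set has three brands plus a none alternative.
four_alternatives = choice_probs([100.0, 0.0, 0.0, 0.0])
```

The probability of the loyal brand is indistinguishable from 1 in both set sizes, which illustrates the point made in the text that the number of brand-loyal classes need not match the number of alternatives per set.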
182. strap option can be used to estimate the p-value for certain estimated models.

Stop: The Stop command may be used to pause the estimation prior to completion or to abandon the estimation completely.

Resume: If a model is paused (default names for paused models have the characters "Paused" appended to the original model name, e.g. "Model4Paused"), the Resume command may be used to continue the estimation process.

Delete: This is used to delete the model name and any associated output files from the Outline pane.

Window

Split: Allows you to customize the window split between the Outline and Contents Panes.

Help

Contents: Lists all the Help topics available.

Help: Displays context-sensitive help.

Item Help: Creates a help cursor that you can point to get help on any particular item in the program.

Register: Displays your registration code.

About Latent GOLD: Provides general information about the program.

Many of the tasks you will want to perform with Latent GOLD Choice utilize menu selections. Shortcuts for menu items are listed to the right of the item. For example, the shortcut for File Open is Ctrl+O (on your keyboard, hold down the Ctrl key and then press the O key). In addition, a right click in the Contents Pane frequently causes a control panel or the appropriate menu options to appear. For example, a right click in a graphical display such as the tri-plot causes the Tri-plot Control Panel to ap
183. t a subset of cases for the analysis. For example, by specifying a variable with the value 1 for males and 0 for females as a case weight, one will perform an analysis for males only. This zero case weight option makes it straightforward to perform separate analyses for different subgroups that are in the same data file. It should be noted that no output is provided for the cases with zero weights.

Similarly, with a replication weight equal to zero, one removes the corresponding replication from the analysis. This option can therefore be used to select choices to be used for parameter estimation; for example, one may wish to select the first and last choice from a full ranking for a maximum-difference analysis.

3.9.2 Replication weights equal to a very small number

An important feature of the Latent GOLD Choice program is that it allows specifying hold-out choices. These are choices that are not used for parameter estimation, but for which one obtains prediction information. Hold-out choices are defined by means of replication weights equal to a very small number, i.e., 1.0e-100. These replications will be excluded when estimating the specified model. Their predicted values, however, may be written to the output file. This very small replication weight option can be used for validation purposes, that is, to determine the prediction performance of the estimated model for hold-out choices.

3.9.3 Case weights equal to a very s
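Why a replication weight of 1.0e-100 effectively removes a choice from estimation can be seen from a weighted log-likelihood. The sketch below assumes that each replication contributes its weight times the log of the probability of the observed choice; how the program handles such weights internally is not shown here. The function name and the probabilities are hypothetical.

```python
import math

HOLD_OUT = 1.0e-100   # very small replication weight marking a hold-out choice


def weighted_log_likelihood(probs_of_observed, replication_weights):
    """Sum over replications of weight * log P(observed choice)."""
    return sum(v * math.log(p)
               for p, v in zip(probs_of_observed, replication_weights))


# Model probabilities of the 4 choices a respondent actually made.
probs = [0.6, 0.3, 0.5, 0.2]
all_used = weighted_log_likelihood(probs, [1, 1, 1, 1])
with_hold_out = weighted_log_likelihood(probs, [1, 1, 1, HOLD_OUT])
estimation_only = weighted_log_likelihood(probs[:3], [1, 1, 1])
```

In double precision the hold-out replication changes the log-likelihood by nothing at all, yet the replication is still present in the data, so a predicted value can be produced for it.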
184. t set 4.

> Click on the Choice Set symbol to highlight the appended choices on the plot.

Each vertex of the tri-plot corresponds to a segment. From the plot, it can be seen that Class 2, corresponding to the lower right vertex, is the segment most likely to prefer Higher Quality and Traditional style shoes. This segment also tends to be the oldest of the segments (higher proportion of persons 40+).

Figure 41: Tri-Plot

Set 4 consists of 1:TS2, 2:TH4, 3:MH5, 4:None. It is clear from the plot that persons who choose alternative 2 (TH4) from this set are highly likely to be in segment 2.

Additional Output Options

Latent GOLD Choice offers many additional output options.

> Double click on 3-class final to re-open the model setup screen.
> Click on Output to open the Output Tab.

To obtain output showing how to classify each person into the most likely segment:

> Click on Standard Classification.
> Click on Covariate Classification.

To append this information and predicted choices to the response file:

> Click on ClassPred to open the ClassPred Tab.
> Click on Standard Classification.
> Click on Covariate Class
185. te the model.

> Rename this model '3-class final'.

Notice that the BIC for this final model is now the best.

Model           Description       LL           BIC(LL)       Npar   Class.Err.   R²
1-class         1-Class Choice    -4145.1973   8320.3519      5     0.0000       0.0454
2-class         2-Class Choice    -3704.6949   7493.2703     14     0.0303       0.1988
3-class         3-Class Choice    -3648.6719   7435.1474     23     0.0707       0.2157
4-class         4-Class Choice    -3640.7484   7473.2238     32     0.0810       0.2207
3-classBoot     3-Class Choice    -3648.6719   7435.1474     23     0.0707       0.2157
4-classBoot     4-Class Choice    -3640.7484   7473.2238     32     0.0810       0.2207
Model5          3-Class Choice    -3649.1386   7418.1065     20     0.0720       0.2156
3-class final   3-Class Choice    -3651.0166   [illegible]   18     0.0698       0.2146

Figure 38: Updated Model Summary Output

To remove the Bootstrap models:

> Select 3-classBoot.
> Select Delete from the Model menu.
> Repeat this for 4-classBoot.

Profile Output: viewing Re-scaled Parameters

> Click on the expand icon next to the 3-class final model to display the output files.
> Click on Profile to display the Profile Output in the Contents Pane.

              Class1    Class2    Class3
Class Size    0.5033    0.2646    0.2321
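The BIC values in the Model Summary can be reproduced from the log-likelihood and the number of parameters with BIC(LL) = -2 LL + log(N) Npar. The sketch below assumes N = 400 (the number of individuals in the simulated data set) and assumes that the log-likelihood values are negative, since the minus signs do not survive in the extracted screenshot text. The function name is hypothetical.

```python
import math


def bic_ll(log_likelihood, npar, n_cases):
    """BIC based on the log-likelihood: -2 LL + log(N) * Npar."""
    return -2.0 * log_likelihood + math.log(n_cases) * npar


N_CASES = 400  # assumed sample size of the tutorial data

# (LL, Npar) pairs read from the Model Summary; signs of LL are assumed.
models = {
    "1-class": (-4145.1973, 5),
    "2-class": (-3704.6949, 14),
    "3-class": (-3648.6719, 23),
    "4-class": (-3640.7484, 32),
    "Model5": (-3649.1386, 20),
    "3-class final": (-3651.0166, 18),
}
bic = {name: bic_ll(ll, npar, N_CASES) for name, (ll, npar) in models.items()}
best_model = min(bic, key=bic.get)
```

Under these assumptions the formula returns the reported BIC values for the models where the figure is legible, and a value of roughly 7409.88 for the final 18-parameter model, which is consistent with the statement that its BIC is now the best.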
186. ted average of the case-specific errors:

$$Error(x|z, y) = \frac{\sum_{i=1}^{I} w_i\, Error(x|z_i, y_i)}{N}.$$

The three $R^2$ measures differ in the definition of $Error(x|z_i, y_i)$. In $R^2_{x,errors}$ it equals $1 - \max_x \hat{P}(x|z_i, y_i)$, in $R^2_{x,entropy}$ it equals $-\sum_{x=1}^{K} \hat{P}(x|z_i, y_i) \log \hat{P}(x|z_i, y_i)$, and in $R^2_{x,variance}$ it equals $1 - \sum_{x=1}^{K} [\hat{P}(x|z_i, y_i)]^2$. In the computation of the total error $Error(x)$, the $\hat{P}(x|z_i, y_i)$ are replaced by the estimated marginal latent probabilities $\hat{P}(x)$, which are defined as

$$\hat{P}(x) = \frac{\sum_{i=1}^{I} w_i\, \hat{P}(x|z_i, y_i)}{N}. \qquad (16)$$

The Average Weight of Evidence (AWE) criterion adds a third dimension to the information criteria described above. It weights fit, parsimony, and the performance of the classification (Banfield and Raftery, 1993). This measure uses the so-called classification log-likelihood, which is equivalent to the complete data log-likelihood $\log L_c$, i.e.,

$$\log L_c = \sum_{i=1}^{I} \sum_{x=1}^{K} \hat{w}_{xi} \log \hat{P}(x|z_i)\, \hat{P}(y_i|x, z_i).$$

AWE can now be defined as

$$AWE = -2 \log L_c + 2 \left(\frac{3}{2} + \log N\right) npar.$$

The lower AWE, the better a model.

The Classification Table cross-tabulates modal and probabilistic class assignments. More precisely, the entry (x, x') contains the sum of the class x posterior membership probabilities for the cases allocated to modal class x'. Hence, the diagonal elements (x, x) are the numbers of correct classifications per class and the off-diagonal elements (x, x' ≠ x) the corresponding numbers of misclassifications. From the classification table
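The classification statistics can be sketched directly from a matrix of posterior membership probabilities. The code below uses the three case-level error definitions (one minus the largest posterior, the entropy, and one minus the sum of squared posteriors) and the Classification Table rule of summing posteriors within modal classes. The proportional-reduction form of R², (total error minus error) divided by total error, is an assumption of this sketch, as are the function names and the toy posteriors; all case weights are taken to be 1.

```python
import math

ERRORS = {
    "errors": lambda p: 1.0 - max(p),
    "entropy": lambda p: -sum(q * math.log(q) for q in p if q > 0.0),
    "variance": lambda p: 1.0 - sum(q * q for q in p),
}


def classification_stats(posteriors):
    n, k = len(posteriors), len(posteriors[0])
    marginal = [sum(p[x] for p in posteriors) / n for x in range(k)]
    r2 = {}
    for name, error in ERRORS.items():
        avg_error = sum(error(p) for p in posteriors) / n
        total_error = error(marginal)          # posteriors replaced by P(x)
        r2[name] = (total_error - avg_error) / total_error
    # Classification table: rows = modal class, columns = probabilistic class.
    table = [[0.0] * k for _ in range(k)]
    for p in posteriors:
        modal = p.index(max(p))
        for x in range(k):
            table[modal][x] += p[x]
    return marginal, r2, table


posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
marginal, r2, table = classification_stats(posteriors)
misclassified = sum(table[a][b] for a in range(2) for b in range(2) if a != b)
```

The sum of the off-diagonal entries divided by the number of cases reproduces the average of one minus the largest posterior, which links the Classification Table to the error-based measure.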
187. ted by K. Two other variables that may be used in the model specification are a replication-specific scale factor $s_{it}$ and a replication-specific weight $v_{it}$. Their default values are one, in which case they do not affect the model structure.

2.1 First Choices

We start with the description of the regression model for the simplest and most general response format, first choice. For simplicity of exposition, we assume that each replication or choice set has the same number of alternatives. Later on it will be shown how to generalize the model to other formats, including choice sets with unequal numbers of alternatives.

A conditional logit model is a regression model for the probability that case i selects alternative m at replication t given attribute values $z^{att}_{it}$ and predictor values $z^{pre}_{it}$. This probability is denoted by $P(y_{it} = m|z^{att}_{it}, z^{pre}_{it})$. Attributes are characteristics of the alternatives; that is, alternative m will have different attribute values than alternative m'. Predictors, on the other hand, are characteristics of the replication or the person, and take on the same value across alternatives. For the moment, we assume that attributes and predictors are numeric variables. In the subsection "Coding of Nominal Variables", we explain in detail how nominal explanatory variables are dealt with by the program. The conditional logit model for the response probabilities has the form

$$P(y_{it} = m|z^{att}_{it}, z^{pre}_{it}) = \frac{\exp(\eta_{m|z_{it}})}{\sum_{m'=1}^{M} \exp(\eta_{m'|z_{it}})}.$$
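188. ted in Latent GOLD Choice. Now we will combine all these elements and provide the structure of the general LC choice model. The general probability density function associated with case i is

$$P(y_i|z_i) = \sum_{x=1}^{K} P(x|z^{cov}_i)\, P(y_i|x, z^{att}_i, z^{pre}_i) = \sum_{x=1}^{K} P(x|z^{cov}_i) \prod_{t=1}^{T_i} P(y_{it}|x, z^{att}_{it}, z^{pre}_{it}), \qquad (6)$$

where $P(x|z^{cov}_i)$ and $P(y_{it} = m|x, z^{att}_{it}, z^{pre}_{it})$ are parameterized by logit models; that is,

$$P(x|z^{cov}_i) = \frac{\exp(\eta_{x|z_i})}{\sum_{x'=1}^{K} \exp(\eta_{x'|z_i})},$$

$$P(y_{it} = m|x, z^{att}_{it}, z^{pre}_{it}, s_{it}) = \frac{\exp(s_{it}\, \eta_{m|x,z_{it}})}{\sum_{m' \in A_{it}} \exp(s_{it}\, \eta_{m'|x,z_{it}})}.$$

The linear model for $\eta_{x|z_i}$ is

$$\eta_{x|z_i} = \gamma_{0x} + \sum_{r=1}^{R} \gamma_{rx}\, z^{cov}_{ir}.$$

For first choices and rankings, $\eta_{m|x,z_{it}}$ equals

$$\eta_{m|x,z_{it}} = \beta^{con}_{xm} + \sum_{p=1}^{P} \beta^{att}_{xp}\, z^{att}_{itmp} + \sum_{q=1}^{Q} \beta^{pre}_{xmq}\, z^{pre}_{itq}$$

if $m \in A_{it}$ and $-\infty$ otherwise. For ratings,

$$\eta_{m|x,z_{it}} = \beta^{con}_{xm} + y^{*}_m \cdot \left(\sum_{p=1}^{P} \beta^{att}_{xp}\, z^{att}_{itp} + \sum_{q=1}^{Q} \beta^{pre}_{xq}\, z^{pre}_{itq}\right).$$

3 Estimation and Other Technical Issues

3.1 Log-likelihood and Log-posterior Function

The parameters of the LC choice model are estimated by means of Maximum Likelihood (ML) or Posterior Mode (PM) methods. The likelihood function is derived from the probability density function defined in equation 6. Let $\vartheta$ denote the vector containing the $\gamma$ and $\beta$ parameters. As before, $y_i$ and $z_i$ denote the vectors of dependent and explanatory variables for case i, and I denotes the total number of cases.

ML estimation involves finding the estimates for $\vartheta$ that maximize the log-likelihood function

$$\log L = \sum_{i=1}^{I} w_i \log P(y_i|z_i, \vartheta).$$

Here, $P(y_i|z_i, \vartheta)$ is the probab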
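The structure of the general LC choice model, a mixture over classes of a product over replications of conditional logit probabilities, can be sketched for one case as follows. The sketch makes simplifying assumptions: no covariates (so the class probabilities are constants), no constants or predictors, scale factors equal to one, and first choices only. The function names, part-worths, and choice sets are hypothetical.

```python
import math


def softmax(etas):
    top = max(etas)
    expu = [math.exp(e - top) for e in etas]
    total = sum(expu)
    return [e / total for e in expu]


def case_likelihood(class_sizes, class_betas, choice_sets, choices):
    """Return P(y_i) and the posterior class probabilities for one case."""
    joint = []
    for p_x, beta in zip(class_sizes, class_betas):
        like = p_x
        for alternatives, chosen in zip(choice_sets, choices):
            # linear term: sum over attributes of class effect * attribute value
            etas = [sum(b * z for b, z in zip(beta, alt)) for alt in alternatives]
            like *= softmax(etas)[chosen]
        joint.append(like)
    p_y = sum(joint)                      # mixture over the latent classes
    return p_y, [j / p_y for j in joint]


class_sizes = [0.50, 0.25, 0.25]
# Hypothetical effects per class for (modern, higher quality, price).
class_betas = [[3.0, 0.0, -0.5], [0.0, 3.0, -0.5], [1.0, 1.0, -0.5]]
# Two replications, three alternatives each, described by the three attributes.
choice_set = [(1, 0, 2), (0, 1, 2), (0, 0, 1)]
p_y, posterior = case_likelihood(class_sizes, class_betas,
                                 [choice_set, choice_set], choices=[0, 0])
```

A respondent who picks the modern alternative in both sets is assigned mostly to the first class, whose utility is dominated by the fashion attribute; the posterior probabilities computed here are the same quantities that serve as weights in the E step of the EM algorithm.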
189. ted is that in multilevel models, the strata, PSUs, and sampling weights concern groups rather than cases; that is, one has strata and PSUs formed by groups and sampling weights for groups.

For parameter estimation, only the sampling weights need to be taken into account. When sampling weights are specified, Latent GOLD Choice will estimate the model parameters by means of pseudo-ML (PM) estimation (Skinner, Holt, and Smith, 1989). Recall that ML estimation involves maximizing

$$\log L = \sum_{i=1}^{I} w_i \log P(y_i|z_i, \vartheta),$$

where $w_i$ is a case weight.

Footnote 22: In Latent GOLD Choice, one can either specify the sampling fraction or the population size. If the specified number in Population Size is smaller than 1, it is interpreted as a fraction, otherwise as a population size.

In pseudo-ML estimation, one maximizes

$$\log L_{pseudo} = \sum_{o=1}^{O} \sum_{c=1}^{C_o} \sum_{i=1}^{I_{oc}} sw_{oci} \log P(y_{oci}|z_{oci}, \vartheta),$$

which is equivalent to maximizing log L using the sampling weights as if they were case weights. In Latent GOLD Choice, one may also have both case and sampling weights, in which case we get

$$\log L_{pseudo} = \sum_{o=1}^{O} \sum_{c=1}^{C_o} \sum_{i=1}^{I_{oc}} w_{oci}\, sw_{oci} \log P(y_{oci}|z_{oci}, \vartheta),$$

which is equivalent to performing ML estimation using the $sw_{oci} \cdot w_{oci}$ as case weights.

Each of the four complex sampling characteristics is taken into account by the so-called linearization estimator of the variance-covariance matrix of the parameter estimates (Skinner, Holt, and Smith, 198
190. the effect of a selected numeric attribute/predictor is assumed to have the same sign, or the effects corresponding to a selected nominal attribute/predictor are assumed to be ordered (either ascending or descending). That is, for numeric attributes/predictors, the ascending restriction implies that the Class-specific coefficients should be at least zero ($\beta \geq 0$) and the descending restriction that they are at most zero ($\beta \leq 0$). For nominal attributes/predictors, ascending implies that the coefficient of category $p+1$ is larger than or equal to the one of category $p$ ($\beta_p \leq \beta_{p+1}$, for each $p$) and descending that the coefficient of category $p+1$ is smaller than or equal to the one of category $p$ ($\beta_p \geq \beta_{p+1}$, for each $p$).

Footnote 2: The term offset stems from the generalized linear modeling framework. It refers to a regression coefficient that is fixed to 1, or equivalently, to a component that offsets the linear part of the regression model by a fixed amount. An offset plays the same role as a cell weight in log-linear analysis. An offset is in fact the log of a cell weight.

The Class Independent option can be used to specify models in which some attribute and predictor effects differ across Classes while others do not. This can either be on a priori grounds or can be based on the test statistics from previously estimated models. More specifically, if the Wald test is not significant, it makes sense to check whether an effect can be
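191. the final solution, the program switches to Newton-Raphson. This is a way to exploit the advantages of both algorithms; that is, the stability of EM when it is far away from the optimum and the speed of Newton-Raphson when it is close to the optimum.

The task to be performed for obtaining PM estimates for $\vartheta$ is finding the parameter values for which

$$\frac{\partial \log P}{\partial \vartheta} = \frac{\partial \log L}{\partial \vartheta} + \frac{\partial \log p(\vartheta)}{\partial \vartheta} = 0. \qquad (7)$$

Here,

$$\frac{\partial \log L}{\partial \vartheta} = \sum_{i=1}^{I} w_i \frac{\partial \log P(y_i \mid z_i, \vartheta)}{\partial \vartheta} = \sum_{i=1}^{I} w_i \frac{\partial \log \sum_{x=1}^{K} P(x \mid z_i^{cov}, \vartheta)\, P(y_i \mid x, z_i^{att}, z_i^{pre}, \vartheta)}{\partial \vartheta} = \sum_{i=1}^{I} \sum_{x=1}^{K} \hat{w}_{xi} \frac{\partial \log \left[ P(x \mid z_i^{cov}, \vartheta)\, P(y_i \mid x, z_i^{att}, z_i^{pre}, \vartheta) \right]}{\partial \vartheta}, \qquad (8)$$

where

$$\hat{w}_{xi} = w_i\, \hat{P}(x \mid z_i, y_i, \vartheta) = w_i \frac{P(x \mid z_i^{cov}, \vartheta)\, P(y_i \mid x, z_i^{att}, z_i^{pre}, \vartheta)}{P(y_i \mid z_i, \vartheta)}. \qquad (9)$$

The EM algorithm is a general method for dealing with ML estimation with missing data (Dempster, Laird, and Rubin, 1977; McLachlan and Krishnan, 1997). This method exploits the fact that the first derivatives of the incomplete data log-likelihood ($\log L$) equal the first derivatives of the complete data log-likelihood ($\log L_c$). The complete data log-likelihood is the log-likelihood that we would have if we knew to which latent class each case belongs:

$$\log L_c = \sum_{i=1}^{I} \sum_{x=1}^{K} \hat{w}_{xi} \log \left[ P(x \mid z_i^{cov}, \vartheta)\, P(y_i \mid x, z_i^{att}, z_i^{pre}, \vartheta) \right] = \sum_{i=1}^{I} \sum_{x=1}^{K} \hat{w}_{xi} \log P(x \mid z_i^{cov}, \vartheta) + \sum_{i=1}^{I} \sum_{x=1}^{K} \sum_{t=1}^{T_i} \hat{w}_{xi}\, v_{it} \log P(y_{it} \mid x, z_{it}^{att}, z_{it}^{pre}, \vartheta). \qquad (10)$$

Each $\nu$th cycle of the EM algorithm consists of two steps. In the Expectation (E) step, estimates are obtained for $\hat{w}_{xi}$ via equation 9, filling in $\hat{\vartheta}$ as parameter values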
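As a rough illustration of the E step described in this excerpt, the sketch below computes the class-specific weights of one case from its case weight, its prior class probabilities, and its class-specific likelihoods. The function name `e_step_weights` and its inputs are hypothetical; the probabilities are assumed to have been evaluated at the current parameter values beforehand.

```python
def e_step_weights(case_weight, class_probs, class_likelihoods):
    """Case weight times the posterior class membership probabilities.

    class_probs[x]: P(x | z_i) under the current parameter values.
    class_likelihoods[x]: P(y_i | x, z_i) under the current parameter values.
    """
    joint = [p * l for p, l in zip(class_probs, class_likelihoods)]
    marginal = sum(joint)  # P(y_i | z_i), the mixture density of the case
    return [case_weight * j / marginal for j in joint]
```

The returned weights always sum to the case weight, so the complete data log-likelihood distributes each case over the classes in proportion to its posterior probabilities.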
192. the summations are over all $t$ instead of $t \in \ell$.

Set Profile also contains information on the observed choice probabilities $p_\ell(m)$, as well as residuals per alternative and per set that compare observed with overall estimated choice probabilities. The standardized residual (StdResid) for alternative $m$ of set $\ell$ is obtained as follows:

$$\frac{\left[ p_\ell(m) - \hat{P}_\ell(m) \right] \sqrt{N_\ell}}{\sqrt{\hat{P}_\ell(m)}},$$

where $N_\ell = \sum_{i} w_i \sum_{t \in \ell} v_{it}$. The univariate residual (UniResid) for set $\ell$ is defined as

$$\frac{N_\ell \sum_{m=1}^{M_\ell} \left[ p_\ell(m) - \hat{P}_\ell(m) \right]^2 / \hat{P}_\ell(m)}{M_\ell - 1}.$$

Note that this is just a Pearson chi-squared divided by the number of degrees of freedom, that is, the number of possible alternatives in set $\ell$ minus 1.

The Set ProbMeans is obtained by rescaling the $\hat{P}_\ell(m \mid x)$; that is,

$$\hat{P}_\ell(x \mid m) = \frac{\hat{P}(x)\, \hat{P}_\ell(m \mid x)}{\sum_{x'=1}^{K} \hat{P}(x')\, \hat{P}_\ell(m \mid x')}.$$

These quantities, which can be plotted with the ProbMeans in the Uni-plot and Tri-plot, indicate the probability of being in latent class $x$ given that alternative $m$ was selected in set $\ell$.

The file in which the choice sets are defined may contain choice sets that are not presented to respondents. For such simulation sets, the Set Profile output reports $\hat{P}(y = m \mid x, z_\ell^{att}, \bar{z}^{pre})$, i.e., the estimated Class-specific choice probabilities given their attribute values and the mean of the predictor values. The overall choice probabilities for simulation sets are weighted averages of the Class-specific choice probabilities, where $\hat{P}(x)$ serves as weight.

Footnote 6: Note that the predictor val
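The relation between the two residuals can be checked numerically with a short sketch. It assumes Pearson-type residuals, consistent with the remark that UniResid is a Pearson chi-squared divided by the degrees of freedom; the exact StdResid formula in the manual could not be verified from this extract, and the function names are hypothetical.

```python
import math

def std_residuals(observed, estimated, n_set):
    """Pearson-type residual per alternative (assumed form)."""
    return [(p - e) * math.sqrt(n_set) / math.sqrt(e)
            for p, e in zip(observed, estimated)]

def uni_residual(observed, estimated, n_set):
    """Pearson chi-squared for the set divided by (alternatives - 1)."""
    chi2 = n_set * sum((p - e) ** 2 / e for p, e in zip(observed, estimated))
    return chi2 / (len(observed) - 1)
```

For observed proportions (0.5, 0.3, 0.2), estimated probabilities (0.4, 0.4, 0.2), and a weighted set size of 100, the Pearson chi-squared is 5.0 and the univariate residual is 5.0 / 2 = 2.5. Under the assumed form, the squared standardized residuals add up to the same chi-squared.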
193. ties

Main Menu Options

The Menu Bar in Latent GOLD has 6 general menu options: File, Edit, View, Model, Window, and Help.

File. This menu can be used to perform the following functions. Open: Opens a data file. Latent GOLD accepts as input data an SPSS system file, an ASCII text (rectangular) file, or a special array file format for multi-way tables. In addition, you can open a previously saved Latent GOLD definition (.lgf) file. Upon opening a data file, the data file name is listed in the Outline Pane (outermost level), and the default file name Model1 appears beneath the data file (second level) and may be used to specify and estimate a New Model. Close: Closes the data file highlighted in the Outline Pane. Save Results: Allows you to save your output to either an HTML or an ASCII text file. Save Definition: Saves the analysis settings that have been specified for one or more models on a particular data file. Print: Prints output obtained after any model estimation. Print Preview: Previews printed output on screen. Print Setup: Sets various printing options. At the bottom of the File menu, recently opened data files are listed for easier access. Exit: Exits the program. Prior to exiting, Latent GOLD will prompt you to save your Model definitions and results if you have not already done so.

Edit. Copy: Allows you to copy to the clipboard any output highlighted in the Contents Pane. Select All: Selects all output
194. ues are missing for simulation sets.

4.6 Frequencies / Residuals

Latent GOLD Choice reports estimated and observed cell frequencies ($\hat{m}_{i^*}$ and $n_{i^*}$), as well as standardized residuals ($\hat{r}_{i^*}$). The computation of estimated cell entries was described in equation 13. The standardized residuals are defined as

$$\hat{r}_{i^*} = \frac{\hat{m}_{i^*} - n_{i^*}}{\sqrt{\hat{m}_{i^*}}}.$$

Note that $\hat{r}_{i^*}^2$ is cell $i^*$'s contribution to the $X^2$ statistic.

This output section also contains a column Cook's D (Cook's Distance). This measure can be used to detect influential cases, or more precisely, cases having a larger influence on the parameter estimates than others. The exact formula that is used in Latent GOLD Choice 4.0 is given in equation 21. A typical cut point for Cook's D is four times the number of parameters divided by the number of cases (Skrondal and Rabe-Hesketh, 2004). Note that the reported value in a particular row corresponds to the Cook's D for each of the cases with that data pattern.

4.7 Classification Information

The Classification output section contains the classification information for each data pattern $i^*$. We report the posterior class membership probabilities $\hat{P}(x \mid z_{i^*}, y_{i^*})$, as well as the modal Class (the latent class with the largest probability). This method of class assignment is sometimes referred to as posterior mode, empirical Bayes modal (EBM), or modal a posteriori (MAP) estimation (Skrondal and Rabe-Hesketh, 2004). Classification can also
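Two of the rules in this excerpt are simple enough to sketch directly: the typical Cook's D cut point and modal class assignment. The helper names below are hypothetical, and classes are numbered from 1 as an assumption about how they are reported.

```python
def cooks_d_cut_point(n_parameters, n_cases):
    """Four times the number of parameters divided by the number of cases."""
    return 4.0 * n_parameters / n_cases

def modal_class(posteriors):
    """1-based index of the class with the largest posterior probability."""
    best = max(range(len(posteriors)), key=lambda x: posteriors[x])
    return best + 1
```

For example, a model with 10 parameters estimated on 400 cases gives a cut point of 0.1, and posterior probabilities (0.2, 0.7, 0.1) lead to assignment to class 2.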
195. up level random effects in the model for the latent classes, which is a way to take into account that groups differ with respect to the distribution of their members across latent classes (Vermunt, 2003, 2005; Vermunt and Magidson, 2005). Not only the intercept but also the covariate effects may have a random part. Another variant involves including GCFactors and GClasses in the model for the choices. By combining group-level with case-level latent classes, one obtains a three-level conditional logit model with nonparametric random effects, and by combining group-level continuous factors with case-level continuous factors, one obtains a standard three-level random-coefficients conditional logit model. The latter is a special case of the three-level generalized linear model (Vermunt, 2002c, 2004).

The Survey option makes it possible to get correct statistical tests with stratified and clustered samples, as well as with sampling weights and samples from finite populations. The design-corrected variance-covariance matrix of the model parameters is obtained by the well-known linearization estimator. Sampling weights can also be dealt with using a two-step procedure that involves estimating the model without sampling weights and subsequently correcting the latent class distribution and covariate effects using the sampling weights.

The next three sections describe the three Advanced options in more detail. Attention is paid to model components est
196. utes to move them to the Attributes list box, or double-click each variable that you want to move to the Attributes box and then click Attributes. The designated variables now appear in the Attributes list box.

[Figure 22: Selecting the Attributes]

By default, character variables and variables containing consecutive integers (such as FASHION, QUALITY, and PRICE) are treated as Nominal. To view the coding for any variable, double-click on that variable. Double-click on PRICE to view the consecutive integer scores assigned to the price levels. Click Cancel to close this Score window.

[Figure 23: PRICE Score window]

To change the scale type of PRICE to numeric, right-click on PRICE and select Numeric. To change from effects coding (the default for Nominal variables) to 0/1 dummy coding for FASHION and QUALITY, we can change the scale type to Numeric and change the scores to 0, 1. Select FASHION and QUALITY. Right cl
197. values. Click Restore to Defaults to restore the Output options to their last saved default settings. Click Cancel Changes to cancel any changes that have been made to the Output options and not saved.

ClassPred Tab

The options on the ClassPred tab can be used to request that certain items be written to an output file.

[Figure 12: ClassPred Tab]

Standard Classification (optional). When the input data file is either an ASCII text file or an SPSS .sav file, this option produces a new data file containing the standard classification information, such as the probability of being in each class, together with any covariates and other variables specified in the Variables Tab and the variable included in the ID box of the ClassPred Tab, if any. The format of the output file will be the same as that of the input file (ASCII or .sav). This option is not available when using the range option in the Variables Tab to specify a range of models. In addition to these probabilities, Latent GOLD Choice also appends a classification variable containing the class into which the respondent should be classified (the one having the highest membership probability). For each case in the analysis file, the variables on the new data file consist of the model variables, the posterior class membership probabilities, and the modal class classification, i.e., the number of the class having the highest posterior probability
198. values. Click Restore to Defaults to restore the technical options to their last default values. Click Cancel Changes to cancel any changes that have been made to the Technical options and not saved.

Step 2: Specify Output Options

Once you have completed your model setup, prior to estimating your model, you may wish to specify the output options.

Output Tab

[Figure 11: Output Tab]

A checkmark indicates that the associated Output listings are produced. For details of these output files, see Viewing Output. By default, the following are produced (checkmark equals on). Parameters: shows/hides Parameters in the Output Window. Profile: shows/hides Profile in the Output Window. ProbMeans: shows/hides ProbMeans in the Output Window. They may be de-selected by clicking the check box (Output check equals off), in which case this type of output will not appear.

The remaining output listings can be obtained by clicking the check box (Output check equals on). Bivariate Residuals: produces a table containing bivariate residuals. The output file will be listed as Bivariate Residuals in the Outline pane. Frequencies/Residuals: produces an output file containing observed and estimated frequencies and standardized residuals for each combination of variables. This output is not available if any variables in an analysis have
199. variable to be used as the dependent variable.

Dependent Variable Types. The Dependent Variable may specify a single choice within each set, 2 or more choices (ranking and partial ranking models), or a rating. By default, the scale type is set to Choice. To set the scale type for the dependent variable, right-click on the variable in the Dependent Variable box, and a menu pops up listing the different Dependent Variable Types.

[Figure 4: Dependent Variable Types Menu (Choice, Ranking, Rating)]

In the 3-file format, the dependent variable takes a value between 1 and the number of alternatives in the choice set concerned. In ranking models, the order of the records with the same choice set ID in the response file indicates the rank of the alternatives. In the 1-file format, the coding depends on the model: for choice models, the dependent variable equals 1 for the selected alternative and 0 otherwise; for ranking models, it equals 1, 2, 3, etc. for the selected alternatives, where the number indicates the rank, and 0 for the non-selected alternatives in the case of a partial ranking; for rating models, it equals the rating, as in the 3-file format.

Case ID: assign one variable to be used as Case ID. If no Case ID variable is specified, the program assumes that the data file contains only a single choice set per case; that is, only 1 record in the 3-file format or 1 set of alternatives (records) in the 1-file format. For f
200. ve to be performed with the original cases. The first is the grouping of identical cases; that is, cases that have the same covariate, known-class, predictor, and attribute values and give the same responses. This yields unique data patterns with observed frequency counts denoted by $n_{i^*}$, where $i^*$ denotes a particular data pattern. These frequency counts are obtained by summing the case weights $w_i$ of the cases with data pattern $i^*$; that is, $n_{i^*} = \sum_{i \in i^*} w_i$. In order to obtain the chi-squared statistics, we also need to group cases with identical covariate, known-class, predictor, and attribute values, which amounts to grouping cases without taking into account their responses. This yields the sample sizes $N_u$ for the $U$ relevant multinomials, where $u$ denotes a particular multinomial or covariate pattern. These sample sizes are obtained by $N_u = \sum_{i \in u} w_i$ or $N_u = \sum_{i^* \in u} n_{i^*}$. Note that $N = \sum_{u=1}^{U} N_u$.

Let $\hat{m}_{i^*}$ denote the estimated cell count for data pattern $i^*$, which is obtained by

$$\hat{m}_{i^*} = N_{u_{i^*}}\, \hat{P}(y_{i^*} \mid z_{i^*}), \qquad (13)$$

i.e., by the product of the total number of cases with the same covariate pattern as data pattern $i^*$ ($N_{u_{i^*}}$) and the estimated multinomial probability.

Footnote 9: With the somewhat loose but rather simple notation $i \in i^*$, we mean all the cases with data pattern $i^*$.

Footnote 10: With missing values on some replications, the missing data pattern is also used as a grouping criterion. That is, cases belonging to the same covariate p
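The two grouping operations described in this excerpt amount to summing case weights under two different keys. The sketch below assumes each case is given as a hypothetical tuple of (covariate pattern, responses, case weight), with hashable patterns; the function names are illustrative only.

```python
from collections import defaultdict

def group_cases(cases):
    """Return (pattern_counts, multinomial_sizes).

    pattern_counts maps (covariate pattern, responses) to the summed case
    weights of identical cases; multinomial_sizes maps each covariate
    pattern to the summed case weights, ignoring the responses.
    """
    pattern_counts = defaultdict(float)
    multinomial_sizes = defaultdict(float)
    for covariates, responses, weight in cases:
        pattern_counts[(covariates, responses)] += weight
        multinomial_sizes[covariates] += weight
    return dict(pattern_counts), dict(multinomial_sizes)

def estimated_count(multinomial_size, estimated_probability):
    """Estimated cell count: covariate-pattern size times probability."""
    return multinomial_size * estimated_probability
```

The multinomial sizes add up to the total weighted number of cases, and each estimated cell count follows from multiplying the size of the data pattern's covariate pattern by its estimated probability.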
201. ys. Some of the advanced applications include:

• Allocation models: Replication weights may be used to handle designs where respondents allocate a number of votes (purchases, points) among the various choice alternatives.
• Best-worst and related models: A scale factor of -1 can be used to specify the alternative(s) judged to be worst (or least preferred) as opposed to best (or most preferred).

Latent GOLD Choice 4.0 Advanced

The following new features are included in the optional Advanced Module (requires the Advanced version of Latent GOLD Choice 4.0):

• Continuous latent variables (CFactors): an option for specifying models containing continuous latent variables, called CFactors, in a choice, ranking, or rating model. CFactors can be used to specify random-coefficients conditional logit models in which the random coefficients covariance matrix is restricted using a factor-analytic structure. It is also possible to use random effects in conjunction with latent classes, yielding hybrid choice models combining discrete and continuous unobserved heterogeneity. If included, additional information pertaining to the CFactor effects appears in the Parameters, ProbMeans, and Classification Statistics output, and CFactor scores appear in the Standard Classification output.
• Multilevel modeling: an option for specifying LC choice models for nested data structures, such as shoppers (individuals) within stores (groups). Group level vari
