User's Guide to lqa
  lambda.check (lambda)
  if (length (lambda) != 2)
    stop ("The fused lasso penalty must consist of two parameters! \n")
  names (lambda) <- c ("lambda1", "lambda2")

  first.derivative <- function (beta = NULL, ...)
  {
    if (is.null (beta))
      stop ("'beta' must be the current coefficient vector \n")
    p <- length (beta)
    if (p < 2)
      stop ("There must be at least two regressors! \n")
    vec1 <- c (rep (lambda[1], p), rep (lambda[2], p - 1))
    return (vec1 * as.integer (drop (t (a.coefs (beta)) %*% beta) != 0))
  }

  a.coefs <- function (beta = NULL, ...)
  {
    if (is.null (beta))
      stop ("'beta' must be the current coefficient vector \n")
    p <- length (beta)
    if (p < 2)
      stop ("There must be at least two regressors! \n")
    if (p > 2)
    {
      h1 <- cbind (diag (p - 1), 0)
      h2 <- cbind (0, -diag (p - 1))
      mat1 <- h1 + h2
      mat2 <- diag (p)
      a.coefs.mat <- cbind (mat2, t (mat1))
    }
    else
      a.coefs.mat <- cbind (diag (2), c (1, -1))
    return (a.coefs.mat)
  }

  structure (list (penalty = "fused.lasso", lambda = lambda,
    first.derivative = first.derivative, a.coefs = a.coefs),
    class = "penalty")
}

Note that the fused lasso penalty only works, by definition, for at least two regressors. Actually, we do not need the beta argument in the first.derivative and a.coefs functions directly; we only use it to determine the dimension parameter p. We conclude this section with two remarks on penalty objects.
deviance loss    DL(lambda) = Dev                    dev.loss

Table 4: Implemented loss functions.

Note: if loss.func = gcv.loss is chosen, then this does not match with given validation data. This is due to the construction of the GCV as an approximation to leave-one-out cross-validation. If some validation data are nonetheless given, then cv.lqa will replace them by the training data arguments y.train and x.train.

However, the estimate b_lambda and the hat matrix H_lambda are part of the output of lqa. But to compute terms like Dev and mu.hat, the lqa package provides the function predict.lqa. This function is primarily used internally, but you can use it directly as well; see the lqa manual (Ulbricht 2010a) for its documentation. The function predict.lqa returns an object pred.obj of the class pred.lqa. This object is the input argument for all loss functions in lqa. Therefore it is worthwhile to look at it a little more closely. The object pred.obj contains the following elements:

deviance    the deviance based on the new observations. This element provides Dev.
tr.H        the trace of the hat matrix of the design matrix used to fit the model, e.g. tr(H). This is just an extraction from the lqa.obj object that is used as input argument in predict.lqa.
n.newobs    the number of new observations.
eta.new     the estimated new predictors.
mu.new      the estimated new responses, e.g. a vector containing mu.hat_new.
lqa.obj     the lqa.obj argument of the predict.
The best.obj, that is the GLM with the chosen penalty family and the optimal tuning parameter, is printed. This includes its unstandardized estimated coefficients and some model summary statistics.

The cv.lqa function can deal with cross-validation for up to three-dimensional tuning parameters. In this case, printing of the loss matrix (which then in fact is a loss array) becomes quite striking for the user. Therefore, a mean array is provided in those cases, which just gives the CV scores and hence lowers the dimension of the loss array by one. To illustrate this, we want to do cross-validation for our simulated data and now apply the fused lasso penalty. For lambda2 we consider just three candidates. The function call is

R> cv.obj2 <- cv.lqa (y, x, family = binomial (), penalty.family = fused.lasso,
+    lambda.candidates = list (c (0.001, 0.05, 0.5, 1, 5), c (0.001, 0.01, 0.5)),
+    n.fold = 5, loss.func = "dev.loss")
R> cv.obj2

We just extract the mean array and the optimal tuning parameter in the following:

mean array:
                lambda2=0.001  lambda2=0.01  lambda2=0.5
lambda1=0.001        14.01300      13.55434     13.21141
lambda1=0.05         13.14641      12.93533     13.24855
lambda1=0.5          13.58486      13.53301     14.41157
lambda1=1            15.48474      15.47385     16.22203
lambda1=5            24.94260      24.94256     24.94281

lambda.opt: 0.05 0.01

The mean array displays the values of CV(lambda) for all tuning parameter candidate combinations.
Name             Description                                   getpenmat   first.derivative & a.coefs
penalreg         correlation-based penalty                         x
lasso            lasso penalty                                     x
adaptive.lasso   adaptive lasso                                    x
fused.lasso      fused lasso                                                   x
oscar            OSCAR                                                         x
scad             SCAD                                              x
weighted.fusion  weighted fusion                                               x
bridge           bridge                                            x
enet             elastic net                                       x
genet            generalized elastic net                           x
icb              improved correlation-based penalty                x
licb             L1-norm improved correlation-based penalty                    x
ao               approximated octagon penalty                                  x

Table 2: Implemented penalties of class penalty. Whether they consist of getpenmat or of first.derivative and a.coefs is tagged with an x.

for some small c > 0, where 1_{beta_j != 0} = 1 if beta_j != 0, and 1_{beta_j != 0} = 0 otherwise. Thus we might use the getpenmat function. The source code of the complete implementation of the lasso penalty is given in the following:

R> lasso <- function (lambda = NULL, ...)
{
  lambda.check (lambda)

  ## Check on dimensionality of lambda:
  if (length (lambda) != 1)
    stop ("lambda must be a scalar \n")
  names (lambda) <- "lambda"

  getpenmat <- function (beta = NULL, c1 = lqa.control ()$c1, ...)
  {
    if (is.null (beta))
      stop ("'beta' must be the current coefficient vector \n")
    if (c1 < 0)
      stop ("'c1' must be non-negative \n")
    penmat <- lambda * diag (1 / sqrt (beta^2 + c1) * as.integer (beta != 0))
    penmat
  }

  structure (list (penalty = "lasso", lambda = lambda, getpenmat = getpenmat),
    class = "penalty")
}

Here you see the basic structure of a penalty ob-
denote the score vector and the Fisher information matrix of the underlying log-likelihood, respectively, and

A*_lambda = [ 0   0' ]
            [ 0   A_lambda ],

where 0 is the p-dimensional null vector. The approximated penalty matrix A_lambda is defined as

A_lambda = sum_{j=1}^{J} [ p'_{lambda,j}(|a_j' beta|) / sqrt((a_j' beta)^2 + c) ] a_j a_j',    (5)

where p'_{lambda,j}(|a_j' beta|) = d p_{lambda,j}(|a_j' beta|) / d|a_j' beta| denotes the first derivative, with p'_{lambda,j}(0) = 0, and c > 0 is a small positive real number. In the lqa package a small constant is used as default value (set via lqa.control); to our experience this value works quite well. The Newton update (4) is repeated until convergence. We apply the rule

||b^(k+1) - b^(k)|| / ||b^(k)|| <= epsilon,    epsilon > 0,

to terminate the algorithm. Under this criterion the algorithm is stopped if the relative distance moved during the k-th iteration is less than or equal to epsilon. We use an additional step length parameter 0 < gamma <= 1 to enhance convergence of the algorithm. This parameter is usually treated as a fixed constant. In our experience, a value gamma < 1 becomes especially appropriate if the penalty term includes an L_gamma-norm with large gamma, e.g. gamma > 10. Examples where this can happen are the bridge (Frank and Friedman 1993) or the OSCAR (Bondell and Reich 2008) penalty.

In this article we will describe the R (R Development Core Team 2009) add-on package lqa. The lqa package has been originally designed to apply the LQA algorithm (see Ulbricht 2010b for details) to compute the constrained MLEs of GLMs with specific
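To make the construction of the approximated penalty matrix and the stopping rule concrete, here is a small language-neutral sketch (in Python rather than R; the function names `approx_penalty_matrix` and `converged` are our own, not part of the lqa package). It builds A_lambda = sum_j p'_j(|a_j' beta|) / sqrt((a_j' beta)^2 + c) a_j a_j' from the vectors a_j and checks the relative-distance termination criterion:

```python
import math

def approx_penalty_matrix(a_vecs, dpen, beta, c=1e-6):
    """A = sum_j p'_j(|a_j' beta|) / sqrt((a_j' beta)^2 + c) * a_j a_j'."""
    p = len(beta)
    A = [[0.0] * p for _ in range(p)]
    for j, a in enumerate(a_vecs):
        s = sum(ai * bi for ai, bi in zip(a, beta))     # a_j' beta
        w = dpen(j, abs(s)) / math.sqrt(s * s + c)      # scalar weight of term j
        for k in range(p):
            for l in range(p):
                A[k][l] += w * a[k] * a[l]
    return A

def converged(b_new, b_old, eps=1e-6):
    """Stop when ||b_new - b_old|| / ||b_old|| <= eps."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(b_new, b_old)))
    den = math.sqrt(sum(y * y for y in b_old))
    return num / den <= eps

# Lasso special case: J = p, a_j = e_j, p'_j = lambda  ->  A is diagonal.
lam, beta = 0.9, [1.0, -2.0, 0.5]
eye = [[1.0 if i == j else 0.0 for i in range(3)] for j in range(3)]
A = approx_penalty_matrix(eye, lambda j, t: lam, beta)
```

In the lasso case this reproduces the diagonal matrix of equation (7) below; for penalties whose a_j carry several non-zero entries (fused lasso, OSCAR) the same formula yields off-diagonal elements.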
lqa function call; this is the return object of the lqa function, so that you can in principle access all of its elements.
new.y       the vector of new observations.

Consequently, you can use all of those arguments in your loss function. Exemplarily, we show the source code of the gcv.loss function:

R> gcv.loss <- function (pred.obj, ...)
{
  dev <- pred.obj$deviance
  tr.H <- pred.obj$tr.H
  n.obs <- pred.obj$n.newobs
  n.obs * dev / (n.obs - tr.H)^2
}

The output of a loss function must be a non-negative scalar.

4.3 Examples for using the cv.lqa function

We want to continue the example from Section 4.1. We still want to apply the lasso penalty, but are now looking for an optimal tuning parameter. Since we have not simulated a validation data set, we use a five-fold cross-validation based on the deviance. Consequently, optimality of lambda means minimizing the cross-validation score. To keep the output small, we restrict ourselves to only five tuning parameter candidates. Then the function call and its result are

R> cv.obj <- cv.lqa (y, x, family = binomial (), penalty.family = lasso,
+    lambda.candidates = list (c (0.001, 0.05, 0.5, 1, 5)), n.fold = 5,
+    loss.func = "dev.loss")
R> cv.obj

cv.lqa (y.train = y, x.train = x, lambda.candidates = list (c (0.001,
    0.05, 0.5, 1, 5)), family = binomial (), penalty.family = lasso,
    n.fold = 5, loss.func = "dev.loss")

loss function: dev.loss
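The GCV score computed by gcv.loss is just an arithmetic combination of the elements of pred.obj. The following Python one-liner mirrors that arithmetic (the function name `gcv_score` is ours; it stands in for pred.obj's deviance, tr.H and n.newobs fields):

```python
def gcv_score(deviance, tr_h, n_obs):
    """GCV(lambda) = n * Dev / (n - tr(H))^2, as in the gcv.loss function."""
    return n_obs * deviance / (n_obs - tr_h) ** 2

# Illustrative values only (not taken from a real fit):
score = gcv_score(deviance=71.8, tr_h=2.0, n_obs=100)
```

Note that the score is non-negative whenever the deviance is, as required of a loss function.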
the tuning parameter vector in the R functions of lqa. But, as we will see later on, they just appear as list elements in the structure environment.

The function getpenmat and the functions first.derivative and a.coefs are mutually exclusive. Whether we need the first one or the last two depends on the nature of the penalty. Hence, we have to distinguish two cases (see also Table 2):

(i) The use of a function getpenmat is more efficient in a numerical sense if
  - the penalty matrix A_lambda as given in (5) is a diagonal matrix, e.g. if J = p and a_j, j = 1, ..., J, just contains one non-zero element, or
  - the penalty is quadratic.
  Then the approximate penalty matrix A_lambda can be computed directly. Most implemented penalties are of those types, e.g. ridge, lasso, SCAD and the correlation-based penalty.

(ii) The combination of the functions first.derivative and a.coefs is necessary in all other cases. The fused lasso penalty is an example for it.

We illustrate the basic structure of penalty objects with the lasso and the fused lasso object. For the lasso penalty (Tibshirani 1996)

P_lambda(beta) = lambda * sum_{j=1}^{p} |beta_j|,    (6)

with a_j = (0, ..., 0, 1, 0, ..., 0)', where the one is at the j-th position, and p_{lambda,j}(|a_j' beta|) = lambda |a_j' beta| = lambda |beta_j|, it is straightforward to show that the approximate penalty matrix is

A_lambda = lambda * diag( 1_{beta_1 != 0} / sqrt(beta_1^2 + c), ..., 1_{beta_p != 0} / sqrt(beta_p^2 + c) )    (7)

Name             Description                                   getpenmat   first.derivative & a.coefs
ridge            ridge penalty                                     x
with multicollinearity, where the application of shrinkage methods indeed makes sense. Now we specify the response. The GLM will be a logit model, that is y_i ~ B(1, pi_i) and E(y_i | x_i) = pi_i = h(eta_i) with h(eta) = 1 / (1 + exp(-eta)). We will denote the corresponding simulated response vector as y. The R code for this data generation is given below, where we fixed the seed of the random number generator for illustration purposes:

R> n <- 100
R> p <- 5
R>
R> set.seed (1234)
R> x <- matrix (rnorm (n * p), ncol = p)
R> x[,2] <- x[,1] + rnorm (n, sd = 0.01)
R> x[,3] <- x[,1] + rnorm (n, sd = 0.1)
R> beta <- c (1, 2, 0, 0, -1)
R> prob1 <- 1 / (1 + exp (-drop (x %*% beta)))
R> y <- sapply (prob1, function (prob1) rbinom (1, 1, prob1))

We want to fit a logistic regression model with lqa, using the lasso penalty with tuning parameter lambda = 0.9 and fitting method lqa.update2. Since the latter is the default value of method, we do not need to state it explicitly. This results in

R> lqa.obj <- lqa (y ~ x, family = binomial (), penalty = lasso (0.9))
R> lqa.obj

Call:  lqa.formula (formula = y ~ x, family = binomial (), penalty = lasso (0.9))

Coefficients:
(Intercept)          x1          x2          x3          x4          x5
  7.940e-01   9.202e-02   1.208e+00   7.612e-03   8.299e-07   5.213e-03

Degrees of Freedom: 99 Total (i.e. Null);  98.3658 Residual
Null Deviance:      123.8
Residual Deviance:  71.8        AIC: 75.07

Some additional information can be
As you can see, CV(lambda1 = 0.05, lambda2 = 0.01) delivers the minimum. By the way, at this position it should become clear why it is necessary to name your tuning parameters in objects of the penalty class: otherwise, the labeling of rows and columns in the mean array could be misleading.

4.4 Examples for using the plot.lqa function

In many situations it is preferable to look at the coefficient build-ups of a penalization method. For those cases you can use the plot.lqa function. Its usage is

plot.lqa (y, x, family, penalty.family, intercept = TRUE,
    standardize = TRUE, lambdaseq = NULL, offset.values = NULL,
    show.standardized = FALSE, add.MLE = TRUE, control = lqa.control (), ...)

where the input parameters are

y                  the vector of observed responses.
x                  the design matrix. If intercept = TRUE then it does not matter whether a column of ones is already included in x or not. The function adjusts it if necessary.
family             identifies the exponential family of the response and the link function of the model. See the description of the R function family for further details.
penalty.family     a function or character argument identifying the penalty family.
intercept          logical. If intercept = TRUE then an intercept is included in the model (this is recommended).
standardize        logical. If standardize = TRUE the data are standardized (this is recommended).
lambdaseq          a sequence of tuning parameter
User's Guide to lqa

Jan Ulbricht

April 19, 2010

1 Introduction

Ulbricht (2010b) has introduced the quite generic augmented LQA algorithm to estimate Generalized Linear Models (GLMs) based on penalized likelihood inference. Here we do not want to repeat details on how to specify or estimate GLMs. For a nice overview on these topics see, for example, McCullagh and Nelder (1989), Fahrmeir and Tutz (2001) or Fahrmeir et al. (2009). Instead, we provide you in a nutshell with the basic aspects of the LQA algorithm.

Consider the vector b = (b_0, beta')' of unknown parameters in the predictor of a GLM. The term b_0 denotes the intercept and beta contains the coefficients of the p regressors. We want to solve the penalized regression problem

min_b  { -l(b) + P_lambda(beta) },    (1)

where l(b) is the log-likelihood of the underlying GLM and the penalty term has structure

P_lambda(beta) = sum_{j=1}^{J} p_{lambda,j}(|a_j' beta|),    (2)

with known vectors of constants a_j. The subscript lambda illustrates the dependency on a vector of tuning parameters. We assume the following properties to hold for all J penalty terms:

(i) p_{lambda,j} : R_{>=0} -> R_{>=0} with p_{lambda,j}(0) = 0,
(ii) p_{lambda,j} is continuous and monotone in |a_j' beta|,
(iii) p_{lambda,j} is continuously differentiable for all a_j' beta != 0, so that d p_{lambda,j}(|a_j' beta|) / d|a_j' beta| >= 0 for all a_j' beta > 0.

The first two assumptions are necessary to make sure that P_lambda is indeed a penalty. By construction, the penalty is symmetric in a_j' beta around the origin. The algebraic form of (2) covers a
package version 1.0-2.

Ulbricht, J. (2010b). Variable Selection in Generalized Linear Models. Ph.D. thesis, LMU Munich.

Venables, W. and B. Ripley (2000). S Programming. New York: Springer.

Wood, S. N. (2006). Generalized Additive Models: An Introduction with R. Boca Raton: Chapman & Hall/CRC.
validation data set used: FALSE
number of folds: 5

loss matrix:
                 fold 1   fold 2    fold 3     fold 4   fold 5     mean
lambda1=0.001  20.37285 29.55333  9.305895   8.726739 18.32540 17.25684
lambda1=0.05   18.41687 27.10695  7.914941   9.013015 16.03634 15.69762
lambda1=0.5    17.75243 19.67935  8.849785  13.260267 15.62594 15.03356
lambda1=1      19.31417 17.05229 10.995958  17.149802 17.19262 16.34097
lambda1=5      26.12830 20.94446 23.028678  30.310342 26.12799 25.30795

lambda.opt: 0.5

Call:  lqa.default (x = x.train, y = y.train, family = family, penalty = penalty,
    control = control, intercept = intercept, standardize = standardize)

Coefficients:
[1]  8.560e-01  2.812e-01  1.404e+00  5.820e-02  1.355e-07  2.474e-01

Degrees of Freedom: 99 Total (i.e. Null);  97.8255 Residual
Null Deviance:      123.8
Residual Deviance:  62.05       AIC: 66.4

The print function for cv.obj shows the function call first. Afterwards, information on the applied loss function, the existence of a validation data set and the number of cross-validation folds is given. Thereafter, the loss matrix is printed, where each row corresponds to one tuning parameter candidate and the first columns correspond to the n.fold folds. The last column shows the values of CV(lambda). As you can see, in our example lambda = 0.5 delivers the smallest value and hence is regarded as optimal tuning parameter. Note that the difference to CV(0.05) is only small. Afterwards, the so-called best.obj is printed as well.
[Figure 1 shows coefficient build-ups plotted against |beta| / max|beta| on the horizontal axis, ranging from 0.0 to 1.0.]

Figure 1: The resulting plot as returned from the function plot.lqa when it is applied to the simulated example and lambda2 = 0.01 is held fixed.

Hothorn, T., P. Bühlmann, T. Kneib, M. Schmid and B. Hofner (2009). mboost: Model-Based Boosting. R package version 1.0-7.

Lawson, C. and R. Hanson (1974). Solving Least Squares Problems. Englewood Cliffs, NJ: Prentice Hall.

Leisch, F. (2008). Creating R packages: A tutorial. In P. Brito (Ed.), Compstat 2008 — Proceedings in Computational Statistics. Heidelberg, Germany: Physica Verlag.

McCullagh, P. and J. A. Nelder (1989). Generalized Linear Models (2nd ed.). New York: Chapman & Hall.

R Development Core Team (2009). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58, 267–288.

Tibshirani, R., M. Saunders, S. Rosset, J. Zhu and K. Knight (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society B 67, 91–108.

Tutz, G. and J. Ulbricht (2009). Penalized regression with correlation-based penalty. Statistics and Computing 19, 239–253.

Ulbricht, J. (2010a). lqa: Local Quadratic Approximation. R
In its further development, the package has been extended to also deal with boosting algorithms such as componentwise boosting, GBlockBoost and ForwardBoost (Ulbricht 2010b), which in turn are primarily intended to be used with quadratic penalties. The LQA algorithm can be viewed as an extension of the P-IRLS algorithm (see e.g. Wood 2006), so that the latter is also indirectly included. As we will see later on, we use a hierarchical structure of R functions for the package that provides preparation methods (standardization, extraction of formula components etc.), the computation of important statistics (degrees of freedom, AIC, penalized Fisher information matrix etc.) and generic functions (summary, plot etc.).

A convenient way to specify the details of the penalty terms is crucial for the computational application of the LQA algorithm and the boosting methods mentioned above. Therefore, we start with the introduction of the penalty class in the next section. The basic structure of the lqa package will be explained in Section 3; some examples for applications are illustrated in Section 4.

2 The penalty class

Since the lqa package focuses on shrinkage and boosting methods, there will always be a penalty included in the objective function, or it plays a role during the inference procedure. Furthermore, we need to compute different terms corresponding to a specified penalty, such as the penalty level or the coefficients or its gradients, for
lambda argument. Afterwards, you might name the elements of lambda. This helps to identify the tuning parameters of different methods when you want to compare their performances. In a next step, the getpenmat method is implemented. To compute (7) we need the arguments beta, which indicates the current coefficient vector, and c1, which corresponds to c. According to the approximation (5) and the arguments made about it, we must make sure that p'_{lambda,j}(0) = 0. Therefore we use as.integer (beta != 0) as a multiplier in the computation of penmat, which is in fact the R representation of the indicator function. Finally, the object is initialized using the function structure, where all its elements are listed. Note that the membership of the class penalty must also be given here.

For the fused lasso penalty, things are a little bit more complicated. Since a_j, j = p + 1, ..., 2p - 1, consists of two non-zero elements, we cannot apply getpenmat. The first derivatives of the penalty terms are

p'_{lambda,j}(|a_j' beta|) = lambda1 for j = 1, ..., p,    and    p'_{lambda,j}(|a_j' beta|) = lambda2 for j = p + 1, ..., 2p - 1.

This will be returned by the first.derivative function (see below). A summarized version of the coefficients is

A = (a_1, ..., a_p, a_{p+1}, ..., a_{2p-1}) =
    [ 1 0 ... 0 |  1  0 ...  0 ]
    [ 0 1 ... 0 | -1  1 ...  0 ]
    [ . .     . |  . -1      . ]
    [ . .     . |  .  .      1 ]
    [ 0 0 ... 1 |  0  0 ... -1 ]

that is, the first p columns form the identity matrix I_p (the lambda1 terms) and the last p - 1 columns encode the pairwise differences a_{p+i}' beta = beta_i - beta_{i+1} (the lambda2 terms). This p x (2p - 1)-dimensional matrix will be returned by a.coefs. So the complete source code of the fused lasso object is

R> fused.lasso <- function (lambda = NULL, ...)
{
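The block structure of the coefficient matrix A = [I_p | differences] can be sketched in a few lines of language-neutral Python (the function name `fused_lasso_a_coefs` is ours; it mirrors what the a.coefs method of the fused lasso object returns, columns indexed by penalty terms and rows by coefficients):

```python
def fused_lasso_a_coefs(p):
    """p x (2p-1) matrix: first p columns I_p (lambda1 |beta_j| terms),
    last p-1 columns encode beta_i - beta_{i+1} (lambda2 fusion terms)."""
    cols = []
    for j in range(p):                       # identity part: a_j = e_j
        cols.append([1.0 if i == j else 0.0 for i in range(p)])
    for j in range(p - 1):                   # difference part: +1 at j, -1 at j+1
        col = [0.0] * p
        col[j], col[j + 1] = 1.0, -1.0
        cols.append(col)
    # transpose so rows correspond to coefficients, columns to penalty terms
    return [[cols[c][r] for c in range(2 * p - 1)] for r in range(p)]

A = fused_lasso_a_coefs(3)   # 3 x 5 matrix
```

Applying a column a_j to a coefficient vector then yields either a single |beta_j| term or a pairwise difference, exactly the two kinds of terms in the fused lasso penalty.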
use another function in this case that, e.g., computes the trace of the hat matrix. Since we could also use some source code from the P-IRLS algorithm for the implementation of the boosting methods, it was obvious not to take mboost into consideration here. In principle, the lqa package can be used to boost penalized GLMs based on the LQA algorithm. However, in such a case it is not advisable to use just one iteration of P-IRLS, since the approximated penalty matrix A_lambda will then strongly depend on the control parameter c. Consequently, the boosting methods must be adjusted then. We will not follow this idea, but leave it for further investigations.

4 Some Applications of the lqa package

There are three main important functions in the lqa package. The function lqa can be used to fit a penalized GLM for an already specified tuning parameter. The function cv.lqa finds an optimal tuning parameter in up to three dimensions. The plot.lqa function can be applied to visualize the solution path via coefficient build-ups. For lqa and cv.lqa objects, as returned by the corresponding functions, there exist summary functions that give a short and compact overview of the computed results. But now we are going into details on how to apply these functions.

4.1 Using the lqa function

When the tuning parameter is already specified, we can use the function lqa to fit a penalized GLM by the LQA algorithm or some boosting algorithms with variable selection schemes.
data          an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment (formula), typically the environment from which the function lqa is called.
method        a character indicating the fitting method. The default value method = "lqa.update2" applies the LQA algorithm.
standardize   a Boolean variable indicating whether the regressors should be standardized (this is recommended) or not. The default value is standardize = TRUE.
...           further arguments passed to or from other methods.

This call returns an object obj of the class lqa, which moreover inherits the class attributes glm and lm. As a consequence, we could apply to obj the typical methods as for instances of the class lm. Some of them are modified in order to meet the special needs of lqa instances (see below).

For illustration, consider the following small example. We simulate a GLM mu_i = h(eta_i), i = 1, ..., n, with n = 100 observations and p = 5 covariates, where the predictor eta_i = beta_0 + x_i' beta consists of the true parameters beta_0 = 0 and beta = (1, 2, 0, 0, -1)'. The original design matrix (not regarding the intercept) consists of five univariate standard normally distributed covariates x.1, ..., x.5. In order to get a stochastic dependency structure, we replace x.2 and x.3 by x.1, where some additional noise is applied to obtain correlations near one, but not exactly equal to one. As a result, we get a simulation setting
we use add.MLE = FALSE. The resulting function call is

R> plot.lqa (y, x, family = binomial (), penalty.family = fused.lasso,
+    offset.values = c (NA, 0.01), add.MLE = FALSE)

The corresponding plot is given in Figure 1. Note that the grouping effect does not cover x3. This is due to the small weight lambda2 = 0.01 on the second penalty term, which is responsible for the fusion of correlated regressors. Note that the grouping effect among x1 and x2 is kept all the time. Furthermore, the relevance of the regressors is recognized, as x4 is selected to join the active set last.

References

Bondell, H. D. and B. J. Reich (2008). Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR. Biometrics 64, 115–123.

Chambers, J. (1998). Programming with Data: A Guide to the S Language. New York: Springer.

Fahrmeir, L., T. Kneib and S. Lang (2009). Regression — Modelle, Methoden und Anwendungen (2nd ed.). Berlin: Springer.

Fahrmeir, L. and G. Tutz (2001). Multivariate Statistical Modelling based on Generalized Linear Models (2nd ed.). New York: Springer.

Frank, I. E. and J. H. Friedman (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics 35, 109–148.

Hastie, T., R. Tibshirani and J. H. Friedman (2009). The Elements of Statistical Learning (2nd ed.). New York: Springer.

[Figure 1 plot title: lqa coefficient build-up, fused lasso]
already included in x.vali or not; the function adjusts it if necessary.
lambda.candidates   a list containing the tuning parameter candidates. The number of list elements must correspond to the dimension of the tuning parameter. See details below.
family              identifies the exponential family of the response and the link function of the model. See the description of the R function family for further details.
penalty.family      a function or character argument identifying the penalty family. See details below.
standardize         logical. If standardize = TRUE the data are standardized (this is recommended).
n.fold              number of folds in cross-validation. This can be omitted if a validation set is used.
cv.folds            optional list containing the observation indices used in the particular cross-validation folds. This can be omitted if a validation set is used.
loss.func           a character indicating the loss function to be used in evaluating the model performance for the tuning parameter candidates. If loss.func = NULL the aic.loss function will be used. See details below.
control             a list of parameters for controlling the fitting process. See the documentation of lqa.control for details.
...                 further arguments.

This function can be used for evaluating model performance for different tuning parameter candidates. If you just give training data, a cross-validation will be applied. If you additionally provide validation data, then those data will be used for measuring the performance.
penalty objects. Note that the penalty objects only consider the p regressors. Their methods do not take into consideration whether or not there is an intercept included in the model. An adjustment of the approximated penalty matrix A_lambda to

A*_lambda = [ 0   0' ]
            [ 0   A_lambda ]

will be done automatically by the function get.Amat from the lqa package. This function is the conjunction between penalty objects and the algorithms that require A*_lambda.

From a numerical point of view, a direct computation of the approximated penalty matrix via

A_lambda = sum_{j=1}^{J} [ p'_{lambda,j}(|a_j' beta|) / sqrt((a_j' beta)^2 + c) ] a_j a_j'    (8)

is often not efficient. As we have seen above, in some cases we can compute A_lambda directly using the getpenmat function. If we applied formula (8) directly, we would need to compute J outer products based on the a_j; that is, we would have to store J matrices, each of dimension p x p. If the coefficient vectors a_j are sparse, as for the fused lasso penalty, those matrices would contain just a few non-zero components (to be concrete, e.g. four non-zero elements for the fused lasso penalty, no matter how many regressors are in the model). With increasing p this becomes more inefficient, yet cumbersome. A numerically more efficient method would extract the non-zero elements of each a_j, compute the outer products of those subvectors, and record the resulting submatrices at the corresponding positions of a p x p-dimensional working penalty matrix. If some positions overlap between the J penalty terms, then the corresponding entries are added up.
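The sparse accumulation idea described above can be sketched as follows (a Python illustration of the strategy, not the package's actual implementation; the name `penalty_matrix_sparse` and the dict-based sparse representation are our own). Each a_j is stored as a map from positions to non-zero values, so each outer product touches only a handful of entries of the working matrix:

```python
def penalty_matrix_sparse(a_sparse, weights, p):
    """Accumulate A = sum_j w_j * a_j a_j', touching only non-zero entries.
    a_sparse[j] is a dict {position: value} for a_j;
    weights[j] plays the role of p'_j(.) / sqrt((a_j' beta)^2 + c)."""
    A = [[0.0] * p for _ in range(p)]
    for a, w in zip(a_sparse, weights):
        for k, ak in a.items():
            for l, al in a.items():
                A[k][l] += w * ak * al   # e.g. at most 4 updates per fused lasso term
    return A

# One fused-lasso-type difference term a = e_0 - e_1 with weight 2 in p = 3:
A = penalty_matrix_sparse([{0: 1.0, 1: -1.0}], [2.0], p=3)
```

For a fused lasso difference term the inner double loop performs exactly four additions, independently of p, which is precisely the saving the text describes.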
candidates for the dimension you want to plot.
offset.values       a vector of the same dimension as your tuning parameter. At the position of the dimension you want to plot there must be the entry NA. The other positions should be filled with given (fixed) tuning parameter values, e.g. optimized values as returned from cv.lqa. See the examples below.
show.standardized   logical. If show.standardized = TRUE the standardized coefficients are plotted, otherwise the unstandardized coefficients are plotted.
add.MLE             logical. If add.MLE = TRUE the unrestricted MLE is also plotted. Note that this only works for n > p settings. Otherwise this argument is set to FALSE automatically.
control             a list of control parameters as returned by lqa.control. See the lqa manual (Ulbricht 2010a) for details.
...                 further arguments.

This function plots the coefficient build-ups for a given dimension of your tuning parameter(s). The argument lambdaseq can be omitted. In this case, a default sequence

R> lambdaseq <- exp (seq (-10, 6, length = 60))

is used. If your penalty consists of more than one tuning parameter, you must identify the relevant dimension to plot using offset.values, where you state the fixed values for the other tuning parameters.

We want to plot the coefficient build-up for our simulated example with the fused lasso penalty. Using the results from the last section, we set lambda2 = 0.01. For this value held fixed, the coefficients cannot converge to the MLE when lambda1 -> 0. Therefore
gained by the call summary (lqa.obj).

4.2 Basic principles of using the cv.lqa function

Normally we do not know the optimal tuning parameter a priori. It is more common to apply model selection via a set of tuning parameter candidates. This will be done by cross-validation. In the lqa package, the function cv.lqa is designed for this purpose. Its usage is

cv.lqa (y.train, x.train, intercept = TRUE, y.vali = NULL, x.vali = NULL,
    lambda.candidates, family, penalty.family, standardize = TRUE,
    n.fold, cv.folds, loss.func = aic.loss, control = lqa.control (), ...)

with input parameters

y.train     the vector of response training data.
x.train     the design matrix of training data. If intercept = TRUE then it does not matter whether a column of ones is already included in x.train or not. The function adjusts it if necessary.
intercept   logical. If intercept = TRUE then an intercept is included in the model (this is recommended).
y.vali      an additional vector of response validation data. If given, the validation data are used for evaluating the loss function.
x.vali      an additional design matrix of validation data. If given, the validation data are used for evaluating the loss function. If intercept = TRUE then it does not matter whether a column of ones is already
In the function cv.lqa, the number of folds K is maintained in the argument n.fold. Let b_lambda^(-k) = (b_0^(-k), beta^(-k)')', k = 1, ..., K, denote the estimate during the cross-validation for a given value of the tuning parameter lambda. The index -k indicates that b_lambda^(-k) has been estimated based on the cross-validation index subset {i in {1, ..., n} : kappa(i) != k}, that is all of the training data except the subset that belongs to the k-th partition. Let mu.hat_i^(-k) = h(b_0^(-k) + x_i' beta^(-k)) denote the estimated response of the i-th observation. For the deviance we write

Dev^(-k) = 2 * sum_{i : kappa(i) = k} { l(y_i; y_i) - l(mu.hat_i^(-k); y_i) },

and denote H_lambda^(-k) as the hat matrix corresponding to b_lambda^(-k). This deviance and the hat matrix are important ingredients for most of the loss functions used to evaluate the model with tuning parameter lambda. Using a loss function L(y_i, mu.hat_i), the cross-validation function is

CV(lambda) = sum_{k=1}^{K} sum_{i : kappa(i) = k} L(y_i, mu.hat_i^(-k)).

This definition varies from the typical ones, such as (7.49) in Hastie et al. (2009, p. 242), but from a computational point of view it is more well arranged. This will (hopefully) become clear in the examples below. The loss functions already implemented in the lqa package are summarized in Table 4.

Criterion       Formula                                    Function name
AIC             AIC(lambda) = Dev + 2 tr(H_lambda)         aic.loss
BIC             BIC(lambda) = Dev + log(n) tr(H_lambda)    bic.loss
GCV             GCV(lambda) = n Dev / (n - tr H_lambda)^2  gcv.loss
squared loss    SL(lambda) = sum (y_i - mu.hat_i)^2        squared.loss
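The fold bookkeeping behind the cross-validation function can be sketched in a few lines of Python (an illustration of the formula only, not the package's code; `make_folds` uses a simple round-robin assignment, whereas the package draws its folds differently):

```python
def make_folds(n, k):
    """Assign each observation index 0..n-1 to one of k folds (round-robin)."""
    return [i % k for i in range(n)]

def cv_score(y, mu_hat, fold_of, loss, k):
    """CV(lambda) = sum over folds k of sum_{i in fold k} L(y_i, mu_hat_i^(-k)).
    mu_hat[i] is assumed to be the prediction for i from the fit that
    left out i's fold."""
    total = 0.0
    for k_idx in range(k):
        for i, f in enumerate(fold_of):
            if f == k_idx:
                total += loss(y[i], mu_hat[i])
    return total

squared = lambda y, m: (y - m) ** 2
score = cv_score([1, 0, 1, 0], [0.5] * 4, make_folds(4, 2), squared, k=2)
```

Each observation enters the sum exactly once, through the fold that held it out, which is why the definition needs no separate averaging step per fold.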
The main working function, lqa.update2, computes the LQA updates using a Cholesky decomposition of X'WX + A*_lambda. With n observations and p* = p + 1 coefficients (including p regressors and one intercept), this requires on the order of O(n p*^2 + p*^3) operations. The Cholesky decomposition is usually fast, depending on the relative size of n and p*, but it can be less numerically stable (Lawson and Hanson 1974).

Due to the arguments of lqa.formula and lqa.default, the syntax of a call to lqa should look like this:

R> obj <- lqa (formula, family, penalty, data, method, standardize, ...)

with input parameters

formula   a symbolic description of the model to be fit. A typical formula has the form response ~ terms, where response is the numeric response vector and terms is a series of terms which specifies a linear predictor for response. Per default, an intercept is included in the model. If it should be removed, then use formulae of the form response ~ 0 + terms or response ~ terms - 1.
family    a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. See the R function family for details of family functions.
penalty   a description of the penalty to be used in the fitting procedure. This must be an object of the class penalty.
data      an optional data frame
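The linear algebra behind such an update step can be illustrated with a small, self-contained Cholesky solver (a Python sketch of the generic technique, not the code of lqa.update2; the helper names `cholesky` and `solve_chol` are ours). Given the symmetric positive definite system matrix M = X'WX + A*_lambda and a right-hand side b, one factorizes M = L L' and solves by forward and backward substitution:

```python
import math

def cholesky(M):
    """Lower-triangular L with L L' = M (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

def solve_chol(L, b):
    """Solve L L' x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                       # L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # L' x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

M = [[4.0, 2.0], [2.0, 3.0]]                 # stands in for X'WX + A*_lambda
x = solve_chol(cholesky(M), [2.0, 3.0])
```

Adding the positive semidefinite penalty matrix to X'WX tends to make the system better conditioned than the unpenalized normal equations, which is one reason the Cholesky route is usually adequate here.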
This basic structure might be kept all the time. If you want to implement a new penalty object, then you should respect the following annotations. The object is initialized by calling the function of its name with lambda as argument. If your tuning parameter is a vector, then lambda must be a vector, too. You could provide some additional arguments if necessary, such as control parameters for numerical precision. The lqa control parameters, such as the return of a call to the function lqa.control, are not meant by this. They are incorporated, if necessary, in the particular penalty methods, such as the c1 argument (which corresponds to c in (7)) in the getpenmat function. If your penalty depends on the regressor matrix or its correlation matrix, as e.g. the correlation-based penalty, then it is more reasonable to use the corresponding argument, e.g. the x argument, just in the particular functions such as getpenmat. This assures relating to the right subset of regressors when indicated. Otherwise, x would be initialized as a global argument of your penalty and hence can cause trouble if you want to apply methods with explicit variable selection, such as componentwise boosting.

But now we come back to the basic structure. In a first step, you should call lambda.check. The internal function lambda.check just checks the lambda argument on existence (is.null) and element-wise non-negativity. In a second step, you should check the right dimensionality of the lambda
arguments. Furthermore, the clear specification of S4 classes is another drawback concerning our applications. When dealing with quadratic penalties

    P_λ(β) = λ β^T M β,    (6)

where M is a positive definite matrix, we would just need the class character penalty$penalty, to indicate the name of the penalty, and the function penalty$getpenmat. The latter returns the evaluated p x p dimensional penalty matrix M, based on the argument beta if necessary. In contrast, polytopes as penalties might require some more advanced functions, such as penalty$first.derivative or penalty$a.coefs (see Table 1 below), to evaluate the penalty for a given vector β of coefficients. If a user wants to implement a new quadratic penalty, then these further functions are actually not needed. However, when using S4 methods this would result in an error message, or alternatively we would need two classes, quad.penalty and penalty, instead of just one, where quad.penalty is a subclass of penalty. For these reasons we use the S3 methods concept and put up with the in some way less comfortable implementation of new penalty instances. The penalties already implemented as penalty objects are listed in Table 2.

In the following we describe what kinds of methods are required for specific penalty families. The five basic methods for penalty objects are summarized in Table 1. The methods penalty and lambda are mandatory. They are necessary to identify the penalty family and, respectively,
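For a quadratic penalty, knowing the penalty matrix M is enough to evaluate (6) directly. The tiny sketch below illustrates this for the ridge case (M equal to the identity); the variable names are assumptions, not the package API.

```r
## Evaluate a quadratic penalty P(beta) = lambda * beta' M beta
## for the ridge case M = I (illustrative sketch).
lambda <- 0.7
M <- diag(3)                 # ridge: M is the identity matrix
beta <- c(1, -2, 0.5)
pen.value <- lambda * drop(t(beta) %*% M %*% beta)
pen.value                    # 0.7 * (1 + 4 + 0.25) = 3.675
```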
before lqa.default is called. As a conclusion, for a user it is more comfortable to enter at the lowest possible level. Nevertheless, the big advantage of using S3 methods is that the user simply calls lqa() with either one of the possible representations as the argument. The internal dispatching method will find the class of the object and apply the right method. Note that just looking at the syntax of the function call for the class formula is misleading for applications. As the function lqa.formula is primarily meant to provide the data included in the formula environment, there are no input arguments necessary concerning the penalty or the fitting method. But those are to be specified for the higher level methods. Therefore, please use the examples in the next section as a guide for your applications of the lqa() function.

As you can see in Table 3, there are two main functions involved in the parameter estimation. The first is the default function lqa.default. This function can be interpreted

Level   Method          Tasks
I       lqa()           - dispatching
II      lqa.formula()   - extract the data from the formula environment
                        - call the default method
III     lqa.default()   - check for the exponential family and link function in the family argument
                        - check the penalty argument
                        - check for existence of the fitting method
                        - check for model consistency, e.g. a column of ones must be included in the x argument if an intercept is present in
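The dispatching mechanism described above can be demonstrated with a minimal S3 generic. The names below are illustrative (this is not the package source); the point is that the class of the first argument selects the method.

```r
## Minimal sketch of S3 dispatch: a generic plus one method per class.
lqa.sketch <- function(x, ...) UseMethod("lqa.sketch")

lqa.sketch.formula <- function(x, ...) "formula method called"
lqa.sketch.default <- function(x, ...) "default method called"

lqa.sketch(y ~ x1 + x2)   # a formula dispatches to the formula method
lqa.sketch(1:10)          # anything else falls through to the default
```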
in terms of slots. This leads to a huge amount of formality and precision. The structure of an S3 object can only be defined implicitly, using the function structure(). As a direct consequence, checks on data structures or arguments can be swapped out to the initialization or prototype function. This possibility is only of limited merit when dealing with penalties, due to the huge amount of heterogeneity among different penalties. Reasons for this are, e.g., a different number of tuning parameters, or that the computation of the penalty matrix can depend on the regressor matrix X, as for the correlation-based penalty (Tutz and Ulbricht, 2009), or just on the dimension p and the information whether or not there is an intercept included in the model (ridge penalty in linear models). Consequently, checks on correct specification must also be customized and hence included in the several penalty objects.

Method            Type             Input arguments    Output
penalty           character                           a character containing the penalty name
lambda            numeric vector                      a numeric vector containing the tuning parameters
getpenmat         function         beta = NULL, ...   p x p penalty matrix, evaluated at beta if necessary
first.derivative  function         beta = NULL, ...   J-dimensional vector containing the first derivatives p'_λ,1, ..., p'_λ,J
a.coefs           function         beta = NULL, ...   p x J matrix containing the coefficient vectors a_1, ..., a_J

Table 1: Methods operating on objects of class penalty. The dots operator '...' allows for some further arguments.
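A minimal penalty object in the spirit of Table 1 can be built with structure(), as the text describes. The sketch below models a ridge-type quadratic penalty with the mandatory penalty and lambda entries plus getpenmat; it is illustrative and simplified, not the package's own ridge() implementation.

```r
## Sketch of a minimal quadratic penalty object of class "penalty".
ridge.sketch <- function(lambda = NULL, ...) {
  if (is.null(lambda) || any(lambda < 0))
    stop("lambda must be a nonnegative tuning parameter\n")
  names(lambda) <- "lambda"
  getpenmat <- function(beta = NULL, ...) {
    if (is.null(beta))
      stop("beta must be the current coefficient vector\n")
    diag(lambda, length(beta))      # M = identity, scaled by lambda
  }
  structure(list(penalty = "ridge.sketch", lambda = lambda,
                 getpenmat = getpenmat),
            class = "penalty")
}

obj <- ridge.sketch(lambda = 0.7)
class(obj)                          # "penalty"
obj$getpenmat(beta = rep(0, 3))     # 3 x 3 diagonal matrix with 0.7 entries
```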
package, the fitting method should always return a list that contains at least the following arguments:

coefficients   the vector of the standardized estimated coefficients
tr.H           the trace of the hat matrix
Amat           the penalty matrix from the last iteration
converged      a logical variable. This should be TRUE if the algorithm indeed converged
stop.at        the number of iterations until convergence
m.stop         the number of iterations until the AIC reaches its minimum

The last argument, m.stop, is only necessary for fitting algorithms where the optimal stopping iteration is smaller than the number of iterations until convergence. This is especially related to boosting methods.

The R package mboost (Hothorn et al., 2009) provides a framework for the application of functional gradient descent algorithms (boosting) for the optimization of general loss functions. The package is especially designed to utilize componentwise least squares as base learners. An alternative to directly implementing ForwardBoost and GBlockBoost in the scope of the lqa package would have been to incorporate the infrastructure of mboost through a kind of interface. In this case, ForwardBoost and GBlockBoost would have had to be implemented as new mboost functions, such as the already existing ones gamboost() or glmboost(). See Hothorn et al. (2009) for details. Furthermore, in order to fulfill the requirements on the output arguments of fitting methods for lqa as described above, we would need to write
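The return-value contract listed above can be made concrete with a skeleton fitting method. The sketch below is illustrative only: the name is an assumption, and a plain ridge solve stands in for a real iterative estimation loop, so converged, stop.at and m.stop are trivially filled in.

```r
## Skeleton fitting method honoring the required return list.
my.fitter.sketch <- function(x, y, family = NULL, penalty = NULL,
                             intercept = TRUE, control = list(), ...) {
  Amat <- diag(0.7, ncol(x))                  # stand-in penalty matrix
  lhs <- t(x) %*% x + Amat
  coefficients <- drop(solve(lhs, t(x) %*% y))
  H <- x %*% solve(lhs, t(x))                 # hat matrix of the penalized fit
  list(coefficients = coefficients,
       tr.H = sum(diag(H)),
       Amat = Amat,
       converged = TRUE,                      # no iteration loop in this sketch
       stop.at = 1,
       m.stop = 1)
}

set.seed(2)
x <- matrix(rnorm(40), 20, 2)
y <- rnorm(20)
fit <- my.fitter.sketch(x, y)
names(fit)    # the six required elements
```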
for the LQA algorithm. A concrete penalty consists of the penalty family and chosen tuning parameters. It is reasonable to specify the penalty family and the tuning parameters separately. This is especially necessary for cross-validation procedures and plotting of coefficient build-ups. So the R source code for specifying, e.g., the ridge penalty with tuning parameter 0.7 should look like this:

R> penalty <- ridge(lambda = 0.7)

The entities R operates on are technically known as objects. Hence we require a useful R object to represent penalties. Many programming languages, including R and S, use concepts from object-oriented programming (OOP). As given in Leisch (2008), the two main such concepts as used in R are

- classes, to define how objects of a certain type look, and
- methods, to define special functions operating on objects of a certain class.

For dealing with several penalties, also user-defined ones, we introduce a class of R objects called penalty. There are two approaches in R for dealing with classes, that is, the older S3 approach and the S4 approach as introduced in Chambers (1998). Most classes in R are based on the classical S3 concept. The more advanced S4 concept requires some more code but delivers more flawless definitions. However, both concepts have pros and cons. Some more details and examples are, e.g., given in Venables and Ripley (2000). In S4 classes the structure of an object is clearly specified
the LQA algorithm. The 2 at the end of its name just has some historical reasons.

- ForwardBoost to compute the ForwardBoost algorithm.
- GBlockBoost to compute the GBlockBoost algorithm. GBlockBoost combined with the argument componentwise = TRUE computes componentwise boosting.

If you want to implement your own fitting method, then you should keep in mind the following aspects. The basic call of a fitting method with name method.name will be

method.name(x, y, family = NULL, penalty = NULL, intercept = TRUE, control, ...)

where the input arguments are:

x          the standardized design matrix. This will usually include a column of ones if an intercept should be included in the model.
y          the vector of observed response values.
family     a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function.
penalty    a description of the penalty to be used in the fitting procedure.
intercept  a logical object indicating whether the model should include an intercept (this is recommended) or not. The default value is TRUE.
control    a list of parameters for controlling the fitting process.
...        further arguments.

The fitting methods have access to the environment of lqa.default. So in principle you can pass additional arguments from all lower level lqa() function calls. To be in line with the other functions in the
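Because a fitting method is addressed by its name, it can be invoked generically once it follows the call convention above. The sketch below illustrates this with do.call(); the method name and its trivial body are assumptions, and the real package internals may differ.

```r
## Invoking a fitting method given by its name (illustrative sketch).
my.method.sketch <- function(x, y, family = NULL, penalty = NULL,
                             intercept = TRUE, control = list(), ...) {
  list(coefficients = rep(0, ncol(x)), converged = TRUE)
}

method <- "my.method.sketch"        # the method is selected by name
x <- matrix(1:6, 3, 2)
y <- c(1, 2, 3)
fit <- do.call(method, list(x = x, y = y))
fit$converged                       # TRUE
```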
        the model
                        - standardize the data
                        - call the fitting method
                        - transform the estimated coefficients back
                        - compute important statistics: deviance, AIC, BIC, residuals, estimated predictors η̂ and responses μ̂, etc.
IV      fitting method  - computation of estimates

Table 3: Implemented basic methods and their tasks in the lqa package.

as a principal or advisor of the estimation process. Its task is to check whether the input arguments are in order or not, and to standardize the data if required. Afterwards it calls the fitting method and transforms the returned estimated coefficients back to the original scales of the regressors. Finally, it computes some important statistics, such as the deviance, the information criteria AIC and BIC, and the estimated predictors η̂ and responses μ̂. At last, the default function assigns the object to be returned to the lqa class. Furthermore, it inherits the affiliation to the glm and lm classes. Thereby we can use generic functions of those classes, such as coef(), to extract the estimated coefficients from an lqa object.

The other function, or more precisely the other set of functions, mainly involved in parameter estimation are the fitting methods. They can be interpreted as the workhorses or agents of the package. Their task is to compute the estimates and some other statistics, such as the trace of the hat matrix (see below). Up to now, the following fitting methods are implemented:

- lqa.update2 to compute
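The standardize-then-back-transform step that lqa.default is described to perform can be sketched for an unpenalized Gaussian fit. All names here are illustrative; the point is only that coefficients estimated on scaled regressors can be mapped back to the original scales and agree with a direct fit.

```r
## Fit on standardized regressors, then back-transform to original scales.
set.seed(3)
x <- matrix(rnorm(60, mean = 5, sd = 2), 20, 3)
y <- rnorm(20)

x.sd  <- apply(x, 2, sd)
x.std <- scale(x, center = TRUE, scale = x.sd)       # standardized design
b.std <- drop(solve(t(x.std) %*% x.std, t(x.std) %*% (y - mean(y))))

b.orig <- b.std / x.sd                               # slopes on original scale
a.orig <- mean(y) - sum(b.orig * colMeans(x))        # intercept on original scale

## agrees with a direct least-squares fit on the raw data
coef(lm(y ~ x))
c(a.orig, b.orig)
```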
the recording is additive. Consequently, the required memory will be reduced and the speed of computation increases. This improving principle is used by the get.Amat() function.

3 Basic structure of the lqa package

In this section we describe the basic structure of the lqa package. As illustrated in Table 3, there is an elemental hierarchy of four methods or classes of methods. As for the implementation of the penalties and their related methods, we use the R concept of S3 classes. At the lowest level there is the generic function lqa(). Its task is to determine the class of its arguments and to use this information to select an appropriate method. Therefore, such functions are also called dispatching methods. The function calls of the methods on the next two levels are

## S3 method for class 'formula':
lqa(formula, data = list(), weights = rep(1, nobs), subset, na.action,
    start = NULL, etastart, mustart, offset, ...)

at level II, and

## Default S3 method:
lqa(x, y, family = gaussian(), penalty = NULL, method = "lqa.update2",
    weights = rep(1, nobs), start = NULL, etastart = NULL, mustart = NULL,
    offset = rep(0, nobs), control = lqa.control(), intercept = TRUE,
    standardize = TRUE, ...)

at level III. As you can see, the S3 method for the class formula requires fewer arguments than the default S3 method at the next level. This is due to the extraction and translation of information from the formula environment in the lqa.formula function
training data are completely used for model fitting. You must specify a penalty family. This can be done by giving its name as a character, e.g. penalty.family = "lasso", or as a function call, e.g. penalty.family = lasso. The tuning parameter candidates are given in the argument lambda.candidates. Usually one should a priori generate a sequence of equidistant points and then use this as exponent to Euler's number, such as

R> lambdaseq <- exp(seq(-4, 4, length = 11))
R> lambdaseq
 [1]  0.0183156  0.0407622  0.0907179  0.2018965  0.4493289  1.0000000
 [7]  2.2255409  4.9530324 11.0231763 24.5325302 54.5981500

Note that lambda.candidates must be a list in order to cope with different numbers of candidates, e.g. lambda.candidates = list(lambdaseq). For evaluation you must specify a loss function. The default value is aic.loss, i.e. the AIC will be used to find an optimal tuning parameter. Other already implemented loss functions are bic.loss, gcv.loss, squared.loss (quadratic loss function) and dev.loss (deviance as loss function).

In the following we explain the basic principles of the application of loss functions in the lqa package. Before we start, we introduce some notation to describe the cross-validation procedure in general. Let K denote the number of cross-validation folds, and let κ: {1, ..., n} -> {1, ..., K} be an indexing function that indicates the partition to which the i-th observation (i = 1, ..., n) is allocated. In t
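The indexing function κ just introduced can be generated in one line of base R. The sketch below is illustrative (it is not the cv.lqa internals): it allocates n observations to K folds of equal size at random.

```r
## Sketch of the fold-indexing function kappa: random balanced
## allocation of n observations to K cross-validation folds.
set.seed(4)
n <- 10; K <- 5
kappa <- sample(rep(1:K, length.out = n))   # fold label of each observation
table(kappa)                                # two observations per fold here

## observations held out when fold k = 1 serves as validation set:
which(kappa == 1)
```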
wider range of penalties, such as polytopes and quadratic penalties. If the penalty region is a polytope, then all penalty functions p_λ,j must be linear. Note that the sum of all J penalty functions determines the penalty term. The penalty level λ must not be identical for all J functions, but to keep the notation simple we omit a further subindex j. Furthermore, the penalty function type must not be identical for all j. The number J of penalty functions is not necessarily equal to p, the number of dimensions. For example, the fused lasso penalty (Tibshirani et al., 2005)

    P(β) = λ_1 Σ_{j=1}^{p} |β_j| + λ_2 Σ_{k=2}^{p} |β_k − β_{k−1}|

can be written as

    P(β) = Σ_{j=1}^{2p−1} p_λ,j(a_j^T β),    (3)

where p_λ,j(a_j^T β) = λ_1 |a_j^T β|, j = 1, ..., p, with a_j = (0, ..., 0, 1, 0, ..., 0)^T having a one at the j-th position, and p_λ,j(a_j^T β) = λ_2 |a_j^T β|, j = p+1, ..., 2p−1, where a_j = (0, ..., 0, −1, 1, 0, ..., 0)^T has a one at the (j−p+1)-th position and a minus one at the (j−p)-th position. The parameters λ_1 and λ_2 correspond to s_1 and s_2 in the constrained regression problem formulation, see Tibshirani et al. (2005).

Ulbricht (2010b) has shown that if the penalty term has structure (2), then the penalized regression problem (1) can be solved by a Newton-type algorithm based on local quadratic approximations of the penalty term. That is, the unknown coefficient vector b can be estimated iteratively with the estimation equation

    b^(l+1) = b^(l) + [F(b^(l)) + A_λ]^(−1) [s(b^(l)) − A_λ b^(l)],    (4)

where s(b) and F
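The p x (2p−1) matrix of coefficient vectors a_j described above can be built exactly as in the a.coefs code for the fused lasso shown earlier in this guide. The sketch below reproduces that construction for p = 4; note that the sign of the difference columns is irrelevant because the penalty takes absolute values.

```r
## Build the fused lasso a.coefs matrix: unit vectors e_j for the |beta_j|
## terms, difference vectors for the |beta_k - beta_{k-1}| terms.
p <- 4
h1 <- cbind(diag(p - 1), 0)
h2 <- cbind(0, diag(p - 1))
mat1 <- h1 - h2                        # row j encodes beta_j - beta_{j+1}
a.coefs.mat <- cbind(diag(p), t(mat1)) # p x (2p - 1)

dim(a.coefs.mat)        # 4 x 7
a.coefs.mat[, p + 1]    # (1, -1, 0, 0): encodes |beta_1 - beta_2|
```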