Title: | Boosting Regression Quantiles |
---|---|
Description: | Boosting Regression Quantiles is a component-wise boosting algorithm that embeds all boosting steps in the well-established framework of quantile regression. It is initialized with the corresponding quantile, uses a quantile-specific learning rate, and uses quantile regression as its base learner. The package implements this algorithm and supports cross-validation and stability selection. |
Authors: | Stefan Linner [aut, cre, cph] |
Maintainer: | Stefan Linner <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.0 |
Built: | 2024-11-01 03:23:53 UTC |
Source: | https://github.com/stefanlinner/boostrq |
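The description states that the algorithm is initialized with the corresponding quantile. A minimal base R sketch of why that is a natural starting value: among all constant fits, the empirical tau-quantile minimizes the summed check (pinball) loss. The helpers `check_loss` and `risk_of_constant` are hypothetical names, the data are made up, and `type = 1` is only one possible quantile definition; the source does not say which one the package uses.

```r
# Check (pinball) loss for residuals u at quantile level tau.
# Hypothetical helper, not part of the boostrq API.
check_loss <- function(u, tau) u * (tau - as.numeric(u < 0))

y   <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)  # toy responses
tau <- 0.25

# Summed check loss when every observation is predicted by the constant c0.
risk_of_constant <- function(c0) sum(check_loss(y - c0, tau))

# Evaluate the risk at every observed value and pick the best constant.
candidates    <- sort(unique(y))
risks         <- sapply(candidates, risk_of_constant)
best_constant <- candidates[which.min(risks)]

# Empirical tau-quantile (inverse of the empirical distribution function).
empirical_quantile <- unname(quantile(y, probs = tau, type = 1))

best_constant
empirical_quantile
```

Both values coincide for this toy vector, which is the property the initialization relies on.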
Updating number of iterations
## S3 method for class 'boostrq'
x[i, return = TRUE, ...]
x |
a boostrq object |
i |
desired number of boosting iterations |
return |
TRUE, if the result should be returned |
... |
additional arguments passed to callees |
a boostrq object with the updated number of iterations
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

# Update the number of boosting iterations to 500
boosted.rq[500]
Component-wise functional gradient boosting algorithm to fit a quantile regression model.
boostrq(
  formula,
  data,
  mstop = 100,
  nu = NULL,
  tau = 0.5,
  offset = NULL,
  weights = NULL,
  oobweights = NULL,
  risk = "inbag",
  digits = 10,
  exact.fit = FALSE
)
formula |
a symbolic description of the model to be fit. |
data |
a data frame (or data.table) containing the variables stated in the formula. |
mstop |
number of iterations, as integer |
nu |
learning rate, as numeric |
tau |
quantile parameter, as numeric |
offset |
a numeric vector used as offset. |
weights |
(optional) a numeric vector of weights to be used in the fitting process (default: all observations are equally weighted, with 1). |
oobweights |
an additional vector of out-of-bag weights, which is used for the out-of-bag risk. |
risk |
string indicating how the empirical risk should be computed for each boosting iteration. "inbag" leads to risks computed for the learning sample (i.e. observations with non-zero weights), "oobag" to risks based on the out-of-bag sample (i.e. observations with non-zero oobweights). |
digits |
number of digits at which the slope parameter must differ from zero for a component to be considered the best-fitting component, as integer. |
exact.fit |
logical, if set to TRUE the negative gradients of exact fits are set to 0. |
A (generalized) additive quantile regression model is fitted using the boosting regression quantiles algorithm, which is a functional component-wise boosting algorithm. The base-learners are specified via the formula object. The available base-learners are brq (linear quantile regression) and brqss (nonlinear quantile regression).
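The empirical quantile risk that the algorithm works on is built from the check (pinball) loss. The following base R sketch is not taken from the package: `check_loss` and `neg_gradient` are hypothetical helper names, and the package's internal scaling and weighting may differ. It also illustrates what the exact.fit argument describes. The check loss has a kink at a residual of zero, so a convention is needed there; setting the negative gradient of an exact fit to 0 is one such convention.

```r
# Check (pinball) loss for residuals u = y - f at quantile level tau.
# Hypothetical helper, not part of the boostrq API.
check_loss <- function(u, tau) {
  u * (tau - as.numeric(u < 0))
}

# Negative gradient of the check loss with respect to the fit f:
# tau for positive residuals, tau - 1 for negative residuals.
# At u == 0 the loss is not differentiable. By default this sketch uses tau
# there (an arbitrary convention); with exact_fit = TRUE those entries are
# set to 0, mirroring the description of the `exact.fit` argument.
neg_gradient <- function(u, tau, exact_fit = FALSE) {
  g <- ifelse(u >= 0, tau, tau - 1)
  if (exact_fit) {
    g[u == 0] <- 0
  }
  g
}

u <- c(-2, 0, 1.5)  # toy residuals, including one exact fit
check_loss(u, tau = 0.25)
neg_gradient(u, tau = 0.25)
neg_gradient(u, tau = 0.25, exact_fit = TRUE)
```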
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

# Inspect the fitted model
boosted.rq$mstop()
boosted.rq$selection.freqs()
boosted.rq$coef()
boosted.rq$risk()
Base-learner for linear quantile regression.
brq(formula, method = "fn")
formula |
a symbolic description of the base learner. |
method |
the algorithm used to fit the quantile regression. The default is "fn", referring to the Frisch-Newton interior point method. For more details see the documentation of quantreg::rq. |
brq returns a string, which is used to specify the formula in the fitting process.
brq(cyl * hp)
estimated coefficients of boosting regression quantiles
## S3 method for class 'boostrq'
coef(object, which = NULL, aggregate = "sum", ...)
object |
object of class boostrq |
which |
a subset of base-learners |
aggregate |
a character specifying how to aggregate coefficients of single base learners. The default, "sum", returns the coefficients for the final number of boosting iterations. "cumsum" returns a list with matrices (one per base-learner) with the cumulative coefficients for all iterations. "none" returns a list of matrices where the jth column of the respective matrix contains the coefficients of the base-learner of the jth boosting iteration. "sum_aggr" ... |
... |
additional arguments passed to callees |
coef extracts the regression coefficients of the fitted boostrq model.
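How the aggregate options relate to each other can be illustrated on a toy matrix in base R. This is a sketch under assumptions, not package output: the numbers are made up, and it assumes that the per-iteration entries are the coefficient contributions that get added up (for example, already scaled by the learning rate).

```r
# Toy per-iteration coefficient updates for one base-learner with two
# coefficients (rows) over four boosting iterations (columns).
# A zero column stands for an iteration in which the learner was not selected.
updates <- matrix(
  c(0.5, 0.1,
    0.0, 0.0,
    0.2, 0.05,
    0.1, 0.0),
  nrow = 2,
  dimnames = list(c("(Intercept)", "x"), paste0("iter", 1:4))
)

# Analogue of aggregate = "none": the per-iteration updates themselves.
updates

# Analogue of aggregate = "cumsum": running totals across iterations.
cumulative <- t(apply(updates, 1, cumsum))
cumulative

# Analogue of aggregate = "sum": totals after the final iteration.
final <- rowSums(updates)
final
```

Under these assumptions, the last column of the cumulative matrix equals the "sum" result.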
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

# Cumulative coefficients over all boosting iterations
coef(boosted.rq, aggregate = "cumsum")
Cross-validation for boostrq
## S3 method for class 'boostrq'
cvrisk(
  object,
  folds = mboost::cv(object$weights, type = "kfold"),
  grid = 0:mstop(object),
  papply = parallel::mclapply,
  mc.preschedule = FALSE,
  fun = NULL,
  ...
)
object |
a boostrq object |
folds |
a matrix indicating the weights for the k resampling iterations |
grid |
a vector of stopping parameters for which the empirical quantile risk is to be evaluated. |
papply |
(parallel) apply function, defaults to mclapply. To run sequentially (i.e. not in parallel), one can use lapply. |
mc.preschedule |
should tasks be prescheduled if they are parallelized using mclapply (default: FALSE)? For details see mclapply. |
fun |
if fun is NULL, the out-of-sample risk is returned. fun, as a function of object, may extract any other characteristic of the cross-validated models. These are returned as is. |
... |
additional arguments passed to callees |
Cross-validated Boosting regression quantiles
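The folds argument expects a weight matrix with one column per resampling iteration. The following base R sketch builds such a matrix by hand for k-fold cross-validation with unit weights. It only illustrates the structure; in practice the default, mboost::cv, generates the matrix, and its exact output (for example, how non-unit weights are handled) may differ from this sketch. The names `fold_id` and `folds` and the toy sizes are assumptions.

```r
set.seed(1)
n <- 10  # number of observations (toy value)
k <- 5   # number of folds

# Randomly assign each observation to one of k folds of equal size.
fold_id <- sample(rep(seq_len(k), length.out = n))

# Column b holds the training weights of resampling iteration b:
# 1 for observations used for fitting, 0 for the held-out fold.
folds <- sapply(seq_len(k), function(b) as.numeric(fold_id != b))

dim(folds)            # n rows, k columns
colSums(folds)        # training observations per resampling iteration
rowSums(folds == 0)   # each observation is held out exactly once
```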
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

# 5-fold cross-validation of the empirical quantile risk
set.seed(101)
cvk.out <- cvrisk(
  boosted.rq,
  grid = 0:mstop(boosted.rq),
  folds = mboost::cv(boosted.rq$weights, type = "kfold", B = 5)
)

cvk.out
plot(cvk.out)
mstop(cvk.out)

# Set the model to the cross-validated number of iterations
boosted.rq[mstop(cvk.out)]
fitted values of boosting regression quantiles
## S3 method for class 'boostrq'
fitted(object, ...)
object |
object of class boostrq |
... |
additional arguments passed to callees |
fitted returns the fitted values of the fitted boostrq model.
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

fitted(boosted.rq)
Current number of iterations of boostrq
## S3 method for class 'boostrq'
mstop(object, ...)
object |
a boostrq object |
... |
additional arguments passed to callees |
current number of boosting iterations
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

mstop(boosted.rq)
Model predictions for boosting regression quantiles
## S3 method for class 'boostrq'
predict(object, newdata = NULL, which = NULL, aggregate = "sum", ...)
object |
a boostrq object |
newdata |
a data.frame (or data.table) including all covariates contained in the baselearners |
which |
a subset of base-learners |
aggregate |
a character specifying how to aggregate coefficients of single base learners. The default, "sum", returns the coefficients for the final number of boosting iterations. "cumsum" returns a list with matrices (one per base-learner) with the cumulative coefficients for all iterations. "none" returns a list of matrices where the jth column of the respective matrix contains the coefficients of the base-learner of the jth boosting iteration. |
... |
additional arguments passed to callees |
predictions for the new data
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

# New observation containing all covariates used in the base-learners
predict.data <- data.frame(hp = 165, cyl = 6, am = 1, wt = 3.125)

predict(boosted.rq, newdata = predict.data)
printing boosting regression quantiles
## S3 method for class 'boostrq'
print(x, ...)
x |
object of class boostrq |
... |
additional arguments passed to callees |
print shows a dense representation of the boostrq model fit.
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

boosted.rq
Print result summaries for a boostrq object
## S3 method for class 'summary.boostrq'
print(x, ...)
x |
a summary.boostrq object |
... |
additional arguments passed to callees |
printing the result summaries for a boostrq object including the print-information, estimated coefficients, and selection frequencies
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

summary(boosted.rq)
residuals of boosting regression quantiles
## S3 method for class 'boostrq'
residuals(object, ...)
object |
object of class boostrq |
... |
additional arguments passed to callees |
residuals returns the residuals of the fitted boostrq model.
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

residuals(boosted.rq)
Empirical Quantile Risk of boostrq Object
## S3 method for class 'boostrq'
risk(object, ...)
object |
a boostrq object |
... |
additional arguments passed to callees |
numeric vector containing the respective empirical quantile risk of the different boosting iterations.
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

risk(boosted.rq)
Extract indices of selected base learners
## S3 method for class 'boostrq'
selected(object, ...)
object |
a boostrq object |
... |
additional arguments passed to callees |
an index vector indicating the selected base learner in each iteration
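Such an index vector can be turned into selection frequencies with base R. The sketch below uses a made-up vector for a model with two base-learners and ten iterations; the labels and the assumption that indices start at 1 for the first base-learner are illustrative and not confirmed by the source.

```r
# Hypothetical index vector, as selected() might return it for a model with
# two base-learners and ten boosting iterations.
sel <- c(1, 2, 2, 1, 2, 2, 2, 1, 2, 2)

# Share of iterations in which each base-learner was chosen.
freqs <- tabulate(sel, nbins = 2) / length(sel)
names(freqs) <- c("brq(cyl * hp)", "brq(am + wt)")
freqs
```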
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

selected(boosted.rq)
Stability Selection for boosting regression quantiles
## S3 method for class 'boostrq'
stabsel(
  x,
  cutoff,
  q,
  PFER,
  grid = 0:mstop(x),
  folds = stabs::subsample(x$weights, B = B),
  B = ifelse(sampling.type == "MB", 100, 50),
  assumption = "unimodal",
  sampling.type = "SS",
  papply = parallel::mclapply,
  verbose = TRUE,
  ...
)
x |
a fitted model of class "boostrq" |
cutoff |
cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used. |
q |
number of (unique) components (base-learners) that are selected in each subsample. |
PFER |
upper bound for the per-family error rate. This specifies the tolerated number of falsely selected base-learners. |
grid |
a numeric vector of the form 0:m. |
folds |
a weight matrix with number of rows equal to the number of observations. Usually one should not change the default here as subsampling with a fraction of 1/2 is needed for the error bounds to hold. |
B |
number of subsampling replicates. By default, we use 50 complementary pairs for the error bounds of Shah & Samworth (2013) and 100 for the error bound derived in Meinshausen & Buehlmann (2010). As we use B complementary pairs in the former case, this leads to 2B subsamples. |
assumption |
Defines the type of assumptions on the distributions of the selection probabilities and simultaneous selection probabilities. Only applicable for sampling.type = "SS". For sampling.type = "MB" we always use "none". |
sampling.type |
use the sampling scheme of Shah & Samworth (2013), i.e., with complementary pairs (sampling.type = "SS"), or the original sampling scheme of Meinshausen & Buehlmann (2010) (sampling.type = "MB"). |
papply |
(parallel) apply function, defaults to mclapply. To run sequentially (i.e. not in parallel), one can use lapply. |
verbose |
logical (default: TRUE) that determines whether warnings should be issued. |
... |
additional arguments passed to callees |
An object of class stabsel.
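The arguments cutoff, q and PFER are linked through an error bound. As a minimal sketch, the bound of Meinshausen & Buehlmann (2010) can be written down in base R; it corresponds to sampling.type = "MB" only. The bounds of Shah & Samworth (2013) used with sampling.type = "SS" are different and are not reproduced here. The function names `pfer_bound_mb` and `cutoff_mb` and the toy values are assumptions; the actual computation, including rounding and feasibility checks, is left to the stabs package.

```r
# Upper bound on the per-family error rate from Meinshausen & Buehlmann (2010):
# PFER <= q^2 / ((2 * cutoff - 1) * p), where p is the number of base-learners.
# Hypothetical helper, not part of the boostrq or stabs API.
pfer_bound_mb <- function(q, cutoff, p) {
  stopifnot(cutoff > 0.5, cutoff <= 1)
  q^2 / ((2 * cutoff - 1) * p)
}

# The same relation solved for the cutoff, given q and a tolerated PFER.
# A result above 1 means the requested PFER cannot be guaranteed by this bound.
cutoff_mb <- function(q, PFER, p) {
  (q^2 / (PFER * p) + 1) / 2
}

pfer_bound_mb(q = 10, cutoff = 0.75, p = 100)
cutoff_mb(q = 10, PFER = 2, p = 100)
```

With 100 base-learners, selecting 10 per subsample at a cutoff of 0.75 bounds the expected number of false selections by 2, and solving for the cutoff recovers 0.75.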
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl) + brq(hp) + brq(am) + brq(wt) + brq(drat),
  data = mtcars, mstop = 600, nu = 0.1, tau = 0.5
)

# Inspect the stability selection parameters for p = 5 base-learners
stabsel_parameters(
  q = 3, PFER = 1, p = 5,
  sampling.type = "SS", assumption = "unimodal"
)

set.seed(100)
brq.stabs <- stabsel(
  x = boosted.rq, q = 3, PFER = 1,
  sampling.type = "SS", assumption = "unimodal"
)

brq.stabs
Result summaries for a boostrq object
## S3 method for class 'boostrq'
summary(object, ...)
object |
a boostrq object |
... |
additional arguments passed to callees |
result summaries for a boostrq object including the print-information, estimated coefficients, and selection frequencies
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

summary(boosted.rq)
Update and Re-fit a boostrq model
## S3 method for class 'boostrq'
update(object, weights, oobweights, risk, ...)
object |
a boostrq object |
weights |
(optional) a numeric vector of weights to be used in the fitting process (default: all observations are equally weighted, with 1). |
oobweights |
an additional vector of out-of-bag weights, which is used for the out-of-bag risk. |
risk |
string indicating how the empirical risk should be computed for each boosting iteration. "inbag" leads to risks computed for the learning sample (i.e. observations with non-zero weights), "oobag" to risks based on the out-of-bag sample (i.e. observations with non-zero oobweights). |
... |
additional arguments passed to callees |
a re-fitted boostrq model
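The difference between the in-bag and the out-of-bag risk can be sketched in base R with toy numbers. The response and fitted values are made up, `check_loss` is a hypothetical helper, and whether the package sums or averages the weighted losses is an assumption of this sketch.

```r
# Check (pinball) loss; hypothetical helper, not part of the boostrq API.
check_loss <- function(u, tau) u * (tau - as.numeric(u < 0))

y      <- c(21.0, 22.8, 18.7, 14.3, 24.4)  # toy responses
fitted <- c(20.0, 23.0, 19.0, 15.0, 22.0)  # toy fitted values
tau    <- 0.5

weights    <- c(1, 1, 1, 0, 0)  # learning sample
oobweights <- c(0, 0, 0, 1, 1)  # held-out observations

loss <- check_loss(y - fitted, tau)

# risk = "inbag": losses of observations with non-zero weights.
inbag_risk <- sum(weights * loss)

# risk = "oobag": losses of observations with non-zero oobweights.
oobag_risk <- sum(oobweights * loss)

inbag_risk
oobag_risk
```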
boosted.rq <- boostrq(
  formula = mpg ~ brq(cyl * hp) + brq(am + wt),
  data = mtcars, mstop = 200, nu = 0.1, tau = 0.5
)

# Re-fit on the first 30 observations and compute the risk on the last two
update(
  boosted.rq,
  weights = c(rep(1, 30), 0, 0),
  oobweights = c(rep(0, 30), 1, 1),
  risk = "oobag"
)