Abstract:
Conformal predictors, introduced by Vovk et al. (2005), serve to build prediction intervals by exploiting a notion of conformity of the new data point with previously observed data. In the present paper, we propose a novel method for constructing prediction intervals for the response variable in multivariate linear models. The main emphasis is on sparse linear models, where only a few of the covariates have a significant influence on the response variable, even when their total number is very large. Our approach combines the principle of conformal prediction with the $\ell_1$ penalized least squares estimator (LASSO). The resulting confidence set depends on a parameter $\epsilon>0$ and has a coverage probability larger than or equal to $1-\epsilon$. The numerical experiments reported in the paper show that the length of the confidence set is small. Furthermore, as a by-product of the proposed approach, we provide a data-driven procedure for choosing the LASSO penalty. The selection power of the method is illustrated on simulated data.
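As an illustration of the coverage idea (not the paper's exact procedure, which builds the conformity scores into the LASSO fit itself), the following sketch uses the simpler split-conformal variant: fit a LASSO on one half of the data, calibrate absolute residuals on the other half, and take the appropriate empirical quantile as the interval half-width. The coordinate-descent LASSO and all numerical choices (penalty level, data generation) are illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent LASSO: min (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]            # partial residual
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

rng = np.random.default_rng(0)
n, p, eps = 400, 20, 0.1
beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]       # sparse truth
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)

# Split: fit on one half, calibrate conformity scores on the other half.
X_fit, y_fit, X_cal, y_cal = X[:200], y[:200], X[200:], y[200:]
b_hat = lasso_cd(X_fit, y_fit, lam=0.1)
scores = np.abs(y_cal - X_cal @ b_hat)                # conformity scores
k = int(np.ceil((len(scores) + 1) * (1 - eps)))       # conformal quantile rank
q = np.sort(scores)[k - 1]

x_new = rng.standard_normal(p)
pred = x_new @ b_hat
interval = (pred - q, pred + q)                       # coverage >= 1 - eps
```

The marginal coverage guarantee $\geq 1-\epsilon$ holds by exchangeability of the calibration scores and the new point's score, regardless of how well the LASSO fits.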

Abstract:
We consider the linear regression problem. We propose the S-Lasso procedure to estimate the unknown regression parameters. This estimator enjoys sparsity of the representation while taking into account correlations between successive covariates (or predictors). The study covers the case where $p\gg n$, i.e., the number of covariates is much larger than the number of observations. From a theoretical point of view, for fixed $p$, we establish asymptotic normality and consistency in variable selection for our procedure. When $p\geq n$, we provide variable selection consistency results and show that the S-Lasso achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero components of the oracle vector. It appears that the S-Lasso has good variable selection properties compared to its competitors. Furthermore, we provide an estimator of the effective degrees of freedom of the S-Lasso estimator. A simulation study shows that the S-Lasso performs better than the Lasso as far as variable selection is concerned, especially when high correlations between successive covariates exist. The procedure also appears to be a serious competitor to the Elastic-Net (Zou and Hastie, 2005).

Abstract:
We consider the linear regression problem, where the number $p$ of covariates is possibly larger than the number $n$ of observations $(x_{i},y_{i})_{1\leq i \leq n}$, under sparsity assumptions. On the one hand, several methods have been successfully proposed to perform this task, for example the LASSO or the Dantzig Selector. On the other hand, consider new values $(x_{i})_{n+1\leq i \leq m}$. If one wants to estimate the corresponding $y_{i}$'s, one should think of a specific estimator devoted to this task, referred to by Vapnik as a "transductive" estimator. This estimator may differ from an estimator designed for the more general task of estimating on the whole domain. In this work, we propose a generalized version of both the LASSO and the Dantzig Selector, based on geometrical remarks about the LASSO made in previous works. The "usual" LASSO and Dantzig Selector, as well as new estimators interpreted as transductive versions of the LASSO, appear as special cases. These estimators are interesting at least from a theoretical point of view: we can give theoretical guarantees for them under hypotheses that are relaxed versions of the hypotheses required in the papers about the "usual" LASSO. These estimators can also be efficiently computed, with results comparable to those of the LASSO.

Abstract:
We focus on the high-dimensional linear regression $Y\sim\mathcal{N}(X\beta^{*},\sigma^{2}I_{n})$, where $\beta^{*}\in\mathds{R}^{p}$ is the parameter of interest. In this setting, several estimators such as the LASSO and the Dantzig Selector are known to satisfy interesting properties whenever the vector $\beta^{*}$ is sparse. Interestingly, both the LASSO and the Dantzig Selector can be seen as orthogonal projections of 0 onto $\mathcal{DC}(s)=\{\beta\in\mathds{R}^{p}:\|X'(Y-X\beta)\|_{\infty}\leq s\}$ - using an $\ell_{1}$ distance for the Dantzig Selector and an $\ell_{2}$ distance for the LASSO. For a well-chosen $s>0$, this set is actually a confidence region for $\beta^{*}$. In this paper, we investigate the properties of estimators defined as projections onto $\mathcal{DC}(s)$ using general distances. We prove that the obtained estimators satisfy oracle properties close to those of the LASSO and the Dantzig Selector. On top of that, it turns out that these estimators can be tuned to exploit a different sparsity and/or slightly different estimation objectives.
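The $\ell_{1}$ projection of 0 onto $\mathcal{DC}(s)$, i.e. the Dantzig Selector $\min\|\beta\|_{1}$ subject to $\|X'(Y-X\beta)\|_{\infty}\leq s$, is a linear program and can be solved directly. A minimal sketch with `scipy.optimize.linprog`, using the standard reformulation with auxiliary variables $t\geq|\beta|$ (the choice of radius $s$ below is a hypothetical illustration, not the paper's calibration):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, p = 100, 15
beta = np.zeros(p); beta[:2] = [3.0, -2.0]
X = rng.standard_normal((n, p))
y = X @ beta + 0.3 * rng.standard_normal(n)

G, c0 = X.T @ X, X.T @ y
s = 2.0 * np.sqrt(n * np.log(p))          # illustrative choice of the radius s

# Variables z = [beta, t]; minimise sum(t) subject to -t <= beta <= t
# and -s <= X'(y - X beta) <= s, rewritten as linear inequalities in z.
I, Z = np.eye(p), np.zeros((p, p))
A_ub = np.block([[I, -I], [-I, -I], [-G, Z], [G, Z]])
b_ub = np.concatenate([np.zeros(2 * p), s - c0, s + c0])
res = linprog(c=np.concatenate([np.zeros(p), np.ones(p)]),
              A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * p + [(0, None)] * p)
beta_ds = res.x[:p]                        # l1 projection of 0 onto DC(s)
```

By construction the solution lies in $\mathcal{DC}(s)$; replacing the $\ell_1$ objective by $\|\beta\|_2^2$ would give the LASSO-type $\ell_2$ projection instead.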

Abstract:
Transductive methods are useful in prediction problems when the training dataset is composed of a large number of unlabeled observations and a smaller number of labeled observations. In this paper, we propose an approach for developing transductive prediction procedures that are able to take advantage of sparsity in high-dimensional linear regression. More precisely, we define transductive versions of the LASSO and the Dantzig Selector. These procedures combine labeled and unlabeled observations of the training dataset to produce a prediction for the unlabeled observations. We propose an experimental study of the transductive estimators, which shows that they improve on the LASSO and the Dantzig Selector in many situations, particularly in high-dimensional problems when the predictors are correlated. We then provide non-asymptotic theoretical guarantees for these estimation methods. Interestingly, our theoretical results show that the Transductive LASSO and Dantzig Selector satisfy sparsity inequalities under weaker assumptions than those required for the "original" LASSO.

Abstract:
Confident prediction is highly relevant in machine learning; for example, in applications such as medical diagnosis, a wrong prediction can be fatal. For classification, there already exist procedures that allow one to abstain from classifying data when the confidence in their prediction is weak. This approach is known as classification with reject option. In the present paper, we provide a new methodology for this approach. Predicting a new instance via a confidence set, we ensure exact control of the probability of classification. Moreover, we show that this methodology is easily implementable and entails attractive theoretical and numerical properties.
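One simple way to realize such control (a sketch in the spirit of plug-in rules with reject option, not necessarily the paper's exact construction) is to abstain when the estimated posterior probability is close to $1/2$, and to calibrate the abstention threshold as an empirical quantile of the margin so that a prescribed fraction of points gets classified. All names and numerical choices below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.standard_normal(n)
eta = 1.0 / (1.0 + np.exp(-2.0 * x))      # posterior P(Y=1 | X=x)
y = (rng.random(n) < eta).astype(int)

# Plug-in score: here we use eta itself as the estimated posterior
# (a stand-in for any consistent estimator of P(Y=1 | X)).
margin = np.abs(eta - 0.5)

# Calibrate the threshold so that a 1-r fraction of points is classified;
# the remaining points are rejected (left unclassified).
r = 0.3                                   # target rejection rate
tau = np.quantile(margin, r)
classify = margin > tau
y_hat = (eta > 0.5).astype(int)

frac_classified = classify.mean()         # ~ 1 - r by construction
err_on_classified = (y_hat[classify] != y[classify]).mean()
err_overall = (y_hat != y).mean()
```

Because the threshold is a quantile of the margins, the fraction of classified points matches the target exactly up to sampling granularity; the points that are kept are those where the classifier is most confident.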

Abstract:
We study how correlations in the design matrix influence Lasso prediction. First, we argue that the higher the correlations are, the smaller the optimal tuning parameter is. This implies in particular that the standard tuning parameters, which do not depend on the design matrix, are suboptimal. Furthermore, we argue that Lasso prediction works well for any degree of correlation if suitable tuning parameters are chosen. We study these two subjects theoretically as well as with simulations.

Abstract:
Although the Lasso has been extensively studied, the relationship between its prediction performance and the correlations of the covariates is not fully understood. In this paper, we give new insights into this relationship in the context of multiple linear regression. We show, in particular, that the incorporation of a simple correlation measure into the tuning parameter leads to a nearly optimal prediction performance of the Lasso even for highly correlated covariates. However, we also reveal that for moderately correlated covariates, the prediction performance of the Lasso can be mediocre irrespective of the choice of the tuning parameter. To illustrate our approach with an important application, we deduce nearly optimal rates for the least-squares estimator with total variation penalty.
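The link between the Lasso and the total variation penalty rests on a change of variables: writing $\theta = Lc$ with $L$ the lower-triangular cumulative-sum matrix gives $\theta_{i+1}-\theta_i = c_{i+1}$, so the total variation of $\theta$ is an $\ell_1$ norm in $c$ (up to the unpenalized first coordinate), and TV-penalized least squares becomes a Lasso with a highly correlated design $L$. A minimal numerical check of this identity:

```python
import numpy as np

n = 50
L = np.tril(np.ones((n, n)))          # cumulative-sum design: theta = L @ c

rng = np.random.default_rng(3)
c = rng.standard_normal(n)
theta = L @ c

tv = np.abs(np.diff(theta)).sum()     # total variation of theta
l1 = np.abs(c[1:]).sum()              # l1 norm of all but the first coordinate
# tv == l1 up to floating point, so TV least squares is a Lasso in c
```

The columns of $L$ are strongly correlated, which is why results on Lasso prediction under correlated designs transfer to total variation rates.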

Abstract:
We consider a linear regression problem in a high-dimensional setting where the number of covariates $p$ can be much larger than the sample size $n$. In such a situation, one often assumes sparsity of the regression vector, \textit{i.e.}, the regression vector contains many zero components. We propose a Lasso-type estimator $\hat{\beta}^{Quad}$ (where '$Quad$' stands for quadratic) which is based on two penalty terms. The first one is the $\ell_1$ norm of the regression coefficients, used to exploit the sparsity of the regression vector as done by the Lasso estimator, whereas the second is a quadratic penalty term introduced to capture some additional information on the setting of the problem. We detail two special cases: the Elastic-Net $\hat{\beta}^{EN}$, which deals with sparse problems where correlations between variables may exist; and the Smooth-Lasso $\hat{\beta}^{SL}$, which responds to sparse problems where successive regression coefficients are known to vary slowly (in some situations, this can also be interpreted in terms of correlations between successive variables). From a theoretical point of view, we establish variable selection consistency results and show that $\hat{\beta}^{Quad}$ achieves a Sparsity Inequality, \textit{i.e.}, a bound in terms of the number of non-zero components of the 'true' regression vector. These results are provided under a weaker assumption on the Gram matrix than the one used by the Lasso. In some situations this guarantees a significant improvement over the Lasso. Furthermore, a simulation study is conducted and shows that the S-Lasso $\hat{\beta}^{SL}$ performs better than known methods such as the Lasso, the Elastic-Net $\hat{\beta}^{EN}$, and the Fused-Lasso with respect to estimation accuracy. This is especially the case when the regression vector is 'smooth', \textit{i.e.}, when the variations between successive coefficients of the unknown parameter of the regression are small.
The study also reveals that the theoretical calibration of the tuning parameters and the one based on 10-fold cross-validation yield two S-Lasso solutions with close performance.
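Estimators of this $\ell_1$-plus-quadratic form can be computed by reduction to a plain Lasso on augmented data, as in the Elastic-Net trick: the quadratic penalty $\mu\,\beta'J\beta$ with $J=D'D$ corresponds to appending $\sqrt{\mu}\,D$ as extra rows of the design and zeros to the response. The sketch below does this for the Smooth-Lasso, where $D$ is the first-difference matrix; the coordinate-descent solver and all numerical choices are illustrative assumptions:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min ||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ (y - X @ b + X[:, j] * b[j])
            b[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / col_sq[j]
    return b

def s_lasso(X, y, lam, mu):
    """Smooth-Lasso via augmentation: the fusion penalty
    mu * sum_j (b_{j+1} - b_j)^2 becomes extra 'rows' sqrt(mu) * D."""
    n, p = X.shape
    D = np.diff(np.eye(p), axis=0)              # first-difference matrix
    X_aug = np.vstack([X, np.sqrt(mu) * D])
    y_aug = np.concatenate([y, np.zeros(p - 1)])
    return lasso_cd(X_aug, y_aug, lam)

rng = np.random.default_rng(4)
n, p = 100, 30
beta = np.concatenate([np.full(10, 2.0), np.zeros(p - 10)])  # smooth & sparse
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)
b_sl = s_lasso(X, y, lam=10.0, mu=5.0)
```

With $D$ replaced by the identity, the same augmentation yields the Elastic-Net $\hat{\beta}^{EN}$.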

Abstract:
Popular sparse estimation methods based on $\ell_1$-relaxation, such as the Lasso and the Dantzig selector, require the knowledge of the variance of the noise in order to properly tune the regularization parameter. This constitutes a major obstacle in applying these methods in several frameworks---such as time series, random fields, inverse problems---for which the noise is rarely homoscedastic and its level is hard to know in advance. In this paper, we propose a new approach to the joint estimation of the conditional mean and the conditional variance in a high-dimensional (auto-) regression setting. An attractive feature of the proposed estimator is that it is efficiently computable even for very large-scale problems by solving a second-order cone program (SOCP). We present theoretical analysis and numerical results assessing the performance of the proposed procedure.
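To illustrate why joint mean/noise-level estimation removes the need to know $\sigma$ in advance, here is a sketch of a classical related approach, the scaled Lasso of Sun and Zhang (2012) (not the SOCP estimator of the paper): alternate between updating $\sigma \leftarrow \|y - X\hat\beta\|_2/\sqrt{n}$ and solving a Lasso whose penalty $\lambda_0\sigma$ is calibrated by the current noise estimate. The solver and numerical choices are illustrative assumptions:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for min (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ (y - X @ b + X[:, j] * b[j]) / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def scaled_lasso(X, y, lam0, n_outer=10):
    """Alternate sigma <- ||y - Xb|| / sqrt(n) and b <- Lasso(lam0 * sigma)."""
    n, _ = X.shape
    sigma = np.std(y)                        # crude initial noise level
    for _ in range(n_outer):
        b = lasso_cd(X, y, lam0 * sigma)
        sigma = np.linalg.norm(y - X @ b) / np.sqrt(n)
    return b, sigma

rng = np.random.default_rng(5)
n, p, true_sigma = 200, 50, 1.0
beta = np.zeros(p); beta[:4] = [3.0, -2.0, 1.5, 1.0]
X = rng.standard_normal((n, p))
y = X @ beta + true_sigma * rng.standard_normal(n)

lam0 = np.sqrt(2 * np.log(p) / n)            # noise-free penalty level
b_hat, sigma_hat = scaled_lasso(X, y, lam0)
```

The penalty level $\lambda_0$ is chosen without reference to $\sigma$; the alternation supplies the scale, which is the conceptual step the SOCP formulation of the paper carries out in one joint convex program.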