Testmathj

- Choosing \(\displaystyle \lambda\) via cross-validation tends to favor less sparse solutions, and thus a smaller \(\displaystyle \lambda\) than the optimal choice for feature selection. See "Machine Learning: A Probabilistic Perspective", Murphy 2012.
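A minimal sketch of choosing \(\displaystyle \lambda\) by K-fold cross-validation. It assumes the objective \(\displaystyle \tfrac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1\) without an intercept and a plain NumPy coordinate-descent solver; `lasso_cd` and `cv_lambda` are hypothetical helper names, not from any library. The selected \(\displaystyle \lambda\) minimizes the average held-out squared error. Comparing the number of nonzero coefficients at that \(\displaystyle \lambda\) with the true support is one way to observe the tendency noted above.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for (1/(2n)) ||y - X b||^2 + lam * ||b||_1 (no intercept).
    Assumes X has no all-zero columns."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) x_j^T x_j for each column
    r = y.astype(float)                # residual y - X b (b starts at 0); astype copies
    for _ in range(n_sweeps):
        for j in range(p):
            # correlation of column j with the partial residual that excludes b_j
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * (new_bj - b[j])
            b[j] = new_bj
    return b


def cv_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the lowest K-fold CV mean squared error, plus the CV curve."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    cv_err = np.zeros(len(lambdas))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        for i, lam in enumerate(lambdas):
            b = lasso_cd(X[train_idx], y[train_idx], lam)
            cv_err[i] += np.mean((y[test_idx] - X[test_idx] @ b) ** 2) / k
    return lambdas[np.argmin(cv_err)], cv_err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    beta_true = np.zeros(20)
    beta_true[:3] = [3.0, -2.0, 1.5]                # only 3 truly active features
    y = X @ beta_true + rng.standard_normal(100)
    lam_max = np.max(np.abs(X.T @ y)) / len(y)      # smallest lambda at which b = 0
    lambdas = lam_max * np.logspace(0, -2, 10)
    lam_cv, _ = cv_lambda(X, y, lambdas)
    print("CV lambda:", lam_cv, "nonzeros:", np.count_nonzero(lasso_cd(X, y, lam_cv)))
```

The grid from \(\displaystyle \lambda_{\max}\) down to \(\displaystyle 0.01\,\lambda_{\max}\) is an arbitrary illustrative choice.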
Classical: Least angle regression (LARS), Efron et al. 2004.
Alternating Direction Method of Multipliers (ADMM). Boyd et al. 2011. “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends in Machine Learning, Vol. 3, No. 1, 2010, pp. 1–122.
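A minimal sketch of scaled-form ADMM for the lasso, written as \(\displaystyle \min \tfrac12\|Ax - b\|_2^2 + \lambda\|z\|_1\) subject to \(\displaystyle x - z = 0\). The x-update is a ridge-type linear solve, the z-update is soft-thresholding, and u is the scaled dual variable. The function name `lasso_admm`, the default penalty parameter `rho=1.0`, and the purely absolute stopping rule are assumptions made for illustration, not a reference implementation.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def lasso_admm(A, b, lam, rho=1.0, max_iter=1000, tol=1e-8):
    """Scaled-form ADMM for minimize (1/2)||A x - b||^2 + lam * ||z||_1 subject to x - z = 0."""
    n, p = A.shape
    x = np.zeros(p)
    z = np.zeros(p)
    u = np.zeros(p)  # scaled dual variable
    # (A^T A + rho I) never changes, so factor it once: A^T A + rho I = L L^T.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(p))
    Atb = A.T @ b
    for _ in range(max_iter):
        # x-update: solve (A^T A + rho I) x = A^T b + rho (z - u) with two triangular systems
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z_old = z
        # z-update: proximal step of the l1 penalty
        z = soft_threshold(x + u, lam / rho)
        # dual update
        u = u + x - z
        # stop when primal and dual residuals are both small (absolute tolerance only)
        if np.linalg.norm(x - z) < tol and rho * np.linalg.norm(z - z_old) < tol:
            break
    return z
```

Caching the Cholesky factor means each iteration costs only two triangular-sized solves; `rho` affects convergence speed but not the solution.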
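Is the LASSO still convex when some variables in the design matrix are correlated? Yes: the objective, squared-error loss plus an \(\displaystyle \ell_1\) penalty, is convex for any design matrix. Correlated columns (or \(\displaystyle p > n\)) can only make it fail to be strictly convex, in which case the minimizer need not be unique.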
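A small numerical illustration, assuming the \(\displaystyle \tfrac{1}{2n}\) scaling of the squared error (a convention chosen here). With two identical, perfectly correlated columns, the objective still passes a midpoint convexity check, yet it is constant along the segment \(\displaystyle (t,\, c - t)\), \(\displaystyle 0 \le t \le c\): every such point gives the same fit \(\displaystyle c\,x\) and the same \(\displaystyle \ell_1\) norm \(\displaystyle c\). So if a minimizer has both coefficients of the same sign, the whole segment consists of minimizers. `lasso_objective` is a hypothetical helper and the data are synthetic.

```python
import numpy as np


def lasso_objective(X, y, beta, lam):
    """(1/(2n)) ||y - X beta||^2 + lam * ||beta||_1."""
    resid = y - X @ beta
    return resid @ resid / (2 * len(y)) + lam * np.abs(beta).sum()


rng = np.random.default_rng(0)
x = rng.standard_normal(30)
X = np.column_stack([x, x])          # two identical (perfectly correlated) columns
y = 2.0 * x + 0.1 * rng.standard_normal(30)
lam = 0.1

# Convexity: the objective at a midpoint never exceeds the average of the endpoint values.
b1, b2 = rng.standard_normal(2), rng.standard_normal(2)
mid = lasso_objective(X, y, (b1 + b2) / 2, lam)
avg = (lasso_objective(X, y, b1, lam) + lasso_objective(X, y, b2, lam)) / 2

# Non-strict convexity: every split (t, c - t) with 0 <= t <= c has the same fit c * x
# and the same l1 norm c, hence the same objective value.
c = 1.5
flat = [lasso_objective(X, y, np.array([t, c - t]), lam) for t in np.linspace(0.0, c, 5)]
print(mid <= avg, max(flat) - min(flat))
```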
Tibshirani. Regression shrinkage and selection via the lasso (free). JRSS B 1996.
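For reference, the lasso in the constrained form used in Tibshirani's paper, and the penalized form in which \(\displaystyle \lambda\) usually appears. The intercept is written \(\displaystyle \beta_0\) here as a notational choice. For each \(\displaystyle t \ge 0\) there is a \(\displaystyle \lambda \ge 0\) giving the same solution, but the correspondence depends on the data; the factor \(\displaystyle \tfrac12\) (or the \(\displaystyle \tfrac1n\) used by some software) only rescales \(\displaystyle \lambda\).

```latex
% Constrained form
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2
\quad\text{subject to}\quad \sum_{j=1}^{p}|\beta_j| \le t

% Penalized (Lagrangian) form
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\ \frac{1}{2}\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2
+ \lambda \sum_{j=1}^{p}|\beta_j|
```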
Convex Optimization in R by Koenker & Mizera 2014.
Pathwise coordinate optimization by Friedman et al. 2007.
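A minimal sketch of the pathwise idea: solve the lasso on a decreasing grid of \(\displaystyle \lambda\) values starting at \(\displaystyle \lambda_{\max} = \max_j |x_j^\top y|/n\), where the solution is all zeros, and warm-start each fit from the previous one. It assumes the objective \(\displaystyle \tfrac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1\), columns standardized so that \(\displaystyle \tfrac1n x_j^\top x_j = 1\), and a centered response (no intercept). The grid of 50 log-spaced values down to \(\displaystyle 0.001\,\lambda_{\max}\) and the name `lasso_path` are illustrative assumptions; this omits the active-set and screening refinements that production solvers add.

```python
import numpy as np


def soft_threshold(z, gamma):
    """sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_path(X, y, n_lambdas=50, eps=1e-3, max_sweeps=1000, tol=1e-8):
    """Coordinate descent over a decreasing lambda grid with warm starts.
    Assumes (1/n) x_j^T x_j = 1 for every column and a centered y."""
    n, p = X.shape
    lam_max = np.max(np.abs(X.T @ y)) / n  # smallest lambda with an all-zero solution
    lambdas = lam_max * np.logspace(0, np.log10(eps), n_lambdas)
    b = np.zeros(p)
    r = y.astype(float)                    # residual y - X b for b = 0
    path = np.zeros((n_lambdas, p))
    for k, lam in enumerate(lambdas):
        # b and r carry over from the previous lambda (warm start)
        for _ in range(max_sweeps):
            max_change = 0.0
            for j in range(p):
                # with unit-scaled columns the coordinate update is a plain soft-threshold
                new_bj = soft_threshold(X[:, j] @ r / n + b[j], lam)
                delta = new_bj - b[j]
                if delta != 0.0:
                    r -= X[:, j] * delta
                    b[j] = new_bj
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        path[k] = b
    return lambdas, path
```

For example, after standardizing with `X = (X - X.mean(axis=0)) / X.std(axis=0)` (NumPy's default `ddof=0` gives exactly \(\displaystyle \tfrac1n x_j^\top x_j = 1\)) and centering `y`, `lambdas, path = lasso_path(X, y)` returns the grid and one row of coefficients per \(\displaystyle \lambda\). Warm starts pay off because neighboring solutions on the path are close.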
Statistical Learning with Sparsity: The Lasso and Generalizations. T. Hastie, R. Tibshirani, and M. Wainwright, 2015 (book).
The Elements of Statistical Learning (book)
https://youtu.be/A5I1G1MfUmA StatsLearning Lect8h 110913
Fu's (1998) shooting algorithm for Lasso (mentioned in the history of coordinate descent) and Zhang & Lu's (2007) modified shooting algorithm for adaptive Lasso.
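The shooting algorithm is cyclic coordinate descent in which each coefficient is updated by soft-thresholding. The adaptive Lasso replaces the common threshold \(\displaystyle \lambda\) with \(\displaystyle \lambda w_j\), using data-driven weights \(\displaystyle w_j = 1/|\tilde\beta_j|^\gamma\). This is a minimal sketch under assumptions: it uses linear regression with an OLS pilot estimate (so it needs \(\displaystyle n > p\) and full column rank), whereas Zhang & Lu (2007) work with a different model, so this is not their modified algorithm. `shooting` and `adaptive_lasso` are hypothetical names.

```python
import numpy as np


def soft_threshold(z, gamma):
    """sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def shooting(X, y, lam, weights=None, max_sweeps=1000, tol=1e-10):
    """Shooting (cyclic coordinate descent) for
    (1/(2n)) ||y - X b||^2 + lam * sum_j w_j |b_j|; weights=None gives the plain lasso.
    Assumes X has no all-zero columns."""
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) x_j^T x_j
    b = np.zeros(p)
    r = y.astype(float)                # residual y - X b, starting from b = 0
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            # each coefficient gets its own threshold lam * w_j
            new_bj = soft_threshold(rho, lam * w[j]) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive lasso via weighted shooting, with weights 1 / |b_ols_j|^gamma.
    Assumes n > p so the OLS pilot estimate exists and has no exact zeros."""
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    weights = 1.0 / np.abs(b_ols) ** gamma
    return shooting(X, y, lam, weights)
```

Coefficients whose pilot estimates are small get large weights and are pushed to zero more easily, which is the intended effect of the adaptive penalty.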