Testmathj: Difference between revisions

From David's Wiki
** Choosing <math>\lambda</math> via cross-validation tends to favor less sparse solutions, and thus a smaller <math>\lambda</math> than the optimal choice for feature selection. See "Machine learning: a probabilistic perspective", Murphy 2012.
 
* Classical: Least angle regression (LARS) Efron et al 2004.
<math>\| \frac{c}{d} \|</math>
* [https://www.mathworks.com/help/stats/lasso.html?s_tid=gn_loc_drop Alternating Direction Method of Multipliers (ADMM)]. Boyd et al. "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers." Foundations and Trends in Machine Learning, Vol. 3, No. 1, 2010, pp. 1–122.
** https://stanford.edu/~boyd/papers/pdf/admm_slides.pdf
** [https://cran.r-project.org/web/packages/ADMM/ ADMM] package
** [https://www.quora.com/Convex-Optimization-Whats-the-advantage-of-alternating-direction-method-of-multipliers-ADMM-and-whats-the-use-case-for-this-type-of-method-compared-against-classic-gradient-descent-or-conjugate-gradient-descent-method What's the advantage of alternating direction method of multipliers (ADMM), and what's the use case for this type of method compared against classic gradient descent or conjugate gradient descent method?]
* [https://math.stackexchange.com/questions/771585/convexity-of-lasso If some variables in design matrix are correlated, then LASSO is convex or not?]
* Tibshirani. [http://www.jstor.org/stable/2346178 Regression shrinkage and selection via the lasso] (free). JRSS B 1996.
* [http://www.econ.uiuc.edu/~roger/research/conopt/coptr.pdf Convex Optimization in R] by Koenker & Mizera 2014.
* [https://web.stanford.edu/~hastie/Papers/pathwise.pdf Pathwise coordinate optimization] by Friedman et al 2007.
* [http://web.stanford.edu/~hastie/StatLearnSparsity/ Statistical learning with sparsity: the Lasso and generalizations] T. Hastie, R. Tibshirani, and M. Wainwright, 2015 (book)
* The Elements of Statistical Learning (book)
* [https://youtu.be/A5I1G1MfUmA StatsLearning Lect8h 110913]
* Fu's (1998) shooting algorithm for Lasso ([http://web.stanford.edu/~hastie/TALKS/CD.pdf#page=11 mentioned] in the history of coordinate descent) and Zhang & Lu's (2007) modified shooting algorithm for adaptive Lasso.
* [https://www.cs.ubc.ca/~murphyk/MLbook/ Machine Learning: a Probabilistic Perspective], Murphy 2012. Choosing <math>\lambda</math> via cross-validation tends to favor less sparse solutions, and thus a smaller <math>\lambda</math> than the optimal choice for feature selection.
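
The shooting algorithm and the pathwise coordinate optimization entries in the list above both rest on cyclic coordinate descent with a soft-thresholding update. The following is a minimal sketch of that idea, not the code of any cited paper or package. It assumes the objective <math>\tfrac{1}{2}\|y - Xb\|_2^2 + \lambda \|b\|_1</math> with no intercept and no column standardization; the function names <code>soft_threshold</code> and <code>lasso_coordinate_descent</code> are illustrative only.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200, tol=1e-10):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p entries each, y is a list of length n.
    Assumption for this sketch: no intercept and no scaling of the columns.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # equals y - X b, since b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]

    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the partial residual,
            # i.e. the residual with the contribution of b[j] added back.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n))
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coefficient moved appreciably in a full sweep
    return b


if __name__ == "__main__":
    # Orthonormal design: the solution is the soft-thresholded y.
    print(lasso_coordinate_descent([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0))
    # → [2.0, 0.0]
```

In the orthonormal example the second coefficient is set exactly to zero because its least-squares value (0.5) is below <math>\lambda = 1</math>. Lowering <math>\lambda</math> brings it back into the model, which is the sparsity trade-off the cross-validation remark above refers to.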

Latest revision as of 23:07, 7 September 2019

\(\displaystyle \| \frac{c}{d} \|\)