==Convergence Rates==
[https://en.wikipedia.org/wiki/Rate_of_convergence Wikipedia]<br>
An iterative method converges with order <math>q</math> and rate <math>L</math> if <math>\lim_{k \rightarrow \infty} \frac{|x_{k+1}-x^*|}{|x_{k}-x^*|^q}=L</math>.<br>
Common cases are:
* If <math>q=1</math> and <math>L=1</math> we have sublinear convergence.
* If <math>q=1</math> and <math>L\in(0,1)</math> we have linear convergence.
* If <math>q=1</math> and <math>L=0</math> we have superlinear convergence.
* If <math>q=2</math> and <math>L < \infty</math> we have quadratic convergence.
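As a quick numerical check (a minimal sketch, not from the text; the iterations and variable names are illustrative), the order <math>q</math> can be estimated from consecutive errors via <math>q \approx \log(e_{k+1}/e_k)/\log(e_k/e_{k-1})</math>:
<pre>
% Illustrative sketch: estimate the convergence order q from consecutive errors.
% Linear example: x_{k+1} = x_k/2 + 1 converges to x* = 2 with q = 1, L = 1/2.
xstar = 2; x = 5; e_lin = zeros(1,10);
for k = 1:10
    x = x/2 + 1;
    e_lin(k) = abs(x - xstar);
end
% Quadratic example: Newton's iteration for sqrt(2), x_{k+1} = (x_k + 2/x_k)/2.
xstar = sqrt(2); x = 5; e_newt = zeros(1,5);
for k = 1:5
    x = (x + 2/x)/2;
    e_newt(k) = abs(x - xstar);
end
% q ~ log(e_{k+1}/e_k) / log(e_k/e_{k-1})
q_lin  = log(e_lin(10)/e_lin(9)) / log(e_lin(9)/e_lin(8))      % ~1
q_newt = log(e_newt(4)/e_newt(3)) / log(e_newt(3)/e_newt(2))   % ~2
</pre>
The linear iteration's error ratio settles at <math>L=1/2</math>, while Newton's error is roughly squared at each step.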
==Line Search Methods==
The Cauchy point <math>p_k^c = \tau_k p_k^s</math><br>
where <math>p_k^s</math> minimizes the linear model in the trust region<br>
<math> p_k^s = \operatorname{argmin}_{p \in \mathbb{R}^n} f_k + g_k^Tp </math> s.t. <math>\Vert p \Vert \leq \Delta_k </math><br>
and <math>\tau_k</math> minimizes our quadratic model along the line <math>p_k^s</math>:<br>
<math>\tau_k = \operatorname{argmin}_{\tau \geq 0} m_k(\tau p_k^s)</math> s.t. <math>\Vert \tau p_k^s \Vert \leq \Delta_k </math><br>
This can be written explicitly as <math>p_k^c = - \tau_k \frac{\Delta_k}{\Vert g_k \Vert} g_k</math> where <math>\tau_k =
\begin{cases}
1 & \text{if }g_k^T B_k g_k \leq 0;\\
\min(\Vert g_k \Vert ^3/(\Delta_k g_k^T B_k g_k), 1) & \text{otherwise}
\end{cases}
</math>
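The closed form above translates directly into code. A minimal sketch (the variables g, B, and Delta stand for <math>g_k</math>, <math>B_k</math>, <math>\Delta_k</math>; the example data is arbitrary):
<pre>
% Sketch: compute the Cauchy point from the closed form above.
g = [1; -2]; B = [2 0; 0 1]; Delta = 0.5;
gBg = g' * B * g;
if gBg <= 0
    tau = 1;                                 % model decreases all the way to the boundary
else
    tau = min(norm(g)^3 / (Delta * gBg), 1);
end
pc = -tau * (Delta / norm(g)) * g;           % Cauchy point p_k^c
norm(pc)                                     % <= Delta by construction
</pre>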
==Conjugate Gradient Methods==
[https://www.cs.cmu.edu/~15859n/RelatedWork/painless-conjugate-gradient.pdf Painless Conjugate Gradient]<br>
The goal is to solve <math>Ax=b</math> or equivalently <math>\min \phi(x)</math> where <math>\phi(x)=(1/2)x^T A x - b^Tx</math>; since <math>\nabla \phi(x) = Ax - b</math>, the minimizer of <math>\phi</math> is exactly the solution of the linear system.<br>
In exact arithmetic, the practical CG algorithm converges in at most <math>n</math> steps.
===Definitions===
Vectors <math>\{p_i\}</math> are conjugate w.r.t. an SPD matrix <math>A</math> if <math>p_i^T A p_j = 0</math> for <math>i \neq j</math>
;Notes
* <math>\{p_i\}</math> are linearly independent
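For a concrete example (a sketch, not from the text): the eigenvectors of an SPD matrix are conjugate with respect to it, since <math>p_i^T A p_j = \lambda_j p_i^T p_j = 0</math> for <math>i \neq j</math>.
<pre>
% Sketch: eigenvectors of an SPD matrix form a conjugate (and linearly independent) set.
n = 4;
M = randn(n); A = M'*M + n*eye(n);     % random SPD matrix
[P, ~] = eig(A);                       % columns of P are eigenvectors of A
C = P' * A * P;                        % should be (numerically) diagonal
max(max(abs(C - diag(diag(C)))))       % ~0  =>  p_i' * A * p_j = 0 for i ~= j
</pre>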
===Algorithms===
Basic idea:<br>
* Find a conjugate direction <math>p_k</math>
* Take the step size <math>\alpha_k</math> which minimizes <math>\phi(x)</math> along <math>p_k</math>
Practical CG method:<br>
{{ hidden | Code |
Below is code for the practical CG method.<br>
<pre>
x = x0;
r = A*x - b;                % initial residual
p = -r;                     % initial search direction
k = 0;
while(norm(r) > tol && k < length(x))   % at most n iterations
    Ap = A*p;
    alpha = (r'*r)/(p'*Ap);             % exact minimizer of phi along p
    x = x + alpha * p;
    rn = r + alpha * Ap;                % updated residual
    beta = (rn'*rn)/(r'*r);
    p = -rn + beta * p;                 % next conjugate direction
    r = rn;
    k = k + 1;
end
</pre>
}}
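A usage sketch, assuming the loop above is wrapped in a hypothetical function x = cg(A, b, x0, tol) that returns the final iterate (the function name and test problem are not from the text):
<pre>
% Usage sketch for the snippet above, wrapped as a hypothetical function cg(A, b, x0, tol).
n = 50;
M = randn(n); A = M'*M + n*eye(n);   % random SPD system
b = randn(n, 1);
x = cg(A, b, zeros(n, 1), 1e-10);
norm(A*x - b)                        % residual should be at most ~1e-10
norm(x - A\b)                        % agrees with a direct solve
</pre>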
===Theorems===
Given a set of conjugate directions <math>\{p_0,...,p_{n-1}\}</math> we can generate a sequence of <math>x_k</math> with<br>
<math> x_{k+1}=x_k + \alpha_k p_k</math> where <math>\alpha_k = -\frac{r_k^T p_k}{p_k^T A p_k}</math> minimizes <math>\phi(x)</math> along <math>p_k</math>.<br>
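The step length comes from an exact line search: since <math>\phi</math> is quadratic, setting the directional derivative along <math>p_k</math> to zero gives<br>
<math>\frac{d}{d\alpha}\phi(x_k + \alpha p_k) = p_k^T \nabla \phi(x_k + \alpha p_k) = p_k^T(Ax_k - b) + \alpha\, p_k^T A p_k = r_k^T p_k + \alpha\, p_k^T A p_k = 0 \implies \alpha_k = -\frac{r_k^T p_k}{p_k^T A p_k}</math><br>
with <math>r_k = A x_k - b</math>.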
; Theorem: For any <math>x_0</math>, the sequence <math>x_k</math> converges to the solution <math>x^*</math> in at most <math>n</math> steps.
; Theorem: At step <math>k</math>, the residual <math>r_k</math> is orthogonal to <math>\{p_0,...,p_{k-1}\}</math> and the current iterate <math>x_{k}</math>
: minimizes <math>\phi(x)</math> over the set <math>\{x_0 + \operatorname{span}(p_0,...,p_{k-1})\}</math>
===Convergence Rate===
The convergence rate can be estimated by the polynomial bound<br>
<math>\Vert x_{k+1} - x^* \Vert_A^2 \leq \min_{P_k} \max_{1 \leq i \leq n} [1 + \lambda_i P_k(\lambda_i)]^2 \Vert x_0 - x^* \Vert_A^2</math><br>
where <math>P_k</math> ranges over polynomials of degree <math>k</math> and <math>\lambda_1,...,\lambda_n</math> are the eigenvalues of <math>A</math>.
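A commonly quoted consequence of this bound (see the references linked above and below) is the condition-number estimate<br>
<math>\Vert x_k - x^* \Vert_A \leq 2\left(\frac{\sqrt{\kappa(A)}-1}{\sqrt{\kappa(A)}+1}\right)^k \Vert x_0 - x^* \Vert_A</math><br>
where <math>\kappa(A) = \lambda_n/\lambda_1</math> is the condition number of <math>A</math>, so CG converges quickly when <math>A</math> is well conditioned or has clustered eigenvalues.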
==Resources==
* [https://link.springer.com/book/10.1007%2F978-0-387-40065-5 Numerical Optimization by Nocedal and Wright (2006)]