==Convergence Rates==
[https://en.wikipedia.org/wiki/Rate_of_convergence Wikipedia]<br>
An iterative method converges with order <math>q</math> and rate <math>L</math> if <math>\lim_{k \rightarrow \infty} \frac{|x_{k+1}-x^*|}{|x_{k}-x^*|^q}=L</math>.<br>
Common cases are:
* If <math>q=1</math> and <math>L=1</math> we have sublinear convergence.
* If <math>q=1</math> and <math>L\in(0,1)</math> we have linear convergence.
* If <math>q=1</math> and <math>L=0</math> we have superlinear convergence.
* If <math>q=2</math> and <math>L < \infty</math> we have quadratic convergence.
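As a quick numerical check (a minimal sketch, not from the text; the iterations and variable names are illustrative), the order <math>q</math> can be estimated from consecutive errors via <math>q \approx \log(e_{k+1}/e_k)/\log(e_k/e_{k-1})</math>:
<pre>
% Illustrative sketch: estimate the convergence order q from consecutive errors.
% Linear example: x_{k+1} = x_k/2 + 1 converges to x* = 2 with q = 1, L = 1/2.
xstar = 2; x = 5; e_lin = zeros(1,10);
for k = 1:10
    x = x/2 + 1;
    e_lin(k) = abs(x - xstar);
end
% Quadratic example: Newton's iteration for sqrt(2), x_{k+1} = (x_k + 2/x_k)/2.
xstar = sqrt(2); x = 5; e_newt = zeros(1,5);
for k = 1:5
    x = (x + 2/x)/2;
    e_newt(k) = abs(x - xstar);
end
% q ~ log(e_{k+1}/e_k) / log(e_k/e_{k-1})
q_lin  = log(e_lin(10)/e_lin(9)) / log(e_lin(9)/e_lin(8))      % ~1
q_newt = log(e_newt(4)/e_newt(3)) / log(e_newt(3)/e_newt(2))   % ~2
</pre>
The linear iteration's error ratio settles at <math>L=1/2</math>, while Newton's error is roughly squared at each step.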
==Line Search Methods==
The Cauchy point <math>p_k^c = \tau_k p_k^s</math><br>
where <math>p_k^s</math> minimizes the linear model in the trust region<br>
<math> p_k^s = \operatorname{argmin}_{p \in \mathbb{R}^n} f_k + g_k^Tp </math> s.t. <math>\Vert p \Vert \leq \Delta_k </math><br>
and <math>\tau_k</math> minimizes our quadratic model along the line <math>p_k^s</math>:<br>
<math>\tau_k = \operatorname{argmin}_{\tau \geq 0} m_k(\tau p_k^s)</math> s.t. <math>\Vert \tau p_k^s \Vert \leq \Delta_k </math><br>
This can be written explicitly as <math>p_k^c = - \tau_k \frac{\Delta_k}{\Vert g_k \Vert} g_k</math> where <math>\tau_k =
\begin{cases}
1 & \text{if }g_k^T B_k g_k \leq 0;\\
\min(\Vert g_k \Vert ^3/(\Delta_k g_k^T B_k g_k), 1) & \text{otherwise}
\end{cases}
</math>
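The closed form above translates directly into code. A minimal sketch (the variables g, B, and Delta stand for <math>g_k</math>, <math>B_k</math>, <math>\Delta_k</math>; the example data is arbitrary):
<pre>
% Sketch: compute the Cauchy point from the closed form above.
g = [1; -2]; B = [2 0; 0 1]; Delta = 0.5;
gBg = g' * B * g;
if gBg <= 0
    tau = 1;                                 % model decreases all the way to the boundary
else
    tau = min(norm(g)^3 / (Delta * gBg), 1);
end
pc = -tau * (Delta / norm(g)) * g;           % Cauchy point p_k^c
norm(pc)                                     % <= Delta by construction
</pre>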
==Conjugate Gradient Methods==
[https://www.cs.cmu.edu/~15859n/RelatedWork/painless-conjugate-gradient.pdf Painless Conjugate Gradient]<br>
The goal is to solve <math>Ax=b</math> or equivalently <math>\min \phi(x)</math> where <math>\phi(x)=(1/2)x^T A x - b^Tx</math>; since <math>\nabla \phi(x) = Ax - b</math>, the minimizer of <math>\phi</math> is exactly the solution of the linear system.<br>
In exact arithmetic, the practical CG algorithm converges in at most <math>n</math> steps.
===Definitions===
Vectors <math>\{p_i\}</math> are conjugate w.r.t. an SPD matrix <math>A</math> if <math>p_i^T A p_j = 0</math> for <math>i \neq j</math>
;Notes
* <math>\{p_i\}</math> are linearly independent
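For a concrete example (a sketch, not from the text): the eigenvectors of an SPD matrix are conjugate with respect to it, since <math>p_i^T A p_j = \lambda_j p_i^T p_j = 0</math> for <math>i \neq j</math>.
<pre>
% Sketch: eigenvectors of an SPD matrix form a conjugate (and linearly independent) set.
n = 4;
M = randn(n); A = M'*M + n*eye(n);     % random SPD matrix
[P, ~] = eig(A);                       % columns of P are eigenvectors of A
C = P' * A * P;                        % should be (numerically) diagonal
max(max(abs(C - diag(diag(C)))))       % ~0  =>  p_i' * A * p_j = 0 for i ~= j
</pre>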
===Algorithms===
Basic idea:<br>
* Find a conjugate direction <math>p_k</math>
* Take the step size <math>\alpha_k</math> which minimizes <math>\phi(x)</math> along <math>p_k</math>
Practical CG method:<br>
{{ hidden | Code |
Below is code for the practical CG method.<br>
<pre>
x = x0;
r = A*x - b;                % initial residual
p = -r;                     % initial search direction
k = 0;
while(norm(r) > tol && k < length(x))   % at most n iterations
    Ap = A*p;
    alpha = (r'*r)/(p'*Ap);             % exact minimizer of phi along p
    x = x + alpha * p;
    rn = r + alpha * Ap;                % updated residual
    beta = (rn'*rn)/(r'*r);
    p = -rn + beta * p;                 % next conjugate direction
    r = rn;
    k = k + 1;
end
</pre>
}}
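A usage sketch, assuming the loop above is wrapped in a hypothetical function x = cg(A, b, x0, tol) that returns the final iterate (the function name and test problem are not from the text):
<pre>
% Usage sketch for the snippet above, wrapped as a hypothetical function cg(A, b, x0, tol).
n = 50;
M = randn(n); A = M'*M + n*eye(n);   % random SPD system
b = randn(n, 1);
x = cg(A, b, zeros(n, 1), 1e-10);
norm(A*x - b)                        % residual should be at most ~1e-10
norm(x - A\b)                        % agrees with a direct solve
</pre>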
===Theorems===
Given a set of conjugate directions <math>\{p_0,...,p_{n-1}\}</math> we can generate a sequence of <math>x_k</math> with<br>
<math> x_{k+1}=x_k + \alpha_k p_k</math> where <math>\alpha_k = -\frac{r_k^T p_k}{p_k^T A p_k}</math> minimizes <math>\phi(x)</math> along <math>p_k</math>.<br>
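The step length comes from an exact line search: since <math>\phi</math> is quadratic, setting the directional derivative along <math>p_k</math> to zero gives<br>
<math>\frac{d}{d\alpha}\phi(x_k + \alpha p_k) = p_k^T \nabla \phi(x_k + \alpha p_k) = p_k^T(Ax_k - b) + \alpha\, p_k^T A p_k = r_k^T p_k + \alpha\, p_k^T A p_k = 0 \implies \alpha_k = -\frac{r_k^T p_k}{p_k^T A p_k}</math><br>
with <math>r_k = A x_k - b</math>.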
; Theorem: For any <math>x_0</math>, the sequence <math>x_k</math> converges to the solution <math>x^*</math> in at most <math>n</math> steps.
; Theorem: At step <math>k</math>, the residual <math>r_k</math> is orthogonal to <math>\{p_0,...,p_{k-1}\}</math> and the current iterate <math>x_{k}</math>
: minimizes <math>\phi(x)</math> over the set <math>\{x_0 + \operatorname{span}(p_0,...,p_{k-1})\}</math>
===Convergence Rate===
The convergence rate can be estimated by the polynomial bound<br>
<math>\Vert x_{k+1} - x^* \Vert_A^2 \leq \min_{P_k} \max_{1 \leq i \leq n} [1 + \lambda_i P_k(\lambda_i)]^2 \Vert x_0 - x^* \Vert_A^2</math><br>
where <math>P_k</math> ranges over polynomials of degree <math>k</math> and <math>\lambda_1,...,\lambda_n</math> are the eigenvalues of <math>A</math>.
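A commonly quoted consequence of this bound (see the references linked above and below) is the condition-number estimate<br>
<math>\Vert x_k - x^* \Vert_A \leq 2\left(\frac{\sqrt{\kappa(A)}-1}{\sqrt{\kappa(A)}+1}\right)^k \Vert x_0 - x^* \Vert_A</math><br>
where <math>\kappa(A) = \lambda_n/\lambda_1</math> is the condition number of <math>A</math>, so CG converges quickly when <math>A</math> is well conditioned or has clustered eigenvalues.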
==Resources==
* [https://link.springer.com/book/10.1007%2F978-0-387-40065-5 Numerical Optimization by Nocedal and Wright (2006)]