Numerical Optimization

Convergence Rates

The rate of convergence of a sequence ${\displaystyle x_{k}\rightarrow x^{*}}$ is defined by the constants ${\displaystyle q}$ and ${\displaystyle L}$ in ${\displaystyle \lim _{k\rightarrow \infty }{\frac {|x_{k+1}-x^{*}|}{|x_{k}-x^{*}|^{q}}}=L}$
Iterative methods have the following convergence rates:

• If ${\displaystyle q=1}$ and ${\displaystyle L=1}$ we have sublinear convergence.
• If ${\displaystyle q=1}$ and ${\displaystyle L\in (0,1)}$ we have linear convergence.
• If ${\displaystyle q=1}$ and ${\displaystyle L=0}$ we have superlinear convergence.
• If ${\displaystyle q=2}$ and ${\displaystyle L<\infty }$ we have quadratic convergence.
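These rates can be checked numerically by computing the ratio in the limit above for a few iterations. The two toy iterations below (geometric decay toward 0, and Newton's method for ${\displaystyle {\sqrt {2}}}$) are illustrative examples, not from the notes.

```python
def rate_estimates(xs, x_star, q):
    """Return the ratios |x_{k+1} - x*| / |x_k - x*|^q for consecutive iterates."""
    errs = [abs(x - x_star) for x in xs]
    return [errs[k + 1] / errs[k] ** q for k in range(len(errs) - 1) if errs[k] > 0]

# Linear convergence (q = 1, L = 0.5): x_{k+1} = 0.5 * x_k converging to 0.
xs = [1.0]
for _ in range(10):
    xs.append(0.5 * xs[-1])
print(rate_estimates(xs, 0.0, 1))        # every ratio is exactly 0.5

# Quadratic convergence (q = 2): Newton's method for sqrt(2).
ys = [2.0]
for _ in range(4):
    ys.append((ys[-1] + 2.0 / ys[-1]) / 2.0)
print(rate_estimates(ys, 2.0 ** 0.5, 2)) # ratios settle near 1/(2*sqrt(2)) ~ 0.354
```

Note that for the quadratic case the ratio tends to a finite constant only when dividing by the *squared* previous error; dividing by the unsquared error would give ratios tending to 0 (superlinear).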

Line Search Methods

Basic idea:

• For each iteration
• Find a direction ${\displaystyle p}$.
• Then find a step length ${\displaystyle \alpha }$ which decreases ${\displaystyle f}$.
• Take a step ${\displaystyle \alpha p}$.
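The step-length computation above is commonly done by backtracking until a sufficient-decrease (Armijo) condition holds. The sketch below is a minimal illustration; the test function, constants `rho` and `c`, and the steepest-descent direction are assumptions, not from the notes.

```python
def backtracking(f, grad, x, p, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until f(x + alpha*p) <= f(x) + c * alpha * grad(x)^T p."""
    fx = f(x)
    slope = sum(g * d for g, d in zip(grad(x), p))  # directional derivative, < 0 for descent
    while f([xi + alpha * di for xi, di in zip(x, p)]) > fx + c * alpha * slope:
        alpha *= rho
    return alpha

# Example: f(x) = x1^2 + x2^2 with the steepest-descent direction p = -grad f(x).
f = lambda x: x[0] ** 2 + x[1] ** 2
grad = lambda x: [2 * x[0], 2 * x[1]]
x = [1.0, 1.0]
p = [-g for g in grad(x)]
alpha = backtracking(f, grad, x, p)  # alpha = 0.5 here; x + alpha*p = [0, 0]
```

For this quadratic, the accepted step happens to land exactly on the minimizer; in general backtracking only guarantees sufficient decrease, not exact minimization along the line.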

Trust Region Methods

Basic idea:

• For each iteration
• Assume a quadratic model of your objective function near a point.
• Find a region where you trust that your model accurately represents your objective function.
• Take a step.

Variables:

• ${\displaystyle f}$ is your objective function.
• ${\displaystyle m_{k}}$ is your quadratic model at iteration k.
• ${\displaystyle x_{k}}$ is your point at iteration k.

Your model is ${\displaystyle m_{k}(p)=f_{k}+g_{k}^{T}p+{\frac {1}{2}}p^{T}B_{k}p}$ where ${\displaystyle g_{k}=\nabla f(x_{k})}$ and ${\displaystyle B_{k}}$ is a symmetric matrix.
At each iteration, you solve a constrained optimization subproblem to find the best step ${\displaystyle p}$.
${\displaystyle \min _{p\in \mathbb {R} ^{n}}m_{k}(p)}$ such that ${\displaystyle \Vert p\Vert \leq \Delta _{k}}$.
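To make the model concrete, the sketch below evaluates ${\displaystyle m_{k}(p)=f_{k}+g_{k}^{T}p+{\frac {1}{2}}p^{T}B_{k}p}$ for a specific quadratic objective. The test function, point, and the choice ${\displaystyle B_{k}=}$ Hessian are illustrative assumptions.

```python
def model(fk, g, B, p):
    """Evaluate m_k(p) = f_k + g^T p + 0.5 * p^T B p (B as nested lists)."""
    lin = sum(gi * pi for gi, pi in zip(g, p))
    Bp = [sum(B[i][j] * p[j] for j in range(len(p))) for i in range(len(p))]
    quad = 0.5 * sum(pi * bi for pi, bi in zip(p, Bp))
    return fk + lin + quad

# For f(x) = x1^2 + 3*x2^2 with B_k equal to the (constant) Hessian,
# the model reproduces f(x + p) exactly, since f itself is quadratic.
f = lambda x: x[0] ** 2 + 3 * x[1] ** 2
x = [1.0, 2.0]
g = [2 * x[0], 6 * x[1]]          # gradient of f at x
B = [[2.0, 0.0], [0.0, 6.0]]      # Hessian of f
p = [0.1, -0.2]
m = model(f(x), g, B, p)          # matches f([1.1, 1.8]) up to rounding
```

For a non-quadratic ${\displaystyle f}$ the model is only accurate near ${\displaystyle x_{k}}$, which is exactly why the step is constrained to the trust region.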

Cauchy Point Algorithms

The Cauchy point ${\displaystyle p_{k}^{c}=\tau _{k}p_{k}^{s}}$
where ${\displaystyle p_{k}^{s}}$ minimizes the linear model in the trust region
${\displaystyle p_{k}^{s}=\operatorname {argmin} _{p\in \mathbb {R} ^{n}}f_{k}+g_{k}^{T}p}$ s.t. ${\displaystyle \Vert p\Vert \leq \Delta _{k}}$
and ${\displaystyle \tau _{k}}$ minimizes our quadratic model along the line ${\displaystyle p_{k}^{s}}$:
${\displaystyle \tau _{k}=\operatorname {argmin} _{\tau \geq 0}m_{k}(\tau p_{k}^{s})}$ s.t. ${\displaystyle \Vert \tau p_{k}^{s}\Vert \leq \Delta _{k}}$
This can be written explicitly as ${\displaystyle p_{k}^{c}=-\tau _{k}{\frac {\Delta _{k}}{\Vert g_{k}\Vert }}g_{k}}$ where ${\displaystyle \tau _{k}={\begin{cases}1&{\text{if }}g_{k}^{T}B_{k}g_{k}\leq 0;\\\min(\Vert g_{k}\Vert ^{3}/(\Delta _{k}g_{k}^{T}B_{k}g_{k}),1)&{\text{otherwise}}\end{cases}}}$
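The explicit formula above translates directly into code. The sketch below is a direct implementation; the inputs in the example (${\displaystyle B=I}$, a particular ${\displaystyle g}$ and ${\displaystyle \Delta }$) are illustrative assumptions.

```python
def cauchy_point(g, B, delta):
    """Cauchy point p_c = -tau * (delta / ||g||) * g, tau from the piecewise formula."""
    gnorm = sum(gi * gi for gi in g) ** 0.5
    Bg = [sum(B[i][j] * g[j] for j in range(len(g))) for i in range(len(g))]
    gBg = sum(gi * bi for gi, bi in zip(g, Bg))           # g^T B g
    if gBg <= 0:
        tau = 1.0                                          # model decreases along -g forever
    else:
        tau = min(gnorm ** 3 / (delta * gBg), 1.0)
    return [-tau * delta / gnorm * gi for gi in g]

# With B = I the model minimum along -g is at -g itself, so for a large
# enough radius the Cauchy point is exactly -g:
p_c = cauchy_point([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]], 10.0)  # -> [-3.0, -4.0]
```

With a small radius (e.g. ${\displaystyle \Delta =1}$ for the same inputs) the formula instead returns a step of norm exactly ${\displaystyle \Delta }$ on the trust-region boundary.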

Conjugate Gradient Methods

The goal is to solve ${\displaystyle Ax=b}$, or equivalently ${\displaystyle \min \phi (x)}$ where ${\displaystyle \phi (x)=(1/2)x^{T}Ax-b^{T}x}$, for a symmetric positive definite matrix ${\displaystyle A}$.
In exact arithmetic, the practical CG algorithm converges in at most ${\displaystyle n}$ steps, where ${\displaystyle n}$ is the dimension of the system.

Definitions

Vectors ${\displaystyle \{p_{i}\}}$ are conjugate with respect to a symmetric positive definite (SPD) matrix ${\displaystyle A}$ if ${\displaystyle p_{i}^{T}Ap_{j}=0}$ for ${\displaystyle i\neq j}$

Notes
• ${\displaystyle \{p_{i}\}}$ are linearly independent
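Conjugacy is easy to verify directly from the definition. The matrix and directions below are an illustrative example (they happen to be eigenvector directions of this ${\displaystyle A}$), not from the notes.

```python
def conjugate(A, p, q):
    """True if p^T A q == 0 (up to rounding), i.e. p and q are A-conjugate."""
    Aq = [sum(A[i][j] * q[j] for j in range(len(q))) for i in range(len(q))]
    return abs(sum(pi * a for pi, a in zip(p, Aq))) < 1e-12

A = [[2.0, 1.0], [1.0, 2.0]]   # SPD
p0 = [1.0, 1.0]
p1 = [1.0, -1.0]               # A p1 = [1, -1], and p0 . [1, -1] = 0
```

Note that a direction is never conjugate to itself for SPD ${\displaystyle A}$, since ${\displaystyle p^{T}Ap>0}$ for ${\displaystyle p\neq 0}$; this is what forces linear independence.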

Algorithms

Basic idea:

• Find a conjugate direction ${\displaystyle p_{k}}$
• Take the step size ${\displaystyle \alpha _{k}}$ which minimizes along ${\displaystyle \phi (x)}$

Practical CG method:

Code

Below is code for the practical CG method.

x = x0;                       % initial guess
r = A*x - b;                  % initial residual r_0
p = -r;                       % initial search direction
k = 1;
tol = 1e-10;                  % convergence tolerance
while (norm(r) > tol && k <= length(x))
    Ap = A*p;
    alpha = (r'*r)/(p'*Ap);   % exact minimizer of phi along p
    x = x + alpha*p;
    rn = r + alpha*Ap;        % updated residual
    beta = (rn'*rn)/(r'*r);
    p = -rn + beta*p;         % next conjugate direction
    r = rn;
    k = k + 1;
end
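The same loop can be sketched in Python on a small SPD system to illustrate the at-most-${\displaystyle n}$-steps property; the matrix and right-hand side below are made-up examples.

```python
def cg(A, b, x, tol=1e-10):
    """Practical CG for SPD A (nested lists). Returns (solution, iterations used)."""
    n = len(b)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    r = [axi - bi for axi, bi in zip(Ax, b)]   # residual r = A x - b
    p = [-ri for ri in r]
    for k in range(n):                         # at most n iterations
        rr = sum(ri * ri for ri in r)
        if rr ** 0.5 <= tol:
            return x, k
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri + alpha * api for ri, api in zip(r, Ap)]
        beta = sum(ri * ri for ri in r) / rr
        p = [-ri + beta * pi for ri, pi in zip(r, p)]
    return x, n

A = [[4.0, 1.0], [1.0, 3.0]]                   # SPD
b = [1.0, 2.0]
x, steps = cg(A, b, [0.0, 0.0])                # exact after at most 2 steps here
```

For this ${\displaystyle 2\times 2}$ system the exact solution is ${\displaystyle x^{*}=(1/11,\,7/11)}$, which CG reaches in two iterations, matching the theorem below.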


Theorems

Given a set of conjugate directions ${\displaystyle \{p_{0},\ldots ,p_{n-1}\}}$ we can generate a sequence of ${\displaystyle x_{k}}$ with
${\displaystyle x_{k+1}=x_{k}+\alpha _{k}p_{k}}$ where ${\displaystyle \alpha _{k}=-{\frac {r_{k}^{T}p_{k}}{p_{k}^{T}Ap_{k}}}}$ minimizes ${\displaystyle \phi (x)}$

Theorem
For any starting point ${\displaystyle x_{0}}$, the sequence ${\displaystyle x_{k}}$ converges to the solution ${\displaystyle x^{*}}$ in at most ${\displaystyle n}$ steps.
Theorem
At step k, the residual ${\displaystyle r_{k}}$ is orthogonal to ${\displaystyle \{p_{0},\ldots ,p_{k-1}\}}$ and the current iterate ${\displaystyle x_{k}}$
minimizes ${\displaystyle \phi (x)}$ over the set ${\displaystyle x_{0}+\operatorname {span} (p_{0},\ldots ,p_{k-1})}$

Convergence Rate

The convergence rate can be estimated from the bound
${\displaystyle \Vert x_{k+1}-x^{*}\Vert _{A}^{2}\leq \min _{P_{k}}\max _{1\leq i\leq n}[1+\lambda _{i}P_{k}(\lambda _{i})]^{2}\Vert x_{0}-x^{*}\Vert _{A}^{2}}$
where the minimum is over polynomials ${\displaystyle P_{k}}$ of degree ${\displaystyle k}$ and ${\displaystyle \lambda _{1},\ldots ,\lambda _{n}}$ are the eigenvalues of ${\displaystyle A}$.