# Numerical Optimization


## Convergence Rates

From Wikipedia: the rate of convergence of a sequence $\{x_{k}\}$ converging to $x^{*}$ is $\lim _{k\rightarrow \infty }{\frac {|x_{k+1}-x^{*}|}{|x_{k}-x^{*}|^{q}}}=L$ .
Iterative methods have the following convergence rates:

• If $q=1$ and $L=1$ we have sublinear convergence.
• If $q=1$ and $L\in (0,1)$ we have linear convergence.
• If $q=1$ and $L=0$ we have superlinear convergence.
• If $q=2$ and $L<\infty$ we have quadratic convergence.
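As a quick check of these definitions, here is a short Python sketch (not from the source) that observes $q=2$ behavior for Newton's method on $f(x)=x^{2}-2$: the error ratios $|e_{k+1}|/|e_{k}|^{2}$ approach a finite $L$, here $1/(2{\sqrt {2}})$.

```python
import math

# Numerically estimate the quadratic-convergence constant L for Newton's
# method on f(x) = x^2 - 2 (root x* = sqrt(2)).  Theory predicts
# |e_{k+1}| / |e_k|^2 -> |f''(x*) / (2 f'(x*))| = 1 / (2*sqrt(2)).
def newton_errors(x0, steps):
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - (x * x - 2.0) / (2.0 * x))  # Newton step: x - f/f'
    return [abs(x - math.sqrt(2.0)) for x in xs]

errors = newton_errors(2.0, 4)
ratios = [e_next / e ** 2 for e, e_next in zip(errors, errors[1:])]
print(ratios)  # the later ratios approach L = 1/(2*sqrt(2)) ≈ 0.3536
```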

## Line Search Methods

Basic idea:

• For each iteration:
  • Find a descent direction $p$ .
  • Find a step length $\alpha$ which decreases $f$ .
  • Take the step $\alpha p$ .
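The steps above can be sketched in Python. This is a minimal illustration, not from the source: it uses the steepest-descent direction and a backtracking line search enforcing the Armijo sufficient-decrease condition, with conventional (assumed) constants `rho=0.5` and `c=1e-4`.

```python
import numpy as np

def backtracking_line_search(f, grad, x, p, alpha0=1.0, rho=0.5, c=1e-4):
    # Shrink alpha until the Armijo sufficient-decrease condition holds:
    # f(x + alpha*p) <= f(x) + c * alpha * grad(x)^T p.
    alpha = alpha0
    fx, slope = f(x), grad(x) @ p
    while f(x + alpha * p) > fx + c * alpha * slope:
        alpha *= rho
    return alpha

def gradient_descent(f, grad, x, iters=500, tol=1e-10):
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = -g                                   # steepest-descent direction
        alpha = backtracking_line_search(f, grad, x, p)
        x = x + alpha * p                        # take the step alpha * p
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10 y^2, minimizer (1, 0).
f = lambda x: (x[0] - 1) ** 2 + 10 * x[1] ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * x[1]])
print(gradient_descent(f, grad, np.array([5.0, 3.0])))  # ≈ [1, 0]
```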

## Trust Region Methods

Basic idea:

• For each iteration:
  • Assume a quadratic model of your objective function near the current point.
  • Find a region in which you trust the model to accurately represent your objective function.
  • Take a step.

Variables:

• $f$ is your objective function.
• $m_{k}$ is your quadratic model at iteration k.
• $x_{k}$ is your point at iteration k.

Your model is $m_{k}(p)=f_{k}+g_{k}^{T}p+{\frac {1}{2}}p^{T}B_{k}p$ where $g_{k}=\nabla f(x_{k})$ and $B_{k}$ is a symmetric matrix.
At each iteration, you solve a constrained optimization subproblem to find the best step $p$ .
$\min _{p\in \mathbb {R} ^{n}}m_{k}(p)$ such that $\Vert p\Vert \leq \Delta _{k}$ .
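The loop can be sketched in Python. This is a minimal illustration, not from the source: the subproblem is solved only approximately with the Cauchy point (defined in the next subsection), and the step is accepted or rejected using the ratio of actual to predicted reduction. The thresholds 0.25, 0.75 and $\eta = 0.15$ are conventional choices, assumed here.

```python
import numpy as np

def cauchy_point(g, B, delta):
    # Explicit Cauchy point: p^c = -tau * (delta/||g||) * g, where tau
    # depends on the curvature g^T B g along the steepest-descent direction.
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(np.linalg.norm(g) ** 3 / (delta * gBg), 1.0)
    return -tau * (delta / np.linalg.norm(g)) * g

def trust_region(f, grad, hess, x, delta=1.0, delta_max=10.0, eta=0.15,
                 iters=100):
    for _ in range(iters):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < 1e-10:
            break
        p = cauchy_point(g, B, delta)
        predicted = -(g @ p + 0.5 * p @ B @ p)   # model reduction m(0)-m(p)
        rho = (f(x) - f(x + p)) / predicted      # actual vs predicted
        if rho < 0.25:
            delta *= 0.25                        # shrink: model not trusted
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2 * delta, delta_max)    # expand: model very good
        if rho > eta:
            x = x + p                            # accept the step
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10 y^2, minimizer (1, 0).
f = lambda x: (x[0] - 1) ** 2 + 10 * x[1] ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
print(trust_region(f, grad, hess, np.array([5.0, 3.0])))  # ≈ [1, 0]
```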

### Cauchy Point Algorithms

The Cauchy point $p_{k}^{c}=\tau _{k}p_{k}^{s}$ where $p_{k}^{s}$ minimizes the linear model in the trust region
$p_{k}^{s}=\operatorname {argmin} _{p\in \mathbb {R} ^{n}}f_{k}+g_{k}^{T}p$ s.t. $\Vert p\Vert \leq \Delta _{k}$ and $\tau _{k}$ minimizes our quadratic model along the line $p_{k}^{s}$ :
$\tau _{k}=\operatorname {argmin} _{\tau \geq 0}m_{k}(\tau p_{k}^{s})$ s.t. $\Vert \tau p_{k}^{s}\Vert \leq \Delta _{k}$ .
This can be written explicitly as $p_{k}^{c}=-\tau _{k}{\frac {\Delta _{k}}{\Vert g_{k}\Vert }}g_{k}$ where $\tau _{k}={\begin{cases}1&{\text{if }}g_{k}^{T}B_{k}g_{k}\leq 0;\\\min(\Vert g_{k}\Vert ^{3}/(\Delta _{k}g_{k}^{T}B_{k}g_{k}),1)&{\text{otherwise.}}\end{cases}}$

## Conjugate Gradient Methods

The goal is to solve $Ax=b$ for a symmetric positive definite (SPD) matrix $A$, or equivalently $\min \phi (x)$ where $\phi (x)=(1/2)x^{T}Ax-b^{T}x$ .
The practical CG algorithm will converge in at most $n$ steps.

### Definitions

Vectors $\{p_{i}\}$ are conjugate with respect to an SPD matrix $A$ if $p_{i}^{T}Ap_{j}=0$ for $i\neq j$ .

Notes:

• Conjugate vectors $\{p_{i}\}$ are linearly independent.
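A small numerical illustration of the definition (the example matrix and the use of eigenvectors are assumptions, not from the source): orthonormal eigenvectors of an SPD matrix are conjugate, since $p_{i}^{T}Ap_{j}=\lambda _{j}p_{i}^{T}p_{j}=0$ for $i\neq j$, and they are linearly independent.

```python
import numpy as np

# Eigenvectors of an SPD matrix give a simple set of conjugate directions.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # SPD example matrix
_, P = np.linalg.eigh(A)                 # columns are orthonormal eigenvectors
off_diag = P[:, 0] @ A @ P[:, 1]         # conjugacy inner product p_0^T A p_1
print(abs(off_diag) < 1e-12)             # True: the pair is conjugate
print(abs(np.linalg.det(P)) > 1e-12)     # True: linearly independent
```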

### Algorithms

Basic idea:

• Find a conjugate direction $p_{k}$ .
• Take the step size $\alpha _{k}$ which minimizes $\phi$ along that direction.

Below is code for the practical CG method.

```matlab
% Practical conjugate gradient method for solving A*x = b (A SPD).
% x0 is the initial guess; tol is the residual tolerance.
x = x0;
r = A*x - b;                  % initial residual
p = -r;                       % initial search direction
k = 1;
while (norm(r) > tol && k <= numel(x))
    Ap = A*p;
    alpha = (r'*r)/(p'*Ap);   % exact minimizer of phi along p
    x = x + alpha*p;
    rn = r + alpha*Ap;        % updated residual
    beta = (rn'*rn)/(r'*r);   % coefficient for the next conjugate direction
    p = -rn + beta*p;
    r = rn;
    k = k + 1;
end
```
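The same iteration can be sketched in Python; on a small SPD system it terminates in at most $n$ steps, as claimed above (the random test matrix is an arbitrary choice, not from the source).

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    # Practical CG iteration for SPD A, with the same updates as the
    # pseudocode above; returns the solution and the iteration count.
    x = x0.astype(float)
    r = A @ x - b                 # initial residual
    p, k, n = -r, 0, len(b)
    while np.linalg.norm(r) > tol and k < n:
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r + alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = -r_new + beta * p
        r = r_new
        k += 1
    return x, k

# Random SPD system: CG should converge in at most n = 5 steps.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)       # SPD by construction
b = rng.standard_normal(5)
x, k = conjugate_gradient(A, b, np.zeros(5))
print(k, np.linalg.norm(A @ x - b))
```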


### Theorems

Given a set of conjugate directions $\{p_{0},...,p_{n-1}\}$ we can generate a sequence of iterates $x_{k}$ with
$x_{k+1}=x_{k}+\alpha _{k}p_{k}$ where $\alpha _{k}=-{\frac {r_{k}^{T}p_{k}}{p_{k}^{T}Ap_{k}}}$ minimizes $\phi$ along the line $x_{k}+\alpha p_{k}$ .

Theorem
For any $x_{0}$ , the sequence $x_{k}$ converges to the solution $x^{*}$ in at most $n$ steps.
Theorem
At step $k$ , the residual $r_{k}$ is orthogonal to $\{p_{0},...,p_{k-1}\}$ and the current iterate $x_{k}$ minimizes $\phi (x)$ over the set $\{x_{0}+\operatorname{span}(p_{0},...,p_{k-1})\}$ .

### Convergence Rate

The convergence rate can be estimated by
$\Vert x_{k+1}-x^{*}\Vert _{A}^{2}\leq \min _{P_{k}}\max _{1\leq i\leq n}[1+\lambda _{i}P_{k}(\lambda _{i})]^{2}\Vert x_{0}-x^{*}\Vert _{A}^{2}$
where $\lambda _{1},...,\lambda _{n}$ are the eigenvalues of $A$ and $P_{k}$ ranges over polynomials of degree $k$ .