# Numerical Optimization

$$\newcommand{\P}[]{\unicode{xB6}} \newcommand{\AA}[]{\unicode{x212B}} \newcommand{\empty}[]{\emptyset} \newcommand{\O}[]{\emptyset} \newcommand{\Alpha}[]{Α} \newcommand{\Beta}[]{Β} \newcommand{\Epsilon}[]{Ε} \newcommand{\Iota}[]{Ι} \newcommand{\Kappa}[]{Κ} \newcommand{\Rho}[]{Ρ} \newcommand{\Tau}[]{Τ} \newcommand{\Zeta}[]{Ζ} \newcommand{\Mu}[]{\unicode{x039C}} \newcommand{\Chi}[]{Χ} \newcommand{\Eta}[]{\unicode{x0397}} \newcommand{\Nu}[]{\unicode{x039D}} \newcommand{\Omicron}[]{\unicode{x039F}} \DeclareMathOperator{\sgn}{sgn} \def\oiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x222F}\,}{\unicode{x222F}}{\unicode{x222F}}{\unicode{x222F}}}\,}\nolimits} \def\oiiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x2230}\,}{\unicode{x2230}}{\unicode{x2230}}{\unicode{x2230}}}\,}\nolimits}$$

Numerical Optimization

## Convergence Rates

Wikipedia
The rate of convergence is $$\displaystyle \lim_{k \rightarrow \infty} \frac{|x_{k+1}-x^*|}{|x_{k}-x^*|^q}=L$$
Iterative methods have the following convergence rates:

• If Q=1 and L=1 we have sublinear convergence.
• If Q=1 and $$\displaystyle L\in(0,1)$$ we have linear convergence.
• If Q=1 and $$\displaystyle L=0$$ we have superlinear convergence.
• If Q=2 and $$\displaystyle L \leq \infty$$ we have quadratic convergence.

## Line Search Methods

Basic idea:

• For each iteration
• Find a direction $$\displaystyle p$$.
• Then find a step length $$\displaystyle \alpha$$ which decreases $$\displaystyle f$$.
• Take a step $$\displaystyle \alpha p$$.

## Trust Region Methods

Basic idea:

• For each iteration
• Assume a quadratic model of your objective function near a point.
• Find a region where you trust your model accurately represents your objective function.
• Take a step.

Variables:

• $$\displaystyle f$$ is your objective function.
• $$\displaystyle m_k$$ is your quadratic model at iteration k.
• $$\displaystyle x_k$$ is your point at iteration k.

Your model is $$\displaystyle m_k(p) = f_k + g_k^T p + \frac{1}{2}p^T B_k p$$ where $$\displaystyle g_k = \nabla f(x_k)$$ and $$\displaystyle B_k$$ is a symmetric matrix.
At each iteration, you solve a constrained optimization subproblem to find the best step $$\displaystyle p$$.
$$\displaystyle \min_{p \in \mathbb{R}^n} m_k(p)$$ such that $$\displaystyle \Vert p \Vert \lt \Delta_k$$.

### Cauchy Point Algorithms

The Cauchy point $$\displaystyle p_k^c = \tau_k p_k^s$$
where $$\displaystyle p_k^s$$ minimizes the linear model in the trust region
$$\displaystyle p_k^s = \operatorname{argmin}_{p \in \mathbb{R}^n} f_k + g_k^Tp$$ s.t. $$\displaystyle \Vert p \Vert \leq \Delta_k$$
and $$\displaystyle \tau_k$$ minimizes our quadratic model along the line $$\displaystyle p_k^s$$:
$$\displaystyle \tau_k = \operatorname{argmin}_{\tau \geq 0} m_k(\tau p_k^s)$$ s.t. $$\displaystyle \Vert \tau p_k^s \Vert \leq \Delta_k$$
This can be written explicitly as $$\displaystyle p_k^c = - \tau_k \frac{\Delta_k}{\Vert g_K \Vert} g_k$$ where $$\displaystyle \tau_k = \begin{cases} 1 & \text{if }g_k^T B_k g_k \leq 0;\\ \min(\Vert g_k \Vert ^3/(\Delta_k g_k^T B_k g_k), 1) & \text{otherwise} \end{cases}$$

The goal is to solve $$\displaystyle Ax=b$$ or equivalently $$\displaystyle \min \phi(x)$$ where $$\displaystyle \phi(x)=(1/2)x^T A x - b^Tx$$.
The practical CG algorithm will converge in at most $$\displaystyle n$$ steps.

### Definitions

Vectors $$\displaystyle \{p_i\}$$ are conjugate w.r.t SPD matrix $$\displaystyle A$$ if $$\displaystyle p_i^T A p_j = 0$$ for $$\displaystyle i \neq j$$

Notes
• $$\displaystyle \{p_i\}$$ are linearly independent

### Algorithms

Basic idea:

• Find a conjugate direction $$\displaystyle p_k$$
• Take the step size $$\displaystyle \alpha_k$$ which minimizes along $$\displaystyle \phi(x)$$

Practical CG method:

Code

Below is code for the practical CG method.

x = x0;
r = A*x - b;
p = -r;
k = 1;
while(norm(r) > tole && k < size(x,2))
Ap = A*p;
alpha = (r'*r)/(p'*Ap);
x = x + alpha * p;
rn = r + alpha * Ap;
beta = (rn'*rn)/(r'*r);
p = -rn + beta * p;
r = rn;
k = k+1;
end


### Theorems

Given a set of conjugate directions $$\displaystyle \{p_0,...,p_{n-1}\}$$ we can generate a sequence of $$\displaystyle x_k$$ with
$$\displaystyle x_{k+1}=x_k + \alpha_k p_k$$ where $$\displaystyle \alpha_k = -\frac{r_k^T p_k}{p_k^T A p_k}$$ minimizes $$\displaystyle \phi(x)$$

Theorem
For an $$\displaystyle x_0$$, the sequence $$\displaystyle x_k$$ converges to the solution $$\displaystyle x^*$$ in at most n steps.
Theorem
At step k, the residual $$\displaystyle r_k$$ is orthogonal to $$\displaystyle \{p_0,...p_{k-1}\}$$ and the current iteration $$\displaystyle x_{k}$$
minimizes $$\displaystyle \phi(x)$$ over the set $$\displaystyle \{x_0 + span(p_0,...,p_{k-1})\}$$

### Convergence Rate

The convergence rate can be estimated by:
$$\displaystyle \min_{P_k} \max_{1 \leq i \leq k} [1 + \lambda_i P_k(\lambda_i)]^2$$