Numerical Optimization

From David's Wiki
\( \newcommand{\P}[]{\unicode{xB6}} \newcommand{\AA}[]{\unicode{x212B}} \newcommand{\empty}[]{\emptyset} \newcommand{\O}[]{\emptyset} \newcommand{\Alpha}[]{Α} \newcommand{\Beta}[]{Β} \newcommand{\Epsilon}[]{Ε} \newcommand{\Iota}[]{Ι} \newcommand{\Kappa}[]{Κ} \newcommand{\Rho}[]{Ρ} \newcommand{\Tau}[]{Τ} \newcommand{\Zeta}[]{Ζ} \newcommand{\Mu}[]{\unicode{x039C}} \newcommand{\Chi}[]{Χ} \newcommand{\Eta}[]{\unicode{x0397}} \newcommand{\Nu}[]{\unicode{x039D}} \newcommand{\Omicron}[]{\unicode{x039F}} \DeclareMathOperator{\sgn}{sgn} \def\oiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x222F}\,}{\unicode{x222F}}{\unicode{x222F}}{\unicode{x222F}}}\,}\nolimits} \def\oiiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x2230}\,}{\unicode{x2230}}{\unicode{x2230}}{\unicode{x2230}}}\,}\nolimits} \)

Numerical Optimization


Convergence Rates

Wikipedia
The rate of convergence is \(\displaystyle \lim_{k \rightarrow \infty} \frac{|x_{k+1}-x^*|}{|x_{k}-x^*|^q}=L\)
Iterative methods have the following convergence rates:

  • If Q=1 and L=1 we have sublinear convergence.
  • If Q=1 and \(\displaystyle L\in(0,1)\) we have linear convergence.
  • If Q=1 and \(\displaystyle L=0\) we have superlinear convergence.
  • If Q=2 and \(\displaystyle L \leq \infty\) we have quadratic convergence.

Line Search Methods

Basic idea:

  • For each iteration
    • Find a direction \(\displaystyle p\).
    • Then find a step length \(\displaystyle \alpha\) which decreases \(\displaystyle f\).
    • Take a step \(\displaystyle \alpha p\).

Trust Region Methods

Basic idea:

  • For each iteration
    • Assume a quadratic model of your objective function near a point.
    • Find a region where you trust your model accurately represents your objective function.
    • Take a step.


Variables:

  • \(\displaystyle f\) is your objective function.
  • \(\displaystyle m_k\) is your quadratic model at iteration k.
  • \(\displaystyle x_k\) is your point at iteration k.

Your model is \(\displaystyle m_k(p) = f_k + g_k^T p + \frac{1}{2}p^T B_k p\) where \(\displaystyle g_k = \nabla f(x_k)\) and \(\displaystyle B_k\) is a symmetric matrix.
At each iteration, you solve a constrained optimization subproblem to find the best step \(\displaystyle p\).
\(\displaystyle \min_{p \in \mathbb{R}^n} m_k(p)\) such that \(\displaystyle \Vert p \Vert \lt \Delta_k \).

Cauchy Point Algorithms

The Cauchy point \(\displaystyle p_k^c = \tau_k p_k^s\)
where \(\displaystyle p_k^s\) minimizes the linear model in the trust region
\(\displaystyle p_k^s = \operatorname{argmin}_{p \in \mathbb{R}^n} f_k + g_k^Tp \) s.t. \(\displaystyle \Vert p \Vert \leq \Delta_k \)
and \(\displaystyle \tau_k\) minimizes our quadratic model along the line \(\displaystyle p_k^s\):
\(\displaystyle \tau_k = \operatorname{argmin}_{\tau \geq 0} m_k(\tau p_k^s)\) s.t. \(\displaystyle \Vert \tau p_k^s \Vert \leq \Delta_k \)
This can be written explicitly as \(\displaystyle p_k^c = - \tau_k \frac{\Delta_k}{\Vert g_K \Vert} g_k\) where \(\displaystyle \tau_k = \begin{cases} 1 & \text{if }g_k^T B_k g_k \leq 0;\\ \min(\Vert g_k \Vert ^3/(\Delta_k g_k^T B_k g_k), 1) & \text{otherwise} \end{cases} \)

Conjugate Gradient Methods

The goal is to solve \(\displaystyle Ax=b\) or equivalently \(\displaystyle \min \phi(x)\) where \(\displaystyle \phi(x)=(1/2)x^T A x - b^Tx\)
. The practical CG algorithm will converge in at most \(\displaystyle n\) steps.

Definitions

Vectors \(\displaystyle \{p_i\}\) are conjugate w.r.t SPD matrix \(\displaystyle A\) if \(\displaystyle p_i^T A p_j = 0\) for \(\displaystyle i \neq j\)

Notes
  • \(\displaystyle \{p_i\}\) are linearly independent

Algorithms

Basic idea:

  • Find a conjugate direction \(\displaystyle p_k\)
  • Take the step size \(\displaystyle \alpha_k\) which minimizes along \(\displaystyle \phi(x)\)

Resources