Deep Learning

\end{aligned}
</math>
}}
===Regularized GDA===
<math>\theta_{t+1} = \theta_t + \eta R(\theta_t) g(\theta_t)</math>
* <math>R(\theta)</math> is chosen so as not to change the fixed points of the original problem.
* Suppose <math>R(\theta)</math> is invertible. Then <math>\theta^*</math> is a fixed point iff <math>g(\theta^*) = 0</math>.
Proof: 
(->) If <math>\theta^*</math> is a fixed point, then <math>0 = \eta R(\theta^*) g(\theta^*)</math>; since <math>R(\theta^*)</math> is invertible, <math>g(\theta^*) = 0</math>.
(<-) If <math>g(\theta^*)=0</math>, then <math>\theta^* + \eta R(\theta^*) g(\theta^*) = \theta^*</math>, so <math>\theta^*</math> is a fixed point. 
At such a fixed point, the Jacobian of the update map is <math>\tilde{J}(\theta^*) = I + \eta R(\theta^*) H(\theta^*) + 0</math>, where the last term (which involves <math>\nabla R</math> multiplied by <math>g(\theta^*)</math>) vanishes because <math>g(\theta^*) = 0</math>. 
Choose <math>R(\theta) = I - \gamma H^T</math>. Then 
<math>R(\theta) H(\theta) = (I-\gamma H^T)H = H - \gamma H^T H</math> 
so <math>\tilde{J} = I + \eta H - \eta \gamma H^T H</math>. 
Since <math>H^T H</math> is positive semi-definite, this regularization pushes the real parts of the eigenvalues to be more negative, so training is more stable.
It reduces the ratio <math>|Im(\lambda)|/|Re(\lambda)|</math>, but it may not necessarily help because it also increases <math>|Re(\lambda)|</math>.
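To make the effect concrete, here is a minimal numerical sketch (not from the lecture) of regularized GDA on the bilinear game <math>f(x,y) = xy</math>, where plain simultaneous GDA spirals outward; the step names, constants, and initialization are illustrative choices.

<syntaxhighlight lang="python">
# Regularized GDA on the bilinear game f(x, y) = x*y (min over x, max over y).
# Illustrative sketch: names and constants are not from the lecture.
import numpy as np

eta, gamma = 0.5, 1.0

def g(theta):
    """Game vector field g = (-df/dx, df/dy) = (-y, x)."""
    x, y = theta
    return np.array([-y, x])

# Jacobian of g; constant for a bilinear game.
H = np.array([[0.0, -1.0],
              [1.0,  0.0]])
R = np.eye(2) - gamma * H.T          # regularizer R = I - gamma * H^T

theta_gda = theta_reg = np.array([1.0, 1.0])
for _ in range(100):
    theta_gda = theta_gda + eta * g(theta_gda)          # plain simultaneous GDA
    theta_reg = theta_reg + eta * R @ g(theta_reg)      # regularized GDA

print("plain GDA    ||theta|| =", np.linalg.norm(theta_gda))   # grows (diverges)
print("regularized  ||theta|| =", np.linalg.norm(theta_reg))   # shrinks toward 0

# Eigenvalues of the update-map Jacobians: |1 + eta*(+/- i)| > 1 for plain GDA,
# |1 + eta*(-gamma +/- i)| < 1 for the regularized update.
print(np.abs(np.linalg.eigvals(np.eye(2) + eta * H)))
print(np.abs(np.linalg.eigvals(np.eye(2) + eta * R @ H)))
</syntaxhighlight>

For this game <math>H^T H = I</math>, so the regularizer shifts the eigenvalues of <math>R H</math> from <math>\pm i</math> to <math>-\gamma \pm i</math>, exactly the "more negative real part" effect described above.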
===Approximate Local Min-Max===
Under continuity and differentiability assumptions, the set of fixed points is guaranteed to be non-empty (by Brouwer's fixed-point theorem); however, the set of local min-max points can be empty. 
[Daskalakis & Panageas 2018, Adolphs et al. 2018] 
In the unconstrained case, the set of strong local min-max points is a subset of the stable fixed points of GDA, which is in turn a subset of the stable fixed points of OGDA. 
;Definition
<math>(x^*, y^*)</math> is an <math>(\epsilon, \delta)</math>-approximate local min-max if
<math>f(x^*, y) - \epsilon \leq f(x^*, y^*) \leq f(x, y^*) + \epsilon \quad \forall (x,y) \in B_{\delta}(x^*, y^*).</math>
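As a sanity check of this definition, the following small sketch verifies the two inequalities on a grid of perturbations around a candidate point; the helper name <code>is_approx_local_minmax</code> and the test function <math>f(x,y) = x^2 - y^2</math> are illustrative assumptions, not from the lecture.

<syntaxhighlight lang="python">
# Grid-based check of the (epsilon, delta) approximate local min-max condition.
# Illustrative sketch: the helper name and the test function are assumptions.
import numpy as np

def is_approx_local_minmax(f, x_star, y_star, eps, delta, n=201):
    """Check f(x*, y) - eps <= f(x*, y*) <= f(x, y*) + eps for all
    perturbations of size at most delta (sampled on a 1-D grid)."""
    offsets = np.linspace(-delta, delta, n)
    f_star = f(x_star, y_star)
    worst_y = max(f(x_star, y_star + d) for d in offsets)   # max player's best deviation
    worst_x = min(f(x_star + d, y_star) for d in offsets)   # min player's best deviation
    return worst_y - eps <= f_star <= worst_x + eps

f = lambda x, y: x**2 - y**2          # a saddle with global min-max at (0, 0)
print(is_approx_local_minmax(f, 0.0, 0.0, eps=1e-3, delta=0.1))   # True
print(is_approx_local_minmax(f, 0.5, 0.0, eps=1e-3, delta=0.1))   # False: not a local min in x
</syntaxhighlight>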
Suppose <math>f</math> is G-Lipschitz and L-smooth. 
For <math>\delta < \epsilon/G</math>, every point is an <math>(\epsilon, \delta)</math>-approximate local min-max (see the check below). 
For <math>\delta > \sqrt{2\epsilon/L}</math>, an approximate local min-max may not exist, and it is NP-hard to compute one. 
For <math>\delta</math> in the region in between, an approximate local min-max is guaranteed to exist; computing one is PPAD-complete (believed to require super-polynomial time) rather than NP-hard.
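A quick check of the first regime, using only the G-Lipschitz assumption: for any <math>(x, y) \in B_{\delta}(x^*, y^*)</math> we have <math>\|y - y^*\| \leq \delta</math> and <math>\|x - x^*\| \leq \delta</math>, so
<math>f(x^*, y) - f(x^*, y^*) \leq G\|y - y^*\| \leq G\delta < \epsilon, \qquad f(x^*, y^*) - f(x, y^*) \leq G\|x - x^*\| \leq G\delta < \epsilon,</math>
i.e. both inequalities in the definition hold at every point when <math>\delta < \epsilon/G</math>.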
;So what?
For non-convex ERM minimization, we use GD, and over-parameterization helped us obtain global convergence. 
Would over-parameterization also help in solving non-convex-concave min-max problems? 
Yes; see the ICLR 2021 paper by anonymous authors, ''Understanding the role of over-parameterization in GANs''.
;Theorem
Consider a WGAN problem. 
Suppose the generator is a one-hidden-layer network and the discriminator is linear. 
Then <math>\min_{G} \max_{D} f(G,D)</math> is a non-convex-concave optimization problem. 
If the generator is sufficiently over-parameterized, then simultaneous GDA converges to a global min-max solution.
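The following toy sketch illustrates the setting of the theorem: a WGAN with a one-hidden-layer generator and a linear discriminator trained by simultaneous GDA on 1-D Gaussian data. Everything in it (the tanh activation, the small <math>\tfrac{\lambda}{2}w^2</math> penalty that keeps the inner maximization bounded, the hidden width, step sizes, and all names) is an illustrative assumption rather than the construction from the paper.

<syntaxhighlight lang="python">
# Toy sketch: WGAN with a one-hidden-layer generator and a linear discriminator
# D(x) = w*x, trained by simultaneous GDA on 1-D Gaussian data.
# All modelling choices here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
m = 16                                   # hidden width (over-parameterization knob)
u = rng.normal(size=m)                   # input-to-hidden weights
b = np.zeros(m)                          # hidden biases
v = rng.normal(size=m) / np.sqrt(m)      # hidden-to-output weights
w = 0.0                                  # linear discriminator parameter
eta, lam, target_mean = 0.02, 0.5, 3.0

def generator(z):
    """G(z) = v . tanh(u*z + b), scalar latent in, scalar sample out."""
    return np.tanh(np.outer(z, u) + b) @ v

for step in range(5000):
    x = rng.normal(loc=target_mean, size=256)     # "real" samples
    z = rng.normal(size=256)                      # latent noise
    h = np.tanh(np.outer(z, u) + b)               # hidden activations, (256, m)
    gz = h @ v                                    # generated samples

    # f(G, D) = E[w*x] - E[w*G(z)] - lam*w^2/2 : concave (linear plus penalty)
    # in w, non-convex in the generator weights (u, b, v).
    grad_w = x.mean() - gz.mean() - lam * w
    grad_v = -w * h.mean(axis=0)
    grad_u = -w * v * ((1 - h**2) * z[:, None]).mean(axis=0)
    grad_b = -w * v * (1 - h**2).mean(axis=0)

    # Simultaneous GDA: gradient descent on G, gradient ascent on D.
    u, b, v = u - eta * grad_u, b - eta * grad_b, v - eta * grad_v
    w = w + eta * grad_w

z = rng.normal(size=10000)
print("generated mean:", generator(z).mean(), "   target mean:", target_mean)
</syntaxhighlight>

With a linear discriminator, <math>f</math> depends on the generated distribution only through its mean, so in this toy case a global min-max simply means the generator's output mean matches the data mean.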


==Misc==