Neural Network Compression

From David's Wiki
Revision as of 21:47, 2 February 2021

A brief survey of neural network compression techniques.

Pruning

Sensitivity Methods

The idea is to measure how sensitive each weight (i.e., connection) or neuron is: if you remove it, how much does the output change?
Typically, weights are pruned by zeroing them out and freezing them at zero.

In general, the procedure is

  1. Train the network with a lot of parameters.
  2. Compute sensitivity for each parameter.
  3. Delete low-saliency parameters.
  4. Continue training and repeat pruning until the number of parameters is low enough or error is too high.
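The loop above can be sketched in a few lines. This is a toy illustration, not from the original: it uses weight magnitude as a stand-in saliency measure and an exact least-squares solve as the "training" step.

```python
import numpy as np

# Toy sketch of the iterative pruning loop: train, score, prune, retrain.
# Saliency here is just |w|; real methods substitute a sensitivity estimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 0.5]              # only 3 of 10 weights matter
y = X @ w_true

w = np.linalg.lstsq(X, y, rcond=None)[0]   # 1. train with many parameters
mask = np.ones(10, dtype=bool)

for _ in range(3):
    saliency = np.abs(w)                   # 2. compute sensitivity per weight
    saliency[~mask] = np.inf               # already-pruned weights stay pruned
    mask[np.argmin(saliency)] = False      # 3. delete the lowest-saliency weight
    w[~mask] = 0.0                         #    zero it out and freeze it
    cols = np.where(mask)[0]               # 4. retrain the surviving weights
    w[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]

print(mask.sum())  # number of surviving weights
```

Pruned weights are masked and frozen at zero while the survivors are refit, mirroring steps 1–4 above.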

Sometimes, pruning can also increase accuracy and improve generalization.

  • Mozer and Smolensky (1988)[1] use a gate for each neuron; the sensitivity can then be estimated with the derivative w.r.t. the gate.
  • Karnin[2] estimates the sensitivity by monitoring the change in weight during training.
  • LeCun et al. present Optimal Brain Damage[3], which uses a second-order Taylor approximation of the loss, with a diagonal approximation of the Hessian, to estimate each weight's saliency.
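As a concrete instance of a sensitivity method, the Optimal Brain Damage saliency \(\displaystyle s_k = h_{kk} w_k^2 / 2\), with \(\displaystyle h_{kk}\) the k-th diagonal entry of the Hessian of the loss, can be computed exactly for a least-squares loss. The setup below is an illustrative toy, not from the original:

```python
import numpy as np

# OBD saliency s_k = h_kk * w_k^2 / 2. For a least-squares loss
# L(w) = ||Xw - y||^2 / (2n), the Hessian is X^T X / n, so its
# diagonal h_kk is just the mean of X[:, k]^2.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
w = np.array([3.0, -2.0, 1.0, 0.1, -0.05, 0.01])
y = X @ w

h_diag = np.mean(X ** 2, axis=0)       # diagonal of X^T X / n
saliency = 0.5 * h_diag * w ** 2       # OBD saliency per weight

order = np.argsort(saliency)           # prune lowest-saliency weights first
print(order[:3])                       # indices of the 3 least salient weights
```

Note that saliency scales with \(w_k^2\), so small weights on low-curvature directions are pruned first.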

Redundancy Methods

  • Srinivas and Babu[4] propose a pair-wise similarity measure between neurons: \(\displaystyle s = \Vert a_j^2 \Vert_1 \Vert W_i - W_j \Vert^2_{2}\), where \(\displaystyle a_j\) is the vector of weights on neuron j at the layer above and \(\displaystyle W_i, W_j\) are the neurons' incoming weight vectors. This combines a weight-magnitude metric and a similarity metric into one saliency metric. When a neuron is pruned, the weight matrices for both the current and the next layer need to be updated.
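A minimal sketch of this pairwise saliency, under an assumed two-layer setup (all sizes and data are illustrative): a low \(\displaystyle s\) means neuron j is nearly redundant given neuron i, so j can be removed and its outgoing weights folded into those of i.

```python
import numpy as np

# Pairwise saliency s_ij = ||a_j^2||_1 * ||W_i - W_j||_2^2.
# W holds incoming weights (one row per neuron); A holds outgoing
# weights at the layer above (one column per neuron).
rng = np.random.default_rng(2)
W = rng.normal(size=(4, 5))
W[3] = W[1] + 1e-3                     # neuron 3 nearly duplicates neuron 1
A = rng.normal(size=(2, 4))

n = W.shape[0]
s = np.full((n, n), np.inf)            # inf on the diagonal: no self-pruning
for i in range(n):
    for j in range(n):
        if i != j:
            s[i, j] = np.sum(A[:, j] ** 2) * np.sum((W[i] - W[j]) ** 2)

i, j = np.unravel_index(np.argmin(s), s.shape)   # most redundant pair
A[:, i] += A[:, j]                     # fold neuron j's output into neuron i
W = np.delete(W, j, axis=0)            # update the current layer's matrix...
A = np.delete(A, j, axis=1)            # ...and the next layer's matrix
print(i, j)
```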

Quantization

There are many works which use 8-bit or 16-bit representations instead of the standard 32-bit floats.
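For instance, a minimal post-training linear quantization scheme maps float32 values onto 256 integer levels with a scale and zero point, then dequantizes at inference time. This sketch is illustrative; real libraries add per-channel scales, calibration, and quantization-aware training.

```python
import numpy as np

# Minimal 8-bit linear quantization: q = round((w - zero_point) / scale),
# reconstruction w_hat = q * scale + zero_point.
rng = np.random.default_rng(3)
w = rng.normal(size=1000).astype(np.float32)

lo, hi = float(w.min()), float(w.max())
scale = (hi - lo) / 255.0              # one float step per integer level
zero_point = lo

q = np.round((w - zero_point) / scale).astype(np.uint8)   # 8-bit storage
w_hat = q.astype(np.float32) * scale + zero_point         # dequantize

max_err = np.abs(w - w_hat).max()      # bounded by ~scale / 2
print(max_err)
```

Storage drops 4x (uint8 vs. float32) at the cost of a rounding error of at most half a quantization step.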

Factorization

Libraries

Both TensorFlow and PyTorch have built-in libraries for pruning:

  • PyTorch pruning tutorial: https://pytorch.org/tutorials/intermediate/pruning_tutorial.html
  • TF: Pruning with Keras: https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras#overview

These support magnitude-based pruning, which zeros out small weights.

Resources

Surveys

References


  1. Mozer, M. C., & Smolensky, P. (1988). Skeletonization: A technique for trimming the fat from a network via relevance assessment. NeurIPS 1988. https://proceedings.neurips.cc/paper/1988/file/07e1cd7dca89a1678042477183b7ac3f-Paper.pdf
  2. Karnin, E. D. (1990). A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks, 1990. https://ieeexplore.ieee.org/document/80236
  3. LeCun, Y., Denker, J. S., Solla, S. A., Howard, R. E., & Jackel, L. D. (1989). Optimal brain damage. NeurIPS 1989. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.7223&rep=rep1&type=pdf
  4. Srinivas, S., & Babu, R. V. (2015). Data-free parameter pruning for deep neural networks. https://arxiv.org/abs/1507.06149