Neural Network Compression

===Redundancy Methods===
* Srinivas and Babu<ref name="srinivas2015data"></ref> propose a pair-wise similarity on each neuron: <math>s = \Vert a_j^2 \Vert_1 \Vert W_i - W_j \Vert^2_{2}</math>, where <math>a_j</math> is the vector of weights on neuron j in the layer above and <math>W_i, W_j</math> are the incoming weight vectors of neurons i and j. This combines a weight metric and a similarity metric into one sensitivity metric. When a neuron is pruned, the weight matrices of both the current layer and the next layer need to be updated (see the sketch below).
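A minimal NumPy sketch of this saliency and the corresponding update step, assuming a fully connected layer; the array shapes, function names, and pruning loop are illustrative, not taken from the paper's code.
<syntaxhighlight lang="python">
import numpy as np

def pairwise_saliency(W, A):
    """s[i, j] = ||a_j^2||_1 * ||W_i - W_j||_2^2 for every neuron pair (i, j).

    W : (n, d) incoming weights, one row per neuron in the current layer.
    A : (m, n) next-layer weights; column j is a_j, the weights on neuron j
        in the layer above.
    """
    a_sq_l1 = np.sum(A ** 2, axis=0)              # ||a_j^2||_1 for each j, shape (n,)
    diff = W[:, None, :] - W[None, :, :]          # pairwise differences, shape (n, n, d)
    dist_sq = np.sum(diff ** 2, axis=-1)          # ||W_i - W_j||_2^2, shape (n, n)
    return dist_sq * a_sq_l1[None, :]

def prune_one_neuron(W, A):
    """Remove the most redundant neuron and update both weight matrices."""
    s = pairwise_saliency(W, A)
    np.fill_diagonal(s, np.inf)                   # ignore i == j
    i, j = np.unravel_index(np.argmin(s), s.shape)
    A = A.copy()
    A[:, i] += A[:, j]                            # fold the pruned neuron's outgoing weights into the survivor
    keep = np.arange(W.shape[0]) != j             # drop neuron j from both layers
    return W[keep], A[:, keep]
</syntaxhighlight>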
===Structured Pruning===
Structured pruning keeps the dense structure of the network, so that the pruned network can still benefit from standard dense matrix multiplication operations.
* Wen ''et al.'' (2016) <ref name="wen2016learning"></ref> propose Structured Sparsity Learning (SSL) on CNNs. Given filters of size (N, C, M, K), i.e. (out-channels, in-channels, height, width), they use a group lasso loss/regularization to penalize usage of extra input and output channels. They also learn filter shapes using this regularization.
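As an illustration of the group-lasso penalty at the heart of SSL, the PyTorch-style sketch below groups a (N, C, M, K) convolution weight by output channel and by input channel and sums the group l2 norms; the function name and regularization strengths are illustrative, not the paper's code.
<syntaxhighlight lang="python">
import torch

def ssl_group_lasso(weight, lam_out=1e-4, lam_in=1e-4):
    """Group-lasso penalty over a conv weight of shape (N, C, M, K),
    i.e. (out-channels, in-channels, height, width).

    Each output channel (filter) and each input channel forms one group;
    summing the l2 norms of the groups drives whole channels toward zero,
    so they can be removed while the remaining network stays dense.
    """
    out_group_norms = weight.flatten(start_dim=1).norm(dim=1)    # one norm per output channel, (N,)
    in_group_norms = (weight.permute(1, 0, 2, 3)
                            .flatten(start_dim=1)
                            .norm(dim=1))                        # one norm per input channel, (C,)
    return lam_out * out_group_norms.sum() + lam_in * in_group_norms.sum()

# Added to the task loss during training, e.g.:
# loss = criterion(model(x), y) + sum(ssl_group_lasso(m.weight)
#                                     for m in model.modules()
#                                     if isinstance(m, torch.nn.Conv2d))
</syntaxhighlight>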


==Quantization==
Many codebases use 8-bit or 16-bit representations instead of the standard 32-bit floats.
Work on quantization typically focuses on different number representations and mixed-precision training, though quantization can also be used to speed up inference.
 
* Google uses [https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus bfloat16] for training on TPUs.
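A minimal sketch of the kind of 8-bit affine quantization mentioned above; the scale/zero-point scheme is one common convention, not any particular library's implementation.
<syntaxhighlight lang="python">
import numpy as np

def quantize_int8(x):
    """Affine-quantize a float32 array to int8; returns (q, scale, zero_point)."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    zero_point = int(np.round(-x_min / scale)) - 128          # maps x_min to -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover an approximate float32 array; per-entry error is on the order of the scale."""
    return ((q.astype(np.float32) - zero_point) * scale).astype(np.float32)

x = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize_int8(q, scale, zp)                         # x_hat is close to x
</syntaxhighlight>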
 


==Factorization==
Also known as tensor decomposition.
* Denil ''et al.'' (2013)<ref name="denil2013predicting"></ref> propose a low-rank factorization: <math>W=UV</math>, where <math>U</math> is <math>n_v \times n_\alpha</math> and <math>V</math> is <math>n_\alpha \times n_h</math>. Here, vectors are left-multiplied against <math>W</math>. They compare several scenarios: training both U and V, randomly setting U with identity basis vectors, randomly setting U with iid Gaussian entries, and more.
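A short sketch of the parameter savings from such a factorization, using a truncated SVD of an existing weight matrix purely for illustration (Denil ''et al.'' instead construct or train <math>U</math> as described above):
<syntaxhighlight lang="python">
import numpy as np

def low_rank_factorize(W, n_alpha):
    """Factor W (n_v x n_h) into U (n_v x n_alpha) and V (n_alpha x n_h)
    via truncated SVD, the best rank-n_alpha approximation in Frobenius norm."""
    U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
    U = U_full[:, :n_alpha] * s[:n_alpha]     # absorb the singular values into U
    V = Vt[:n_alpha, :]
    return U, V

W = np.random.randn(512, 256)                 # n_v = 512, n_h = 256
U, V = low_rank_factorize(W, n_alpha=32)
# Parameters drop from 512*256 = 131072 to 32*(512+256) = 24576, and a layer can
# compute x @ (U @ V) as (x @ U) @ V, i.e. two cheaper matrix multiplies.
</syntaxhighlight>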


==Libraries==
* [https://axon.cs.byu.edu/~martinez/classes/678/Papers/Reed_PruningSurvey.pdf Pruning algorithms - a survey] (1993) by Russell Reed
* [https://arxiv.org/pdf/1710.09282.pdf A Survey of Model Compression and Acceleration for Deep Neural Networks] (2017) by Cheng et al.
* [https://arxiv.org/abs/2006.03669 An Overview of Neural Network Compression] (2020) by James O'Neill


==References==
<ref name="lecun1989optimal">LeCun, Y., Denker, J. S., Solla, S. A., Howard, R. E., & Jackel, L. D. (1989, November). Optimal brain damage. (NeurIPS 1989). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.7223&rep=rep1&type=pdf PDF]</ref>
<ref name="lecun1989optimal">LeCun, Y., Denker, J. S., Solla, S. A., Howard, R. E., & Jackel, L. D. (1989, November). Optimal brain damage. (NeurIPS 1989). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.7223&rep=rep1&type=pdf PDF]</ref>
<ref name="srinivas2015data">Srinivas, S., & Babu, R. V. (2015). Data-free parameter pruning for deep neural networks. [https://arxiv.org/abs/1507.06149 PDF]</ref>
<ref name="srinivas2015data">Srinivas, S., & Babu, R. V. (2015). Data-free parameter pruning for deep neural networks. [https://arxiv.org/abs/1507.06149 PDF]</ref>
<ref name="denil2013predicting">Denil, M., Shakibi, B., Dinh, L., Ranzato, M. A., & De Freitas, N. (2013). Predicting parameters in deep learning. arXiv preprint arXiv:1306.0543. [https://arxiv.org/abs/1306.0543 Arxiv]</ref>
<ref name="wen2016learning">Wen, W., Wu, C., Wang, Y., Chen, Y., & Li, H. (2016). Learning structured sparsity in deep neural networks. arXiv preprint arXiv:1608.03665. [https://arxiv.org/abs/1608.03665 Arxiv]</ref>
}}