Neural Network Compression

===Redundancy Methods===
* Srinivas and Babu<ref name="srinivas2015data"></ref> propose a pair-wise similarity on each neuron: <math>s = \Vert a_j^2 \Vert_1 \Vert W_i - W_j \Vert^2_{2}</math>, where <math>a_j</math> is the vector of weights on neuron j in the layer above and <math>W_i, W_j</math> are the incoming weight vectors of neurons i and j. This combines a weight metric and a similarity metric into one sensitivity metric. When a neuron is pruned, the weight matrices of both the current layer and the next layer need to be updated (see the sketch below).
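A minimal NumPy sketch of this saliency and the corresponding update step, assuming a fully connected layer; the array shapes, function names, and pruning loop are illustrative, not taken from the paper's code.
<syntaxhighlight lang="python">
import numpy as np

def pairwise_saliency(W, A):
    """s[i, j] = ||a_j^2||_1 * ||W_i - W_j||_2^2 for every neuron pair (i, j).

    W : (n, d) incoming weights, one row per neuron in the current layer.
    A : (m, n) next-layer weights; column j is a_j, the weights on neuron j
        in the layer above.
    """
    a_sq_l1 = np.sum(A ** 2, axis=0)              # ||a_j^2||_1 for each j, shape (n,)
    diff = W[:, None, :] - W[None, :, :]          # pairwise differences, shape (n, n, d)
    dist_sq = np.sum(diff ** 2, axis=-1)          # ||W_i - W_j||_2^2, shape (n, n)
    return dist_sq * a_sq_l1[None, :]

def prune_one_neuron(W, A):
    """Remove the most redundant neuron and update both weight matrices."""
    s = pairwise_saliency(W, A)
    np.fill_diagonal(s, np.inf)                   # ignore i == j
    i, j = np.unravel_index(np.argmin(s), s.shape)
    A = A.copy()
    A[:, i] += A[:, j]                            # fold the pruned neuron's outgoing weights into the survivor
    keep = np.arange(W.shape[0]) != j             # drop neuron j from both layers
    return W[keep], A[:, keep]
</syntaxhighlight>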
===Structured Pruning===
Structured pruning keeps the dense structure of the network, so that the pruned network can still benefit from standard dense matrix multiplication operations.
* Wen ''et al.'' (2016) <ref name="wen2016learning"></ref> propose Structured Sparsity Learning (SSL) on CNNs. Given filters of size (N, C, M, K), i.e. (out-channels, in-channels, height, width), they use a group lasso loss/regularization to penalize usage of extra input and output channels. They also learn filter shapes using this regularization.
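As an illustration of the group-lasso penalty at the heart of SSL, the PyTorch-style sketch below groups a (N, C, M, K) convolution weight by output channel and by input channel and sums the group l2 norms; the function name and regularization strengths are illustrative, not the paper's code.
<syntaxhighlight lang="python">
import torch

def ssl_group_lasso(weight, lam_out=1e-4, lam_in=1e-4):
    """Group-lasso penalty over a conv weight of shape (N, C, M, K),
    i.e. (out-channels, in-channels, height, width).

    Each output channel (filter) and each input channel forms one group;
    summing the l2 norms of the groups drives whole channels toward zero,
    so they can be removed while the remaining network stays dense.
    """
    out_group_norms = weight.flatten(start_dim=1).norm(dim=1)    # one norm per output channel, (N,)
    in_group_norms = (weight.permute(1, 0, 2, 3)
                            .flatten(start_dim=1)
                            .norm(dim=1))                        # one norm per input channel, (C,)
    return lam_out * out_group_norms.sum() + lam_in * in_group_norms.sum()

# Added to the task loss during training, e.g.:
# loss = criterion(model(x), y) + sum(ssl_group_lasso(m.weight)
#                                     for m in model.modules()
#                                     if isinstance(m, torch.nn.Conv2d))
</syntaxhighlight>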


==Quantization==
Many codebases use 8-bit or 16-bit representations instead of the standard 32-bit floats.
Work on quantization typically focuses on different number representations and mixed-precision training, though quantization can also be used to speed up inference.
 
* Google uses [https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus bfloat16] for training on TPUs.
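A minimal sketch of the kind of 8-bit affine quantization mentioned above; the scale/zero-point scheme is one common convention, not any particular library's implementation.
<syntaxhighlight lang="python">
import numpy as np

def quantize_int8(x):
    """Affine-quantize a float32 array to int8; returns (q, scale, zero_point)."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    zero_point = int(np.round(-x_min / scale)) - 128          # maps x_min to -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover an approximate float32 array; per-entry error is on the order of the scale."""
    return ((q.astype(np.float32) - zero_point) * scale).astype(np.float32)

x = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize_int8(q, scale, zp)                         # x_hat is close to x
</syntaxhighlight>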
 


==Factorization==
Also known as tensor decomposition.
* Denil ''et al.'' (2013)<ref name="denil2013predicting"></ref> propose a low-rank factorization: <math>W=UV</math>, where <math>U</math> is <math>n_v \times n_\alpha</math> and <math>V</math> is <math>n_\alpha \times n_h</math>. Here, vectors are left-multiplied against <math>W</math>. They compare several scenarios: training both U and V, randomly setting U with identity basis vectors, randomly setting U with iid Gaussian entries, and more.
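A short sketch of the parameter savings from such a factorization, using a truncated SVD of an existing weight matrix purely for illustration (Denil ''et al.'' instead construct or train <math>U</math> as described above):
<syntaxhighlight lang="python">
import numpy as np

def low_rank_factorize(W, n_alpha):
    """Factor W (n_v x n_h) into U (n_v x n_alpha) and V (n_alpha x n_h)
    via truncated SVD, the best rank-n_alpha approximation in Frobenius norm."""
    U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
    U = U_full[:, :n_alpha] * s[:n_alpha]     # absorb the singular values into U
    V = Vt[:n_alpha, :]
    return U, V

W = np.random.randn(512, 256)                 # n_v = 512, n_h = 256
U, V = low_rank_factorize(W, n_alpha=32)
# Parameters drop from 512*256 = 131072 to 32*(512+256) = 24576, and a layer can
# compute x @ (U @ V) as (x @ U) @ V, i.e. two cheaper matrix multiplies.
</syntaxhighlight>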


==Libraries==
* [https://axon.cs.byu.edu/~martinez/classes/678/Papers/Reed_PruningSurvey.pdf Pruning algorithms - a survey] (1993) by Russell Reed
* [https://arxiv.org/pdf/1710.09282.pdf A Survey of Model Compression and Acceleration for Deep Neural Networks] (2017) by Cheng et al.
* [https://arxiv.org/abs/2006.03669 An Overview of Neural Network Compression] (2020) by James O'Neill


==References==
<ref name="lecun1989optimal">LeCun, Y., Denker, J. S., Solla, S. A., Howard, R. E., & Jackel, L. D. (1989, November). Optimal brain damage. (NeurIPS 1989). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.7223&rep=rep1&type=pdf PDF]</ref>
<ref name="lecun1989optimal">LeCun, Y., Denker, J. S., Solla, S. A., Howard, R. E., & Jackel, L. D. (1989, November). Optimal brain damage. (NeurIPS 1989). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.7223&rep=rep1&type=pdf PDF]</ref>
<ref name="srinivas2015data">Srinivas, S., & Babu, R. V. (2015). Data-free parameter pruning for deep neural networks. [https://arxiv.org/abs/1507.06149 PDF]</ref>
<ref name="srinivas2015data">Srinivas, S., & Babu, R. V. (2015). Data-free parameter pruning for deep neural networks. [https://arxiv.org/abs/1507.06149 PDF]</ref>
<ref name="denil2013predicting">Denil, M., Shakibi, B., Dinh, L., Ranzato, M. A., & De Freitas, N. (2013). Predicting parameters in deep learning. arXiv preprint arXiv:1306.0543. [https://arxiv.org/abs/1306.0543 Arxiv]</ref>
<ref name="wen2016learning">Wen, W., Wu, C., Wang, Y., Chen, Y., & Li, H. (2016). Learning structured sparsity in deep neural networks. arXiv preprint arXiv:1608.03665. [https://arxiv.org/abs/1608.03665 Arxiv]</ref>
}}