Neural Network Compression

==Pruning==
===Sensitivity Methods===
The idea here is to measure how sensitive each weight (i.e. connection) or neuron is: if you remove it, how much does the output change?
Typically, weights are pruned by zeroing them out and freezing them.
 
In general, the procedure is (a rough sketch follows below):
# Train the network with a lot of parameters.
# Compute the sensitivity of each parameter.
# Delete low-saliency parameters.
# Continue training and repeat the pruning until the number of parameters is low enough or the error becomes too high.
 
Sometimes, pruning can also increase accuracy and improve generalization.
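
A minimal PyTorch-style sketch of this prune-retrain loop, using weight magnitude as a stand-in saliency score; the model, the <code>train()</code> function, and the 20% pruning fraction are placeholders, not part of any specific paper:
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

def magnitude_prune_(model: nn.Module, fraction: float) -> None:
    """Zero out the smallest-magnitude fraction of weights in every Linear layer
    and zero their gradients so they stay frozen at zero during further training."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            k = int(fraction * w.numel())
            if k == 0:
                continue
            threshold = w.abs().flatten().kthvalue(k).values
            mask = (w.abs() > threshold).float()
            w.mul_(mask)                                           # delete low-saliency weights
            module.weight.register_hook(lambda g, m=mask: g * m)   # keep them at zero

# Iterative prune/retrain loop (train() and num_rounds are placeholders):
# train(model)                      # 1. train a large network
# for _ in range(num_rounds):
#     magnitude_prune_(model, 0.2)  # 2-3. score parameters, delete low-saliency ones
#     train(model)                  # 4. continue training, then repeat
</syntaxhighlight>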


* Mozer and Smolensky (1988)<ref name="mozer1988skeletonization"></ref> use a gate for each neuron. The sensitivity can then be estimated with the derivative of the error w.r.t. the gate (see the sketch after this list).
* Karnin<ref name="karnin1990simple"></ref> estimates the sensitivity by monitoring the change in each weight during training.
* LeCun ''et al.'' present ''Optimal Brain Damage''<ref name="lecun1989optimal"></ref>, which uses a diagonal approximation of the Hessian to estimate each weight's saliency as <math>s_i = \frac{1}{2} h_{ii} w_i^2</math>, where <math>h_{ii}</math> is the second derivative of the error w.r.t. weight <math>w_i</math>.
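
To make the gate idea concrete, here is a rough PyTorch sketch (not the authors' implementation; the skeletonization paper accumulates this derivative over training rather than taking it from a single batch):
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose outputs are scaled by per-neuron gates fixed at 1.
    The gradient of the loss w.r.t. a gate estimates how much the loss would
    change if that neuron's output were removed (gate driven to 0)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.gate = nn.Parameter(torch.ones(out_features))

    def forward(self, x):
        return self.gate * self.linear(x)

# Usage sketch: after a backward pass, the gate gradients give per-neuron sensitivities.
# loss = criterion(model_with_gated_layers(x), y)
# loss.backward()
# sensitivity = gated_layer.gate.grad.abs()
</syntaxhighlight>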
 
===Redundancy Methods===
* Srinivas and Babu<ref name="srinivas2015data"></ref> propose a pair-wise saliency between neurons in a layer: <math>s_{ij} = \Vert a_j^2 \Vert_1 \Vert W_i - W_j \Vert^2_{2}</math>, where <math>a_j</math> is the vector of weights on neuron <math>j</math> at the layer above and <math>W_i, W_j</math> are the neurons' incoming weight vectors. This combines a weight metric and a similarity metric into one sensitivity metric. When a neuron is pruned, the weight matrices of both the current and the next layer need to be updated (see the sketch below).
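
A minimal NumPy sketch of this redundancy criterion, under the assumed convention that the pruned neuron's outgoing weights are folded into the surviving, most similar neuron:
<syntaxhighlight lang="python">
import numpy as np

def saliency(W, A, i, j):
    """s_ij = ||a_j^2||_1 * ||W_i - W_j||_2^2.
    W: (n_neurons, n_in) incoming weight vectors of the current layer.
    A: (n_out, n_neurons) weights of the layer above; column j is a_j."""
    return np.sum(A[:, j] ** 2) * np.sum((W[i] - W[j]) ** 2)

def prune_most_redundant(W, A):
    """Find the pair (i, j), i != j, with the smallest saliency, remove neuron j,
    and fold its outgoing weights into neuron i (a_i <- a_i + a_j)."""
    n = W.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    i, j = min(pairs, key=lambda p: saliency(W, A, *p))
    A = A.copy()
    A[:, i] += A[:, j]                 # surviving neuron absorbs the pruned one's output weights
    return np.delete(W, j, axis=0), np.delete(A, j, axis=1)
</syntaxhighlight>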


==Quantization==
* [https://pytorch.org/tutorials/intermediate/pruning_tutorial.html PyTorch pruning tutorial]
* [https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras#overview TF: Pruning with Keras]
These support magnitude-based pruning, which zeroes out small weights.
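
For example, with the PyTorch utilities covered in the tutorial above (a toy <code>nn.Linear</code> stands in for a layer of a trained model):
<syntaxhighlight lang="python">
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(64, 32)

# Zero out the 30% of weights with the smallest absolute value.
# This attaches a binary mask (weight_mask) and keeps the originals in weight_orig.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Make the pruning permanent: drop the reparameterization, keep the zeroed weights.
prune.remove(layer, "weight")
</syntaxhighlight>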


==Resources==
<ref name="mozer1988skeletonization">Mozer, M. C., & Smolensky, P. (1988). Skeletonization: A technique for trimming the fat from a network via relevance assessment. (NeurIPS 1988). [https://proceedings.neurips.cc/paper/1988/file/07e1cd7dca89a1678042477183b7ac3f-Paper.pdf PDF]</ref>
<ref name="mozer1988skeletonization">Mozer, M. C., & Smolensky, P. (1988). Skeletonization: A technique for trimming the fat from a network via relevance assessment. (NeurIPS 1988). [https://proceedings.neurips.cc/paper/1988/file/07e1cd7dca89a1678042477183b7ac3f-Paper.pdf PDF]</ref>
<ref name="karnin1990simple">Karnin, E. D. (1990). A simple procedure for pruning back-propagation trained neural networks. (IEEE TNNLS 1990). [https://ieeexplore.ieee.org/document/80236 IEEE Xplore]</ref>
<ref name="karnin1990simple">Karnin, E. D. (1990). A simple procedure for pruning back-propagation trained neural networks. (IEEE TNNLS 1990). [https://ieeexplore.ieee.org/document/80236 IEEE Xplore]</ref>
<ref name="lecun1989optimal">LeCun, Y., Denker, J. S., Solla, S. A., Howard, R. E., & Jackel, L. D. (1989, November). Optimal brain damage. (NeurIPS 1989). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.7223&rep=rep1&type=pdf PDF]</ref>
<ref name="srinivas2015data">Srinivas, S., & Babu, R. V. (2015). Data-free parameter pruning for deep neural networks. [https://arxiv.org/abs/1507.06149 PDF]</ref>
}}