Neural Network Compression

* Mozer and Smolensky (1988)<ref name="mozer1988skeletonization"></ref> use a gate for each neuron. The sensitivity can then be estimated with the derivative w.r.t. the gate.
* Karnin<ref name="karnin1990simple"></ref> estimates the sensitivity by monitoring the change in weight during training.
* LeCun ''et al.'' present ''Optimal Brain Damage'' <ref name="lecun1989optimal"></ref> which uses the second derivative of each weight.
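The Optimal Brain Damage saliency can be sketched in a few lines: under a diagonal approximation of the Hessian, the estimated increase in loss from removing weight <math>w_i</math> is <math>s_i = \tfrac{1}{2} H_{ii} w_i^2</math>, and weights with the smallest saliency are pruned first. A minimal NumPy sketch, assuming a precomputed diagonal Hessian (the toy values below are hypothetical):

```python
import numpy as np

def obd_saliency(weights, hessian_diag):
    """Optimal Brain Damage saliency: s_i = 0.5 * H_ii * w_i^2.

    Weights with the smallest saliency are pruned first, since removing
    them is estimated to increase the loss the least."""
    return 0.5 * hessian_diag * weights ** 2

# Toy example: 4 weights with a hypothetical diagonal Hessian.
w = np.array([0.5, -1.2, 0.01, 0.8])
h = np.array([2.0, 0.1, 5.0, 1.0])
s = obd_saliency(w, h)                 # [0.25, 0.072, 0.00025, 0.32]
prune_order = np.argsort(s)            # smallest saliency pruned first
```

Note that a large weight with a flat loss surface (small <math>H_{ii}</math>) can rank below a small weight in a sharp direction, which is the point of using second-order information rather than magnitude alone.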


===Redundancy Methods===


===Structured Pruning===
Structured pruning focuses on keeping the dense structure of the network, such that the pruned network can benefit from standard dense matrix multiplication operations.<br>
This is in contrast to unstructured pruning, which zeros out individual values in the weight matrix but does not necessarily run faster.
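The distinction can be illustrated on a single weight matrix: unstructured pruning masks individual entries but leaves the matrix shape (and FLOP count) unchanged, while structured pruning removes whole rows (neurons/output channels), yielding a genuinely smaller dense layer. A minimal NumPy sketch with an arbitrary 8×8 matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # a dense weight matrix

# Unstructured pruning: zero the individually smallest weights.
# The shape is unchanged, so a dense matmul does the same work.
mask = np.abs(W) > np.quantile(np.abs(W), 0.5)
W_unstructured = W * mask        # still (8, 8), just sparser in value

# Structured pruning: drop whole output rows (neurons) with the
# smallest L2 norm; what remains is a smaller *dense* matrix.
row_norms = np.linalg.norm(W, axis=1)
keep = np.sort(np.argsort(row_norms)[4:])  # keep the 4 strongest rows
W_structured = W[keep]                     # shape (4, 8)

x = rng.standard_normal(8)
y_small = W_structured @ x  # fewer FLOPs with a standard dense matmul
```

The unstructured variant only speeds things up with sparse kernels or dedicated hardware; the structured variant shrinks the layer for any dense BLAS implementation.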


* Wen ''et al.'' (2016) <ref name="wen2016learning"></ref> propose Structured Sparsity Learning (SSL) on CNNs. Given filters of size (N, C, M, K), i.e. (out-channels, in-channels, height, width), they use a group lasso loss/regularization to penalize usage of extra input and output channels. They also learn filter shapes using this regularization.
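The channel-level part of such a group lasso regularizer can be sketched directly: for a conv weight tensor of shape (N, C, M, K), each output channel <math>W_{n,:,:,:}</math> and each input channel <math>W_{:,c,:,:}</math> forms a group, and the penalty sums the L2 norms of the groups, which pushes entire filters/channels toward exactly zero. A hedged NumPy sketch (not the authors' implementation; SSL also includes filter-shape and layer-depth groups not shown here):

```python
import numpy as np

def group_lasso_channels(filters, lam=1e-3):
    """Group-lasso penalty over output and input channels of a conv
    weight tensor of shape (N, C, M, K):

        lam * ( sum_n ||W[n,:,:,:]||_2  +  sum_c ||W[:,c,:,:]||_2 )

    Unlike an elementwise L1 penalty, the group L2 norm drives all
    weights in a group to zero together, so whole filters/channels
    can be removed after training."""
    out_norms = np.sqrt((filters ** 2).sum(axis=(1, 2, 3)))  # one per output channel
    in_norms = np.sqrt((filters ** 2).sum(axis=(0, 2, 3)))   # one per input channel
    return lam * (out_norms.sum() + in_norms.sum())
```

In training, this term is added to the task loss; groups whose norms are regularized to (near) zero correspond to prunable input or output channels.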