Neural Network Compression
* Mozer and Smolensky (1988)<ref name="mozer1988skeletonization"></ref> attach a gate to each neuron. The sensitivity can then be estimated with the derivative of the loss w.r.t. the gate (see the sketch below).
* Karnin<ref name="karnin1990simple"></ref> estimates the sensitivity by monitoring the change in each weight during training.
* LeCun ''et al.'' present ''Optimal Brain Damage''<ref name="lecun1989optimal"></ref>, which uses the second derivative of the loss with respect to each weight to estimate its saliency.
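As a concrete illustration of the gate-based approach, the following is a minimal sketch assuming PyTorch (not used in the cited papers); the <code>GatedLinear</code> module and the toy data are illustrative assumptions, not the original authors' implementation.

<syntaxhighlight lang="python">
# Sketch: estimate per-neuron sensitivity via a multiplicative gate fixed at 1.
# The gradient of the loss w.r.t. each gate scores that unit's importance.
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose outputs are multiplied by per-unit gates (hypothetical helper)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Gates start at 1 so the forward pass is initially unchanged.
        self.gates = nn.Parameter(torch.ones(out_features))

    def forward(self, x):
        return self.gates * self.linear(x)

# Toy example: score the hidden units of a small MLP on random data.
torch.manual_seed(0)
model = nn.Sequential(GatedLinear(10, 32), nn.ReLU(), nn.Linear(32, 2))
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# |dL/d(gate)| approximates how much the loss changes if a unit is removed
# (its gate driven from 1 toward 0); low values mark pruning candidates.
sensitivity = model[0].gates.grad.abs()
print(sensitivity.topk(5, largest=False).indices)  # five least sensitive units
</syntaxhighlight>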
===Redundancy Methods===
===Structured Pruning===
Structured pruning focuses on keeping the dense structure of the network, such that the pruned network can still benefit from standard dense matrix multiplication operations.<br>
This is in contrast to unstructured pruning, which zeros out values in the weight matrix but may not necessarily run faster.
* Wen ''et al.'' (2016)<ref name="wen2016learning"></ref> propose Structured Sparsity Learning (SSL) on CNNs. Given filters of size (N, C, M, K), i.e. (out-channels, in-channels, height, width), they use a group lasso loss/regularization to penalize usage of extra input and output channels. They also learn filter shapes using this regularization (a sketch of the channel-wise group lasso term follows below).
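The following is a minimal sketch of a channel-wise group lasso penalty in the spirit of SSL, assuming PyTorch; the grouping, the coefficient <code>lam</code>, and the stand-in task loss are illustrative assumptions rather than the exact setup of the paper.

<syntaxhighlight lang="python">
# Sketch: group lasso over conv channels. Penalizing the L2 norm of each
# output-channel and input-channel group drives whole channels toward zero,
# so they can be removed while the remaining layer stays dense.
import torch
import torch.nn as nn

def group_lasso_channels(conv: nn.Conv2d, lam: float = 1e-3) -> torch.Tensor:
    """Group lasso over output and input channels of a conv weight of shape (N, C, M, K)."""
    w = conv.weight
    # One group per output channel n: sum_n ||W_{n,:,:,:}||_2
    out_channel_term = w.flatten(1).norm(dim=1).sum()
    # One group per input channel c: sum_c ||W_{:,c,:,:}||_2
    in_channel_term = w.transpose(0, 1).flatten(1).norm(dim=1).sum()
    return lam * (out_channel_term + in_channel_term)

# Usage: add the penalty to the task loss so training shrinks whole channels;
# channels whose norm reaches (near) zero can then be pruned structurally.
conv = nn.Conv2d(16, 32, kernel_size=3)
x = torch.randn(8, 16, 28, 28)
task_loss = conv(x).pow(2).mean()  # stand-in for the real training loss
total_loss = task_loss + group_lasso_channels(conv)
total_loss.backward()
</syntaxhighlight>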