;Architecture
Stack encoders.
==Interpretability==
;Interpretability Methods
* Built-in model interpretability
* Feature-level interpretability
* Instance-based explanations
We will focus on feature-level interpretability.
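===Feature Level Interpretability===
Feature-level explanations are usually presented as saliency maps, which score how much each input feature influences the output.
* Perturbation-based: Perturb the input, recompute the output, and measure the difference from the original output.
* Gradient-based: Use the gradient of the output with respect to the input.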
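The perturbation-based idea can be sketched as an occlusion test: replace one feature at a time with a baseline value and record how much the output moves. This is a minimal sketch, not a definitive implementation; the toy logistic <code>model</code>, its weights, and the zero baseline are assumptions made only for illustration.

```python
import math

def model(x, w=(2.0, -1.0, 0.0), b=0.1):
    """Toy logistic model standing in for any black-box predictor (assumed)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def occlusion_saliency(f, x, baseline=0.0):
    """Replace one feature at a time with `baseline` and record the output change."""
    full = f(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline          # "remove" feature i
        scores.append(full - f(perturbed))
    return scores

x = [1.0, 1.0, 1.0]
saliency = occlusion_saliency(model, x)
# Feature 0 raises the output, feature 1 lowers it, feature 2 has zero weight.
```

Only forward passes are needed here, so the same procedure applies to models that do not expose gradients.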
===Gradient-based Methods===
Take the derivative of the output with respect to the input.
;Limitations
* Too local and sensitive to slight perturbations of the input
* Saturated outputs lead to unintuitive (near-zero) gradients
* Discontinuous gradients are problematic
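A small example may help illustrate both the basic gradient saliency and the saturation problem. The sketch below assumes a toy logistic model with made-up weights <code>W</code>; its gradient has a closed form, which is checked against finite differences. On a saturated input the gradient nearly vanishes even though the first feature is what drives the output.

```python
import math

W = [2.0, -1.0, 0.5]   # hypothetical weights of a toy logistic model
B = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x):
    return sigmoid(sum(w * xi for w, xi in zip(W, x)) + B)

def gradient_saliency(x):
    """Analytic gradient: d sigmoid(w.x + b) / dx_i = y * (1 - y) * w_i."""
    y = model(x)
    return [y * (1.0 - y) * w for w in W]

def numerical_gradient(f, x, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads

x_mid = [0.1, 0.2, 0.1]    # output near 0.5: gradients are informative
x_sat = [10.0, 0.0, 0.0]   # output near 1.0: gradients nearly vanish

g_mid = gradient_saliency(x_mid)
g_sat = gradient_saliency(x_sat)
```

In a real network the gradient would come from automatic differentiation rather than a hand-derived formula.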
;SmoothGrad
* Add Gaussian noise to the input several times and average the resulting gradients.
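A minimal sketch of SmoothGrad, assuming the same kind of toy logistic model with hypothetical weights <code>W</code>; the noise level <code>sigma</code>, the sample count, and the seed are illustrative choices rather than recommended values.

```python
import math
import random

W = [2.0, -1.0, 0.5]   # hypothetical weights of a toy logistic model

def gradient(x):
    """Gradient of sigmoid(w.x) with respect to x."""
    z = sum(w * xi for w, xi in zip(W, x))
    y = 1.0 / (1.0 + math.exp(-z))
    return [y * (1.0 - y) * w for w in W]

def smoothgrad(x, sigma=0.5, n_samples=200, seed=0):
    """Average the gradient over Gaussian-perturbed copies of the input."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        g = gradient(noisy)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]

x = [0.3, -0.2, 0.1]
smoothed = smoothgrad(x)
```

For this toy model every gradient is a positive multiple of <code>W</code>, so averaging only rescales the map; the smoothing effect matters for networks whose gradients fluctuate sharply between nearby inputs.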
;Integrated Gradients
* Average the gradients along a path from a baseline to the input, then scale by the difference between the input and the baseline.
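Integrated Gradients can be sketched with a Riemann sum over a straight-line path. The example assumes a toy logistic model with hypothetical weights <code>W</code>, an all-zero baseline, and 200 integration steps; all of these are illustrative choices.

```python
import math

W = [2.0, -1.0, 0.5]   # hypothetical weights of a toy logistic model

def model(x):
    z = sum(w * xi for w, xi in zip(W, x))
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x):
    y = model(x)
    return [y * (1.0 - y) * w for w in W]

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann sum of gradients on the straight line baseline -> x."""
    avg = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = gradient(point)
        avg = [a + gi / steps for a, gi in zip(avg, g)]
    # Scale the averaged gradient by the input-baseline difference.
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg)]

x = [3.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
```

A useful check is that the attributions sum (approximately) to <code>model(x) - model(baseline)</code>, which holds here up to the discretization error of the Riemann sum.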
;DeepLIFT
* Instead of the gradient, use the slope of the output relative to a ''reference'' state.
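The "slope relative to a reference" idea can be illustrated on a single sigmoid unit. This is a simplified sketch in the spirit of DeepLIFT's Rescale rule, not the full multi-layer algorithm; the weights <code>W</code> and the all-zero reference are assumptions.

```python
import math

W = [2.0, -1.0, 0.5]   # hypothetical weights of a single sigmoid unit

def pre_activation(x):
    return sum(w * xi for w, xi in zip(W, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def deeplift_rescale(x, reference):
    """Attribute the output change using the slope delta_y / delta_z."""
    z, z_ref = pre_activation(x), pre_activation(reference)
    delta_z = z - z_ref
    delta_y = sigmoid(z) - sigmoid(z_ref)
    if abs(delta_z) < 1e-12:
        y = sigmoid(z)
        multiplier = y * (1.0 - y)       # fall back to the gradient
    else:
        multiplier = delta_y / delta_z   # slope relative to the reference
    return [multiplier * w * (xi - ri) for w, xi, ri in zip(W, x, reference)]

x = [10.0, 0.0, 0.0]        # saturated input: the plain gradient is near zero
reference = [0.0, 0.0, 0.0]
contributions = deeplift_rescale(x, reference)
```

On this saturated input the plain gradient is close to zero, while the slope-based contribution of the first feature is close to 0.5, the full change in output relative to the reference.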
;Limitations
* The model must be differentiable so that the gradient of the output with respect to the input can be computed.
* Interpretation of neural networks is fragile.
** Saliency maps are uninterpretable for adversarial examples, both on clean models and on adversarially trained models.
* Requires white-box access to the model's gradients.
===Evaluation of interpretability methods===
* Human evaluation
** Can humans evaluate saliency?
* Accuracy drop after removing ''salient'' features
* Sanity checks
** Model parameter randomization test: compare the output of the saliency method on a trained vs. an untrained model to make sure the saliency depends on the model parameters.
* Synthetic data
* Data randomization test
** Train on random labels and check whether the saliency depends on the relationship between input and output.
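The model parameter randomization test listed above can be sketched as follows: compute the saliency map once with trained parameters and once with randomly re-initialized parameters, then compare the two maps. The "trained" weights below are hypothetical placeholders, and cosine similarity is just one possible comparison measure.

```python
import math
import random

def gradient_saliency(weights, x):
    """Gradient of sigmoid(w.x) with respect to x for a toy logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x))
    y = 1.0 / (1.0 + math.exp(-z))
    return [y * (1.0 - y) * w for w in weights]

def cosine_similarity(a, b):
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

trained_weights = [2.0, -1.0, 0.5, 0.0]   # hypothetical "trained" parameters
rng = random.Random(0)
random_weights = [rng.gauss(0.0, 1.0) for _ in trained_weights]

x = [0.5, -0.3, 0.8, 0.1]
s_trained = gradient_saliency(trained_weights, x)
s_random = gradient_saliency(random_weights, x)

# A saliency method that depends on the parameters should give a clearly
# different map once the parameters are randomized.
similarity = cosine_similarity(s_trained, s_random)
```

If a saliency method produced nearly identical maps for both parameter sets, that would suggest it is insensitive to what the model has learned.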
;Temporal Saliency Rescaling
* Asks how the gradient (saliency) changes if a given feature is removed.
==Misc==