;Architecture
Stack encoders.
==Interpretability==
;Interpretability Methods
* Built-in model interpretability
* Feature-level interpretability
* Instance-based explanations
We will focus on feature-level interpretability.
===Feature Level Interpretability===
These are typically presented as saliency maps.
* Perturbation-based: Perturb the input, re-run the model, and measure how much the output changes (a minimal sketch follows this list).
* Gradient-based
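A minimal sketch of the perturbation (occlusion) idea, assuming a PyTorch image classifier <code>model</code> and an input <code>x</code> of shape (1, C, H, W); the patch size and fill value are illustrative choices, not taken from the notes.
<syntaxhighlight lang="python">
import torch

def occlusion_saliency(model, x, target_class, patch=8, fill=0.0):
    """Slide an occluding patch over the input; the drop in class score is the saliency."""
    model.eval()
    with torch.no_grad():
        base = model(x)[0, target_class].item()             # score on the unperturbed input
        _, _, H, W = x.shape
        sal = torch.zeros(H, W)
        for i in range(0, H, patch):
            for j in range(0, W, patch):
                x_pert = x.clone()
                x_pert[:, :, i:i+patch, j:j+patch] = fill    # occlude one patch
                drop = base - model(x_pert)[0, target_class].item()
                sal[i:i+patch, j:j+patch] = drop             # larger drop = more salient region
    return sal
</syntaxhighlight>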
===Gradient-based Methods===
Take the derivative of the output with respect to the input; the magnitude of the gradient in each input dimension is taken as that dimension's saliency.
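A minimal sketch of a vanilla gradient saliency map under the same assumptions (a PyTorch classifier <code>model</code>, input <code>x</code> of shape (1, C, H, W)); the later sketches reuse this hypothetical <code>gradient_saliency</code> helper.
<syntaxhighlight lang="python">
import torch

def gradient_saliency(model, x, target_class):
    """|d(class score)/d(input)|, collapsed over channels into a (1, H, W) map."""
    model.eval()
    x = x.detach().clone().requires_grad_(True)  # track gradients with respect to the input
    score = model(x)[0, target_class]            # scalar score for the class of interest
    score.backward()                             # populates x.grad with d(score)/d(x)
    return x.grad.abs().max(dim=1)[0]            # max |gradient| over channels
</syntaxhighlight>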
;Limitations
* Too local and sensitive to slight perturbations
* Saturated outputs lead to unintuitive gradients
* Discontinuous gradients are problematic
;SmoothGrad
* Add Gaussian noise to the input several times and average the resulting gradients.
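A minimal SmoothGrad sketch, reusing the hypothetical <code>gradient_saliency</code> helper above; the number of noise samples and the noise scale are illustrative defaults.
<syntaxhighlight lang="python">
import torch

def smoothgrad_saliency(model, x, target_class, n_samples=25, sigma=0.1):
    """Average gradient saliency over several noisy copies of the input."""
    maps = []
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)   # add Gaussian noise to the input
        maps.append(gradient_saliency(model, noisy, target_class))
    return torch.stack(maps).mean(dim=0)          # average the per-sample saliency maps
</syntaxhighlight>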
;Integrated Gradients
* Average the gradients along a straight-line path from a baseline to the input, then scale by the input-baseline difference.
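A minimal Integrated Gradients sketch under the same assumptions; the all-zero baseline and the number of path steps are illustrative choices.
<syntaxhighlight lang="python">
import torch

def integrated_gradients(model, x, target_class, steps=50):
    """Average gradients along the straight-line path from a baseline to the input."""
    model.eval()
    baseline = torch.zeros_like(x)
    total_grad = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)  # point on the path
        score = model(point)[0, target_class]
        grad, = torch.autograd.grad(score, point)
        total_grad += grad
    return (x - baseline) * total_grad / steps   # scale the average gradient by (input - baseline)
</syntaxhighlight>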
;DeepLIFT
* Instead of the gradient, use the slope of the output relative to a ''reference'' (baseline) state.
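In symbols (following the standard presentation of DeepLIFT, not anything stated above): instead of the derivative <math>\partial y / \partial x</math>, each input is scored by a finite-difference slope against its reference activation, and the resulting contributions <math>C_{\Delta x_i \Delta y}</math> satisfy a summation-to-delta property:
:<math>\frac{\Delta y}{\Delta x} = \frac{y - y^{\mathrm{ref}}}{x - x^{\mathrm{ref}}}, \qquad \sum_i C_{\Delta x_i \Delta y} = \Delta y .</math>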
;Limitations
* Models must be able to compute the gradient of the output with respect to the input
* Interpretation of neural networks is fragile
** Saliency maps can be uninterpretable for adversarial examples, on both clean and adversarially trained models.
* Needs white-box gradient access to the model.
===Evaluation of Interpretability Methods===
* Human evaluation
** Can humans evaluate saliency?
* Accuracy drop after removing ''salient'' features
* Sanity checks
** Model parameter randomization test - compare the output of the saliency method on a trained vs. a randomly initialized (untrained) model to make sure the saliency depends on the model parameters (see the sketch after this list).
* Synthetic Data
* Data randomization test
** Train on randomly permuted labels and check whether the saliency still depends on the relationship between the inputs and the labels.
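A minimal sketch of the model parameter randomization test above, reusing the hypothetical <code>gradient_saliency</code> helper; Spearman rank correlation is one illustrative similarity measure.
<syntaxhighlight lang="python">
import copy
import torch
from scipy.stats import spearmanr

def parameter_randomization_test(model, x, target_class):
    """Compare saliency from the trained model against a randomly re-initialized copy."""
    sal_trained = gradient_saliency(model, x, target_class).flatten()

    random_model = copy.deepcopy(model)
    for p in random_model.parameters():         # destroy the learned weights
        torch.nn.init.normal_(p, mean=0.0, std=0.02)
    sal_random = gradient_saliency(random_model, x, target_class).flatten()

    rho, _ = spearmanr(sal_trained.numpy(), sal_random.numpy())
    return rho   # near 1 means the saliency map barely depends on the learned parameters
</syntaxhighlight>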
;Temporal Saliency Rescaling
* If you remove a feature at a given time step, how much does the gradient (saliency) change? (see the sketch below)
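A minimal sketch of the time-step masking idea for time-series inputs, assuming a PyTorch model on inputs of shape (1, T, F) and any saliency function with the signature used above; zero-masking is an illustrative choice.
<syntaxhighlight lang="python">
import torch

def time_relevance(model, x, target_class, saliency_fn):
    """Mask one time step at a time and measure how much the saliency map changes."""
    base_map = saliency_fn(model, x, target_class)
    scores = []
    for t in range(x.shape[1]):
        x_masked = x.clone()
        x_masked[:, t, :] = 0.0                 # remove one time step
        masked_map = saliency_fn(model, x_masked, target_class)
        scores.append((base_map - masked_map).abs().sum().item())
    return torch.tensor(scores)                 # larger change = more relevant time step
</syntaxhighlight>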


==Misc==