Keynote 1 Homework
Opened: Monday, 29 March 2021, 00:00
As discussed during the lecture, given the difficulty of this homework, there is no imminent due date.
If you wish to do this homework, we only ask that you submit your implementation ~7 days before your final examination.
NB: (§§§) indicates a hard exercise, (§§) a moderately hard exercise
Reproduce the "Sanity Check for Similarity Indexes" from page 6 of *Similarity of Neural Network Representations Revisited* for the case of Multilayer Perceptrons (MLPs):
1. Start from an MLP with an architecture of your choice.
a. (extra 1) The architecture must be such that it reaches more than 98% test-set accuracy on average
i. Test that the threshold is reached with an appropriate test statistic (e.g., a one-sample t-test over several independent runs; see the sketch after this list)
2. (§§) Build a function to extract representations from each layer after the application of its activation function (e.g., with 5 hidden layers w/ ReLU + an output layer: extract the representation after the ReLU for each of the 5 hidden layers, plus the representation of the output layer). A hook-based sketch follows the list.
3. Perform a pairwise layer comparison for each layer in the architecture, for at least 2 parameter sets (i.e., 2 networks trained from different initializations)
a. Use both CKA and SVCCA (a minimal linear-CKA sketch follows the list)
4. (§§§) (extra 2) Fix the `layer_sim` library such that it is possible to retain the gradient of the representations (you'll need to call `backward` inside the routine for building representations [or call it outside; it's up to you to find a suitable method to obtain it]; you can do it either on the loss, or check *Similarity of Neural Networks with Gradients* for additional tricks), and implement CKA with the incorporation of gradient flow (a differentiable-CKA sketch follows the list).
a. See how this metric compares to *vanilla* CKA
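For exercise 1 (and the test in 1.a.i), a minimal sketch assuming PyTorch with flattened 28x28 inputs such as MNIST; the layer sizes, the number of runs, and the accuracy values below are illustrative placeholders, not part of the assignment:

```python
import torch
import torch.nn as nn
from scipy import stats

class MLP(nn.Module):
    """Illustrative 5-hidden-layer ReLU MLP for flattened 28x28 inputs."""
    def __init__(self, in_dim=784, hidden=256, n_classes=10):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(5):
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        layers.append(nn.Linear(d, n_classes))  # output layer (logits)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x.flatten(1))

# One-sided one-sample t-test for exercise 1.a.i:
# H0: mean test accuracy <= 0.98; reject H0 if p < 0.05.
accs = [0.984, 0.981, 0.983, 0.986, 0.982]  # placeholder accuracies from k runs
t, p = stats.ttest_1samp(accs, popmean=0.98, alternative="greater")
print(f"t = {t:.3f}, one-sided p = {p:.4f}")
```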
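For exercise 2, one way to collect post-activation representations is with PyTorch forward hooks. A sketch assuming the `MLP` module above (any module layout works as long as you hook every activation plus the output layer):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def extract_representations(model, x):
    """Collect the post-activation representation of every hidden layer
    (the output of each nn.ReLU) plus the output-layer representation.
    Hooks fire in forward order, so the list is ordered by depth."""
    reps, handles = [], []

    def hook(_module, _inputs, output):
        # drop .detach() (and the no_grad decorator) for exercise 4
        reps.append(output.detach().clone())

    hooked = [m for m in model.net if isinstance(m, nn.ReLU)]
    hooked.append(model.net[-1])              # final nn.Linear (output layer)
    for m in hooked:
        handles.append(m.register_forward_hook(hook))
    model(x)                                  # the forward pass fills `reps`
    for h in handles:
        h.remove()
    return reps
```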
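For exercise 3a, a minimal NumPy sketch of linear CKA following the definition in *Similarity of Neural Network Representations Revisited* (SVCCA has a reference implementation published by the paper's authors, so it is not re-derived here):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between X (n x d1) and Y (n x d2); rows index the
    same n examples. Both matrices are column-centered first."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro")
                   * np.linalg.norm(Y.T @ Y, "fro"))

def cka_matrix(reps_a, reps_b):
    """Pairwise layer-similarity matrix between two networks
    (each argument is a list of (n x d) arrays, one per layer)."""
    return np.array([[linear_cka(a, b) for b in reps_b] for a in reps_a])
```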
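For exercise 4, the sketch below is deliberately independent of `layer_sim`'s internals, which you are asked to modify yourself: it only shows linear CKA written in differentiable torch ops, plus one way to retain the gradient of a stored representation. Treat it as an assumption-laden starting point, not the required solution:

```python
import torch

def linear_cka_torch(X, Y):
    """Linear CKA in pure torch ops: if X and Y were extracted without
    .detach(), gradients flow through the returned similarity."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = torch.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (torch.linalg.norm(X.T @ X, "fro")
                   * torch.linalg.norm(Y.T @ Y, "fro"))

# To also obtain the gradient of the loss w.r.t. a stored representation h
# (useful for the tricks in Similarity of Neural Networks with Gradients):
#   h.retain_grad()        # h must still be attached to the graph
#   loss.backward()
#   g = h.grad             # gradient of the loss w.r.t. h
```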