
There are other mechanisms for dealing with vanishing and exploding gradients. I (maybe wrongly?) think of batch normalization as being most distinctively about fighting internal covariate shift: https://machinelearning.wtf/terms/internal-covariate-shift/
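For anyone who hasn't looked at the mechanics: the training-time forward pass is just per-feature standardization over the mini-batch, followed by a learned scale and shift. A minimal NumPy sketch (the function name and eps default are my own choices, not from any particular library):

    import numpy as np

    def batch_norm_forward(x, gamma, beta, eps=1e-5):
        # x: (batch_size, num_features)
        # Normalize each feature over the batch dimension...
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)
        # ...then apply the learned affine transform.
        return gamma * x_hat + beta

At inference time you'd swap the batch statistics for running averages collected during training, since single examples (or small batches) give noisy estimates.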

