
There are other mechanisms for dealing with vanishing and exploding gradients. I (maybe wrongly?) think of batch normalization as being most distinctively about fighting internal covariate shift: https://machinelearning.wtf/terms/internal-covariate-shift/
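For anyone who hasn't looked at the mechanics: the training-time forward pass is just per-feature standardization over the mini-batch, followed by a learned scale and shift. A minimal NumPy sketch (the function name and eps default are my own choices, not from any particular library):

    import numpy as np

    def batch_norm_forward(x, gamma, beta, eps=1e-5):
        # x: (batch_size, num_features)
        # Normalize each feature over the batch dimension...
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)
        # ...then apply the learned affine transform.
        return gamma * x_hat + beta

At inference time you'd swap the batch statistics for running averages collected during training, since single examples (or small batches) give noisy estimates.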

