Hacker News

This is currently #1 on the front page, but I think HN is celebrating the defeat of GPUs prematurely. The architectures used in this paper have just one hidden layer. It remains to be seen whether the ideas in SLIDE are applicable to general deep learning tasks and architectures.

From the previous paper:

    We choose the standard fully connected neural
    network with one hidden layer of size 128.
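For reference, the "one hidden layer of size 128" architecture the quote describes is about as simple as feedforward nets get. A minimal forward-pass sketch (pure Python; the input and output sizes here are illustrative assumptions, only the hidden size of 128 comes from the paper):

```python
import math
import random

random.seed(0)

def dense(x, w, b):
    # One fully connected layer: y = Wx + b
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

# Input/output dimensions are assumptions for illustration;
# the paper only fixes the hidden layer size at 128.
n_in, n_hidden, n_out = 20, 128, 10
w1 = [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[random.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

x = [random.random() for _ in range(n_in)]
probs = softmax(dense(relu(dense(x, w1, b1)), w2, b2))
```

That is the entire depth under discussion: input, one 128-unit hidden layer, output.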


Wait... how can a net with just one hidden layer be called a DEEP neural network? I thought the whole "deep" thing was about having lots of hidden layers.

It's like saying we have a super fast fibonacci algorithm, but we can only show it for n=1.


> Wait... How can 1 Hidden layer be called DEEP neural network?

Right. They describe a general algorithm, but only show results for shallow nets, so it's unclear whether this works well in general.


Yep, anyone who starts playing around with NNs in a Jupyter notebook quickly notices that training is faster on the CPU until you level up to bigger networks.


That's normally due to the cost of copying the data into GPU memory. Here, they've reformulated the training algorithm itself into a less computationally complex problem: rather than evaluating every neuron, they use locality-sensitive hashing to pick a small set of neurons to update per sample. It's a huge difference.
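A toy sketch of that neuron-selection idea, using signed random projections as the hash family for concreteness (SLIDE defines its own hash functions; every name here is illustrative, not the paper's code):

```python
import random

random.seed(1)

n_neurons, dim, n_bits = 1000, 16, 8

# Each neuron is a weight vector; the trick is to only evaluate neurons
# whose hash bucket matches the input's bucket, instead of all of them.
neurons = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_neurons)]

# Signed random projections: one random hyperplane per hash bit.
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def srp_hash(v):
    # One bit per hyperplane: which side of the plane v falls on.
    return tuple(int(sum(p * x for p, x in zip(plane, v)) > 0)
                 for plane in planes)

# Bucket the neurons by hash code up front (maintained as weights change).
buckets = {}
for i, w in enumerate(neurons):
    buckets.setdefault(srp_hash(w), []).append(i)

x = [random.gauss(0, 1) for _ in range(dim)]
active = buckets.get(srp_hash(x), [])

# Only the active neurons get a dot product; the rest are skipped entirely.
activations = {i: sum(w * xi for w, xi in zip(neurons[i], x))
               for i in active}
```

With 8 hash bits there are 256 buckets, so each sample touches only a handful of the 1000 neurons. That sub-linear work per sample, not data-transfer overhead, is where the claimed CPU speedup comes from.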


So you think it was only faster because of GPU init overhead?



