We can't expect a massive improvement on computer performance anymore. So it wil...

Gigachad · on Nov 6, 2022

They can just add more cores and more power. With modern languages like Rust making multithreading more accessible, I expect we will double down on this. You could also crowd source this, some distributed application where everyone puts their home machines towards training.

PartiallyTyped · on Nov 6, 2022

> You could also crowd source this, some distributed application where everyone puts their home machines towards training.

That's how Leela Chess Zero (Lc0) replicated AlphaZero's performance. In fact, this is not that difficult, assuming you have the means to orchestrate it. All a volunteer machine has to do is load the weights and a batch, run backprop, and submit the resulting gradients to the central system, which aggregates the gradient updates, updates the whole network, and pushes out new weights (kinda like how Bitcoin creates a new block).
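
To make the loop concrete, here is a rough sketch of one training round on a toy linear model. Everything in it is made up for illustration (the names `worker_gradient` and `server_round`, the model, the learning rate), and the "network calls" are plain function calls, so treat it as a picture of the idea rather than how any real project does it.

```python
# Minimal sketch of crowdsourced training rounds (all names are hypothetical).
# Toy model: y = w * x + b, loss: mean squared error.

def worker_gradient(weights, batch):
    """Volunteer side: load the weights and a batch, return the gradient."""
    w, b = weights
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return (grad_w, grad_b)

def server_round(weights, batches, lr=0.05):
    """Central side: aggregate submitted gradients, update, publish new weights."""
    # In a real system each call below would be a request to a remote machine.
    grads = [worker_gradient(weights, batch) for batch in batches]
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(weights))]
    return tuple(p - lr * g for p, g in zip(weights, avg))

# Toy data drawn from y = 2x + 1, split across three "home machines".
data = [(k / 10, 2 * (k / 10) + 1) for k in range(-10, 11)]  # 21 points
batches = [data[0:7], data[7:14], data[14:21]]

weights = (0.0, 0.0)
for _ in range(2000):
    weights = server_round(weights, batches)

print(weights)  # should end up close to (2.0, 1.0)
```

The server never sees the raw batches here, only one gradient per worker, which is the part that makes the scheme cheap to coordinate.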

This is no different from gradient accumulation; it's just "distributed". In fact, the system could offload a large number of batches, because the update each worker returns takes O(1) space with respect to the batch size n; it's just that the constant hidden in that O(1) is the size of the network.
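
A quick way to convince yourself of both claims, again on a made-up toy model (the helper names `gradient` and `accumulate` are hypothetical, and the shards are assumed to be equal-sized so a plain average is exact):

```python
# Sketch: averaging per-shard gradients matches the full-batch gradient,
# and the returned update has one entry per parameter, whatever the batch size.

def gradient(weights, batch):
    """Mean-squared-error gradient for the toy model y = w * x + b."""
    w, b = weights
    n = len(batch)
    return [
        sum(2 * (w * x + b - y) * x for x, y in batch) / n,
        sum(2 * (w * x + b - y) for x, y in batch) / n,
    ]

def accumulate(weights, shards):
    """Gradient accumulation: sum the per-shard gradients, then average."""
    total = [0.0] * len(weights)
    for shard in shards:
        total = [t + g for t, g in zip(total, gradient(weights, shard))]
    return [t / len(shards) for t in total]

weights = (0.5, -0.25)
data = [(float(k), 3.0 * k - 2.0) for k in range(12)]
shards = [data[0:4], data[4:8], data[8:12]]  # equal-sized, as assumed above

full = gradient(weights, data)          # one machine, whole batch
merged = accumulate(weights, shards)    # three "machines", merged centrally

same = all(abs(a - b) < 1e-9 for a, b in zip(full, merged))
payload_small = len(gradient(weights, data[:2]))  # batch of 2
payload_large = len(gradient(weights, data))      # batch of 12
print(same, payload_small, payload_large)
```

With unequal shard sizes you would weight each gradient by its shard's sample count instead of taking a plain average.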