"Leela Chess Zero is losing with quite a gap"
Check https://tcec-chess.com/; at the moment Leela is in front again.
That being said, it is not clear which approach is better: a very smart but relatively slow evaluation function (Leela) or SF's approach of using a relatively dumb eval function with a more sophisticated search. It is pretty clear that Leela's search can be improved (check https://github.com/dje-dev/Ceres for example).
On the other hand, assuming you can fully saturate multiple GPUs, the SF approach has the disadvantage that it cannot benefit from the performance improvements of GPUs/TPUs, which are still growing fast, whereas CPU performance only grows slowly these days.
The Stockfish mentality is: not everybody owns a GPU, and Stockfish should be available to everybody and perform well [1]. So they went for a neural net optimized for CPU micro-architectures, which is great. Maybe this is ultimately not the best for tournaments.
What I would find interesting is if they could give engines an energy budget instead of a time limit. Maybe that would make CPU vs GPU games more interesting and fair.
IMHO they did this because it was the only incremental way to improve strength. They have always relied on a fast search with a relatively simplistic eval function, and for this approach alpha-beta search works well. What is not clear is whether it works well for the Leela approach; at least, all attempts so far have not been successful (which could be for other reasons, such as the training approach). SF's strength seems to be its well-optimized search function, and rewriting that would IMHO be equivalent to creating a new chess engine. That being said, I still think the success of SF NNUE is very remarkable.
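To make the "fast search plus simplistic eval" recipe concrete, here is a minimal sketch of negamax with alpha-beta pruning over an abstract game tree. Everything here is invented for illustration (the tree encoding, the leaf scores); a real engine like Stockfish layers move ordering, transposition tables, and many pruning heuristics on top of this skeleton:

```python
# Leaves are ints: the static evaluation from the perspective of the side
# to move at that node. Internal nodes are lists of child subtrees.
def alphabeta(node, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, int):
        return node                        # cheap static evaluation at the leaf
    best = float("-inf")
    for child in node:
        # Negamax: the child's score from our view is the negation of its
        # score from the opponent's view, with the window flipped.
        score = -alphabeta(child, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:                  # cutoff: opponent won't allow this line
            break
    return best

# Two moves at the root, each leading to two replies:
print(alphabeta([[3, 17], [2, 12]]))       # prints 3
```

The point of the recipe: because each leaf evaluation is so cheap, the engine can afford enormous depth, and alpha-beta's cutoffs prune most of the tree when move ordering is good.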
Interesting point. Nvidia have been improving the int performance for quantized inference on their GPUs a lot. It might be a lot of work but could it be possible to scale up this NNUE approach to the point where it would be worthwhile to run on a GPU?
For single-position "batches" (which seems to be what's used now?) it might never be worthwhile, but if multiple positions could be searched in parallel and the NN evaluation batched, this might start to look tempting.
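As a sketch of what batching leaf evaluations might look like (the `BatchedEvaluator` and `fake_net` names are hypothetical, not from any real engine): the search queues positions instead of evaluating them one by one, and a single call evaluates the whole batch, which is what lets a GPU amortize kernel-launch and transfer overhead.

```python
def fake_net(batch):
    """Stand-in for a real NN: one scalar score per position (here: feature sum)."""
    return [sum(features) for features in batch]

class BatchedEvaluator:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []          # (position id, features) waiting for evaluation
        self.results = {}          # position id -> score

    def request(self, pos_id, features):
        self.pending.append((pos_id, features))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        ids = [i for i, _ in self.pending]
        scores = fake_net([f for _, f in self.pending])   # one call, many positions
        self.results.update(zip(ids, scores))
        self.pending.clear()

# A search would queue leaves as it reaches them, then read scores after a flush:
ev = BatchedEvaluator(batch_size=4)
for i, feats in enumerate([[1, 2], [3, 4], [5, 6], [7, 8]]):
    ev.request(i, feats)
print(ev.results)   # {0: 3, 1: 7, 2: 11, 3: 15}
```

The hard part, as noted elsewhere in the thread, is that an alpha-beta search wants each leaf's score immediately before deciding where to go next, so filling such a batch requires restructuring the search.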
I'm not sure what the effect of running PVS with multiple parallel search threads is. Presumably the penalty of searching with less information means you hit the scaling ceiling a lot quicker than MCTS-like searches, as the pruning is much more sensitive to having up-to-date information about the principal variation.
Disclaimer: My understanding of PVS is very limited.
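For what it's worth, a bare-bones sketch of the PVS idea (toy tree encoding invented for illustration, not engine code): the first move at each node is searched with the full (alpha, beta) window, and every remaining move with a cheap null window around alpha, re-searching fully only on a fail-high. The null-window searches only pay off when alpha is already a good bound from the PV, which is exactly why stale PV information across parallel threads hurts:

```python
# Leaves are ints (score for the side to move); internal nodes are child lists.
def pvs(node, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, int):
        return node
    first = True
    for child in node:
        if first:
            score = -pvs(child, -beta, -alpha)        # full window for the PV move
            first = False
        else:
            # Null-window "scout" search: only tries to prove this move is
            # no better than alpha.
            score = -pvs(child, -alpha - 1, -alpha)
            if alpha < score < beta:                  # fail-high: re-search fully
                score = -pvs(child, -beta, -alpha)
        alpha = max(alpha, score)
        if alpha >= beta:                             # beta cutoff
            break
    return alpha

print(pvs([[3, 17], [2, 12]]))   # prints 3, same result as plain alpha-beta
```

If the first move really is best (good move ordering), almost every other move is refuted by a cheap null-window search; if alpha is stale or wrong, the re-searches eat the savings.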
Sure, if someone can come up with an approach to run an NNUE (efficiently updatable) network on GPUs, that might really be another breakthrough. But at first glance it looks to me like this could be very difficult, because AFAIK the SF search is relatively complicated. Even for Leela, implementing an efficient batched search on multiple GPUs seems to be difficult (some improvements are coming with Ceres), and Leela is using a much simpler MCTS search. That doesn't mean that Leela's search could not be improved: it does not necessarily give higher priority to forced sequences of moves (at least not explicitly) or to high-risk moves, which is IMHO why she sometimes does not see relatively simple tactics.
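For readers unfamiliar with the "efficiently updatable" part: the first layer of an NNUE net takes a huge sparse binary input, and a move only flips a few input features, so the layer's output (the "accumulator") can be updated by adding/subtracting a few weight rows instead of recomputing the full matrix-vector product. A toy sketch of that idea, with invented sizes and names (not Stockfish's actual representation):

```python
import random

random.seed(0)                      # deterministic toy weights
N_FEATURES, HIDDEN = 64, 8          # real NNUE nets are vastly larger
W = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(N_FEATURES)]

def full_accumulator(active):
    """Recompute from scratch: sum the weight rows of all active features."""
    acc = [0.0] * HIDDEN
    for f in active:
        for j in range(HIDDEN):
            acc[j] += W[f][j]
    return acc

def update_accumulator(acc, removed, added):
    """Incremental update after a move: cost scales with changed features only."""
    acc = acc[:]
    for f in removed:
        for j in range(HIDDEN):
            acc[j] -= W[f][j]
    for f in added:
        for j in range(HIDDEN):
            acc[j] += W[f][j]
    return acc

before = {1, 5, 10, 33}
acc = full_accumulator(before)
after = (before - {5}) | {12}                         # a "move": feature 5 off, 12 on
fast = update_accumulator(acc, removed=[5], added=[12])
slow = full_accumulator(after)
assert all(abs(a - b) < 1e-9 for a, b in zip(fast, slow))
```

This incrementality is exactly what is CPU-friendly and awkward on a GPU: each position's update is tiny and sequential along the search path, the opposite of the large uniform batches GPUs want.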
Generally the GPU vs CPU gap on the evaluation side isn't nearly as big as in training. In theory the gap should remain large, but in practice things like the inability to have data ready for batch evaluation mean it is harder to saturate the GPU (which, as you note, is important).
But I'm not sure how much of this applies to chess engines. I see some comments noting that the search part makes it hard to generate batches, but I'm not an expert in this.