You are correct. For commercial use, the GPUs used for training and fine-tuning aren't a problem financially. However, if we wanted to rigorously benchmark AlphaChip against simulated annealing or other floorplanning algorithms, we have to afford the same compute and runtime budget to each algorithm. With 16 GPUs running for 6 hours, you could explore a huge placement space using any algorithm, and it isn't clear that RL would outperform the alternatives. Furthermore, the runtime of AlphaChip reported in the Nature paper and at ISPD was still significantly greater than that of Cadence's concurrent macro placer (even after pre-training, RL requires several hours of fine-tuning on the target problem instance). Arguably, the runtime could drop with more GPUs, but at that point it is unclear how much value comes from the policy network / problem embedding versus the sheer ability to explore many candidate placements.
> However, if we wanted to rigorously benchmark AlphaChip against simulated annealing or other floorplanning algorithms, we have to afford the same compute and runtime budget to each algorithm.
The Nature authors already presented such a study in their appendix:
"To make comparisons fair, we ran 80 SA experiments sweeping different hyperparameters, including maximum temperature (10^−5, 3 × 10^−5, 5 × 10^−5, 7 × 10^−5, 10^−4, 2 × 10^−4, 5 × 10^−4, 10^−3), maximum SA episode length (5 × 10^4, 10^5) and seed (five different random seeds), and report the best results in terms of proxy wirelength and congestion costs in Extended Data Table 6."
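For concreteness, the sweep described in that quote is a full grid: 8 maximum temperatures × 2 episode lengths × 5 seeds = 80 runs. A minimal sketch of the enumeration (hyperparameter values are from the quoted passage; the exact seed values are unspecified in the quote, so the ones below are placeholders):

```python
import itertools

# Hyperparameter grid from the Nature appendix's SA baseline study
max_temperatures = [1e-5, 3e-5, 5e-5, 7e-5, 1e-4, 2e-4, 5e-4, 1e-3]
episode_lengths = [5 * 10**4, 10**5]
seeds = [0, 1, 2, 3, 4]  # "five different random seeds" (values unspecified)

grid = list(itertools.product(max_temperatures, episode_lengths, seeds))
assert len(grid) == 80  # matches the 80 SA experiments reported

# Each configuration would be run independently, and the best result
# (by proxy wirelength and congestion cost) reported.
```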
You're saying that if the other methods were given an equivalent amount of compute, they might perform as well as AlphaChip? Or at least that the comparison would be fairer?
Existing mixed-size placement algorithms depend on hyperparameters, heuristics, and initial states / randomness. Given more compute, they can explore a much wider space and, in theory, find better solutions. Some algorithms, like simulated annealing, are easy to modify to exploit arbitrarily more compute. Indeed, I believe the comparison of AlphaChip to alternatives would be fairer if compute resources and allowed runtime were matched.
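To illustrate how directly SA trades extra compute for quality: independent restarts are embarrassingly parallel, and both the step count and the temperature schedule are knobs on compute usage. A toy sketch (a bumpy 1-D objective stands in for a placement proxy cost; all names and values here are illustrative, not from any of the papers discussed):

```python
import math
import random

def anneal(cost, init, steps, max_temp, seed):
    """Toy simulated annealing on a 1-D state. `steps` and the number
    of restarts are direct knobs on how much compute is spent."""
    rng = random.Random(seed)
    state, best = init, init
    for i in range(steps):
        temp = max_temp * (1 - i / steps)       # linear cooling schedule
        cand = state + rng.uniform(-1.0, 1.0)   # random local move
        delta = cost(cand) - cost(state)
        # Accept downhill moves always; uphill moves with Boltzmann probability.
        if delta < 0 or rng.random() < math.exp(-delta / max(temp, 1e-12)):
            state = cand
        if cost(state) < cost(best):
            best = state
    return best

# A non-convex objective standing in for a placement proxy cost.
cost = lambda x: (x - 3) ** 2 + math.sin(5 * x)

# More compute = more restarts; each run is independent, so this
# parallelizes trivially across however many machines you have.
results = [anneal(cost, init=0.0, steps=20_000, max_temp=2.0, seed=s)
           for s in range(8)]
best = min(results, key=cost)
```

The point of the sketch is structural: nothing in the loop caps how many restarts or steps you may buy, which is why matching compute budgets matters when comparing SA against an RL method.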
In fact, existing algorithms such as naive simulated annealing can easily be augmented with ML (e.g. using state embeddings to optimize hyperparameters for a given problem instance, or using a regression model to fine-tune proxy costs so they correlate better with final QoR). Indeed, I strongly suspect commercial CAD software already applies ML in many ways for mixed-size placement and other CAD algorithms. The criticism of AlphaChip isn't a rejection of applying ML to EDA CAD algorithms, but rather an objection to the particular formulation they used and to their reported results / comparisons.
That sounds like future work for simulated annealing fans to engage in, quite honestly, rather than something that needs to be addressed immediately in a paper proposing an alternative method. The proposed method accomplished what it set out to do, surpassing current methods; others are free to explore different hyperparameters to surpass that quality in turn... This is, ultimately, why we build benchmark tasks: if you want to prove you know how to do it better, you are free to just go do it better instead of whining about what the competition did or didn't try on your behalf.
Yes, they are. The other approaches usually look like simulated annealing, which has several hyperparameters that control how much compute is used; results improve as more compute is spent.
I understand, and I have read the article. Running 80 experiments with a crude form of simulated annealing is at most 0.0000000001% of the effort that traditional EDA vendors have spent on making that kind of hill climb work well. It is also an in-sample comparison: I would believe the Google method, pre-trained on Google chips, would do well there, while it might have a harder time with a chip designed by a third party (further from its pre-training distribution).
The modern versions of that hill climb also use some RL (placing and routing chips is sort of like a game), but not in the way Jeff Dean wants it to be done.
The comparison in that paper was very much not fair to Google's method. Google's original published comparison to simulated annealing is not fair to simulated annealing methods. That is, unfortunately, part of the game of publication when you want to publish a marginal result.
It is possible that the pre-training step overfits to a particular class of chips, or fails to converge on a general sample of chip designs. Either would make pre-training unusable in a commercial EDA tool. The people who actually know are the smart, non-arrogant engineers at EDA companies who benchmarked this stuff before deciding not to adopt it.
If you want to make a good-faith assumption (which IMO is unwarranted given the rest of the paper), the people trying to replicate Google's paper may have run a pre-training step that failed to converge and then not reported it. That failure to converge could be due to ineptitude, but it could also be due to data quality.
The Google internal paper by Chatterjee and the Cheng et al. paper from UCSD made such comparisons with simulated annealing. The annealer in the Nature paper was easy to improve: given the same time budget, the improved annealer produced better solutions than AlphaChip, and when both are given more time, SA remains ahead. Just read the published papers.
The UCSD paper didn't run the Nature method correctly, so I don't see how you can draw this conclusion.
From Jeff's tweet:
"In particular the authors did no pre-training (despite pre-training being mentioned 37 times in our Nature article), robbing our learning-based method of its ability to learn from other chip designs, then used 20X less compute and did not train to convergence, preventing our method from fully learning even on the chip design being placed."
As for Chatterjee's paper, "We provided the committee with one-line scripts that generated significantly better RL results than those reported in Markov et al., outperforming their “stronger” simulated annealing baseline. We still do not know how Markov and his collaborators produced the numbers in their paper."