How long does it take to download Llama2 70B? With the 4x 25 Gbps NICs on AWS p4de instances, it should take ~10s. Yet in production we've observed much longer times, which makes autoscaling less responsive and more expensive. This blog post shows how we reduced download and init time from 4m25s to 20s, using techniques such as streaming S3 -> CPU -> GPU and downloading over multiple TCP streams.
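The multiple-TCP-streams idea can be sketched with ranged GETs: split the object into contiguous byte ranges and fetch them concurrently so a single connection's throughput ceiling no longer bounds the download. This is a minimal stdlib-only illustration, not the post's actual implementation; the helper names (`split_ranges`, `parallel_download`) and the stream count are assumptions.

```python
# Sketch: download one large object over several parallel HTTP range requests.
# Hypothetical helpers -- not the code from the blog post.
import concurrent.futures
import urllib.request


def split_ranges(total_size: int, num_streams: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into contiguous inclusive byte ranges, one per stream."""
    chunk = total_size // num_streams
    ranges, start = [], 0
    for i in range(num_streams):
        end = total_size - 1 if i == num_streams - 1 else start + chunk - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch one byte range via an HTTP Range header (S3 supports ranged GETs)."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def parallel_download(url: str, total_size: int, num_streams: int = 8) -> bytes:
    """Fetch all ranges concurrently, one TCP stream per worker, then reassemble."""
    parts = split_ranges(total_size, num_streams)
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_streams) as pool:
        chunks = pool.map(lambda r: fetch_range(url, *r), parts)
    return b"".join(chunks)
```

In a real pipeline the reassembly step would instead stream each chunk into pinned CPU memory and copy it to the GPU as it arrives, rather than buffering the whole object first.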
"We believe that Ray will continue to play an increasingly important role in bringing much needed common infrastructure and standardization to the production machine learning ecosystem, both within Uber and the industry at large."
Poll: if you are writing an ML library, would you try Ray as the distributed runtime? If not, what would you use instead?