I'm going to stick with multi-armed bandit testing.

nxpnsv · on June 26, 2021

This is a much better approach...

gingerlime · on June 26, 2021

What tools/frameworks are you using for running and analysing results?

eximius · on June 27, 2021

At my previous job, when it was relevant, I wrote something in-house.

A higher-order component picked the variant at runtime (because we had problems with SSR; otherwise picking during server-side rendering would have been more appropriate). The picks were cached in cookies.
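To make the "pick once, cache in a cookie" idea concrete, here is a rough sketch in Python rather than the original component code. Everything in it is an assumption for illustration: the `pick_variant` name, the `exp_` cookie prefix, the plain dict standing in for a cookie jar, and the uniform random choice (a bandit would normally weight the pick by how each variant is performing).

```python
import random


def pick_variant(cookies, experiment, variants, rng=random):
    """Return this visitor's variant, reusing a cached pick when possible.

    `cookies` is any dict-like cookie store (hypothetical stand-in for the
    real request/response cookies).
    """
    key = f"exp_{experiment}"  # hypothetical cookie naming scheme
    cached = cookies.get(key)
    # Reuse the cached pick only if that variant is still live;
    # a pruned variant falls through to a fresh pick.
    if cached in variants:
        return cached
    choice = rng.choice(variants)  # uniform pick: an assumption, not a bandit policy
    cookies[key] = choice
    return choice
```

Calling it twice with the same cookie store returns the same variant, so a visitor keeps seeing one version until it is pruned.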

We had in-house charting and probability calculations to determine P(X>Y) for each experiment pair. Then we'd just prune variants manually every so often (since the bad ones weren't being displayed, timeliness didn't much matter), and periodically re-introduce old variants by hand if we thought it was worth it.
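For anyone wondering what a P(X>Y) calculation can look like, here is one common way to estimate it, not necessarily what we ran. It assumes each variant's conversion rate has a Beta posterior with a uniform Beta(1, 1) prior and estimates the probability by Monte Carlo sampling. The function name, parameters, and sample count are all made up for the sketch.

```python
import random


def prob_x_beats_y(x_conv, x_trials, y_conv, y_trials, samples=20000, seed=0):
    """Estimate P(rate_X > rate_Y) from conversion counts.

    Assumes Beta(1 + conversions, 1 + failures) posteriors, i.e. a uniform
    prior on each conversion rate (an assumption, not from the thread).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(samples):
        # Draw one plausible conversion rate per variant and compare them.
        px = rng.betavariate(1 + x_conv, 1 + x_trials - x_conv)
        py = rng.betavariate(1 + y_conv, 1 + y_trials - y_conv)
        if px > py:
            wins += 1
    return wins / samples
```

Something like `prob_x_beats_y(120, 1000, 100, 1000)` gives the estimated chance that X truly converts better than Y. Running it for each pair and eyeballing the results is enough for occasional manual pruning.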