In reality no, because nobody will notice that the company is about 10% smaller than it could have been, and you're doing good stuff. However, the cost of that 10% difference to your company probably exceeds the cost of running 100 tests.
Note that in the real world, after a while you exhaust the easy 5% wins. Then you wind up chasing a lot of 1% and 2% marginal wins, and those tests take a lot longer to run. But even so, if your company is pulling down, say, a million per month, a 2% drop in business from getting one of those tests wrong probably exceeds the cost of all the tests you run in a year. So unless extending tests is prohibitively expensive, or there are significant opportunity costs from not being able to run the tests you want, you should go to higher confidence.
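To put rough numbers on the cost side (a back-of-the-envelope sketch, purely illustrative, using the million-per-month figure above):

```python
# Back-of-the-envelope cost of shipping one bad change; all numbers are
# assumptions taken from the figures mentioned above.
monthly_revenue = 1_000_000   # "a million per month"
drop_from_bad_call = 0.02     # a 2% drop in business from one wrong call

yearly_cost = monthly_revenue * 12 * drop_from_bad_call
print(f"${yearly_cost:,.0f} per year")   # $240,000 per year
# One mistake of that size plausibly exceeds what a year of longer-running,
# higher-confidence tests would cost you.
```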
In my experience, the primary limiter of how many A/B tests you can run is available traffic.
Let's assume that I'm running the 2% tests you mention. Over a year, say, I can either run 60 such tests at 95% confidence or (approximately) 30 at 99.9% confidence, if the math posted elsewhere in this thread is right - that's how much traffic I have. Let's once again assume a third of those pan out.
I'm still not seeing why I'd prefer 10 2% wins (from the 99.9% approach) to 20 2% wins plus 3 2% losses (from the 95% approach). Yes, there are more errors, but overall I end up with roughly a 40% improvement as opposed to roughly a 22% improvement.
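For what it's worth, here's the compounding math I'm using (a quick sketch; it assumes every win or loss is exactly ±2% and that the effects multiply):

```python
# Compare the two strategies above, assuming ±2% effects that compound.
wins_95, losses_95 = 20, 3   # 60 tests at 95% confidence: 20 real wins, ~3 false positives
wins_999 = 10                # 30 tests at 99.9% confidence: 10 real wins, ~0 false positives

lift_95 = 1.02 ** wins_95 * 0.98 ** losses_95   # ~1.40, i.e. ~40% improvement
lift_999 = 1.02 ** wins_999                     # ~1.22, i.e. ~22% improvement
print(lift_95, lift_999)
```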
On the first point - I may be being dense here. Do you mean a) segmenting traffic, and running A vs B on one segment, and A vs C on another, b) A/B testing, say, headline and CTA on the same page at the same time, or c) testing different parts of the funnel at the same time?
On the second point - this may be an industry difference, but I've never really had any problem coming up with things to A/B test on a landing page, for example. Just off the top of my head, you could test:
- Headlines - at least 5 of them.
- Headline and subhead vs no subhead.
- Font face.
- Color scheme (overall).
- Images - probably want to test 6-10 of them.
- Call To Action (CTA) text.
- CTA button vs link.
- CTA placement.
- Multiple vs single CTAs.
- Long copy vs short.
- Layout. Image on left, right or top? More than one image? Etc. On the average LP I could come up with 10 possible layouts before pausing for breath.
- Testimonials. Placement, which ones, how long, picture vs audio vs video vs text.
- Video vs image-and-text.
- Ugly vs good-looking.
- Text highlighting, bolding etc - yellow highlighter style vs bold and italic vs nothing.
- Other social proofing elements - which media icons to use, where to place them, etc.
That's at least 50 A/B tests right there, on a single LP. And all of those elements have been shown in one test or another to affect conversion rates.
I mean that you can take the same traffic, using random assignment, and assign it into multiple A/B tests at once. Sure, there may be interaction effects, but they are random and statistically your evaluation of each test is unaffected by the others.
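A minimal sketch of that kind of assignment, assuming you have a stable user id to hash (the experiment names and hashing scheme here are just an illustration, not a prescription):

```python
import hashlib

def variant(user_id: str, experiment: str) -> str:
    """Assign a user to A or B for a given experiment, independently per experiment."""
    # Hash (experiment, user) so each experiment gets its own effectively
    # random, but stable, coin flip for the same user.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same traffic flows through both tests at once; assignments are independent.
print(variant("user-123", "headline-test"))
print(variant("user-123", "cta-color-test"))
```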
You need to be careful if there is reason to believe that tests will interact. For instance, if you're testing different font colors and different background colors, the possibility of red text on a red background would be unfair to both tests. But in general, if you avoid the obvious stuff, you can do things in parallel. (If you have enough traffic you can analyze for interaction effects, but don't plan on doing that unless you know that you have enough traffic to actually follow the plan.)
Re-reading, I realize that I was as clear as mud here about the two kinds of interaction.
The first paragraph is talking about random interaction. So, for instance, suppose version A of test 1 was really good, and version B of test 2 happened to get more of test 1's A traffic than version A of test 2 did. That gives test 2's version B a random boost. As long as assignment is random, it is OK to completely ignore this type of interaction, which comes purely from running multiple tests on the same traffic.
The second paragraph is talking about non-random interactions. People who are in version A of test 1 and also in version B of test 2 get a horrible combination that hurts both of those versions. If you have reason to believe that you have causal interactions like this, you can't ignore them - you have to think things through carefully.
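And if you do have the traffic to check for non-random interactions, a rough sketch of what that looks like (all the counts below are made up for illustration):

```python
# Hypothetical counts: (conversions, visitors) for each combination of
# test 1 (A/B) and test 2 (A/B).
cells = {
    ("A", "A"): (120, 1000),
    ("A", "B"): (118, 1000),
    ("B", "A"): (150, 1000),
    ("B", "B"): (95, 1000),   # suspiciously low: possible causal interaction
}

for (v1, v2), (conv, n) in cells.items():
    print(f"test1={v1} test2={v2}: {conv / n:.1%}")
# If the combined B/B cell is far from what the individual B effects would
# predict, the tests are interacting and need to be reasoned about together.
```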