You need between 1,000 and 500,000 visitors per variation for a statistically valid A/B test, depending on your baseline conversion rate and the size of improvement you want to detect. A typical ecommerce site with a 3% conversion rate needs approximately 20,000 visitors per variation to detect a 0.5 percentage point change at 95% confidence and 80% power. Use the calculator below to find your exact sample size.

Key Takeaways

  • Sample Size Formula: Required visitors grow with the inverse square of the detectable effect; halving the effect roughly quadruples the sample size, so detecting a 0.1 point change needs about 100x the visitors of a 1 point change
  • 95% Confidence Standard: Use 95% significance and 80% power as your default settings — this is the industry standard for most business decisions
  • Minimum 2 Weeks: Always run tests for at least 14 days regardless of sample size to account for day-of-week variation in visitor behaviour
  • Stopping Early Invalidates Results: Ending a test before reaching full sample size inflates false positive rates from 5% to 20-30%
  • Both Variations Need Equal Traffic: Split traffic 50/50 between control and variant for maximum statistical efficiency

A/B Test Sample Size Calculator





[TOOL]

Why Does Sample Size Matter for A/B Testing?

Sample size determines whether your A/B test results are statistically reliable or just random noise that leads to wrong decisions.

Every website has natural conversion rate fluctuations. On any given day, your conversion rate might be 2.8% or 3.2% even with no changes to your site. An A/B test needs enough data to distinguish real improvements from this background noise. Without adequate sample size, you are flipping a coin and calling it data.

Running tests with insufficient sample sizes is the most expensive mistake in conversion optimisation. You implement a “winning” variation that was actually just statistical noise, your conversion rate stays flat or drops, and you waste weeks of engineering time. A properly sized test costs the same amount of effort but delivers results you can trust.

The calculator above uses the standard two-proportion z-test formula that major testing platforms such as Optimizely and VWO are built on. Enter your current conversion rate and the minimum improvement worth detecting, and it tells you exactly how many visitors you need before making a decision.

How Do You Calculate A/B Test Sample Size?

The sample size formula uses four inputs: baseline conversion rate, minimum detectable effect, significance level, and statistical power.

The formula is: n = (Z_alpha + Z_beta)^2 × (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2. Here, p1 is your current conversion rate, p2 is the new rate you want to detect, Z_alpha is determined by your confidence level (1.96 for 95%, two-sided), and Z_beta is determined by your power level (0.84 for 80%).

In practical terms: a 3% baseline conversion rate with a 0.5 percentage point minimum detectable effect (detecting a change to 3.5%) at 95% confidence and 80% power requires approximately 19,700 visitors per variation. That is roughly 39,400 total visitors split evenly between your control and variant pages.
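The formula can be implemented in a few lines of Python using only the standard library (NormalDist supplies the z-values). This is a sketch: the function name is illustrative, and commercial calculators sometimes add continuity corrections that report slightly higher numbers.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variation(p1: float, p2: float,
                              alpha: float = 0.05, power: float = 0.80) -> int:
    """n = (Z_alpha + Z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2,
    the two-proportion z-test sample size per variation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # 1.96 for 95% confidence (two-sided)
    z_beta = z(power)            # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 3% baseline, detecting a lift to 3.5%, at the 95% / 80% defaults:
print(sample_size_per_variation(0.03, 0.035))    # ≈ 19,740 per variation

# Halving the MDE to 0.25 points roughly quadruples the requirement:
print(sample_size_per_variation(0.03, 0.0325))   # roughly 4x as many
```

Plugging in your own baseline and target rates reproduces what the calculator reports for any scenario in this article.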

The minimum detectable effect (MDE) has the largest impact on sample size. Cutting your MDE in half roughly quadruples the required sample size. This is why you should set your MDE to the smallest change that would actually matter to your business, not the smallest change you are curious about.

What Significance Level Should You Use?

Use 95% significance (also called a 5% alpha or p-value threshold of 0.05) for most A/B tests — this is the universal standard across the industry.

The significance level controls your false positive rate. At 95% confidence, there is a 5% chance your test declares a winner when no real difference exists. At 90%, that false positive risk doubles to 10%. At 99%, it drops to 1% but requires roughly 50% more sample size.
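These trade-offs follow directly from the z-values in the sample size formula. A small sketch (standard library only; the helper name is illustrative) computes the sample size multiplier relative to the 95% default:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf
z_beta = z(0.80)   # 80% power

def n_multiplier(confidence: float) -> float:
    """Sample size relative to the 95%-confidence baseline (two-sided test)."""
    z_alpha = z(1 - (1 - confidence) / 2)
    return (z_alpha + z_beta) ** 2 / (z(0.975) + z_beta) ** 2

print(round(n_multiplier(0.90), 2))   # 0.79: about 21% smaller than at 95%
print(round(n_multiplier(0.99), 2))   # 1.49: about 49% larger than at 95%
```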

Use 90% confidence only for low-stakes decisions where speed matters more than certainty — such as testing button colours or minor copy changes that are easily reversible. Use 99% confidence for high-stakes changes like pricing page redesigns, checkout flow overhauls, or any change affecting revenue directly.

Statistical power is the flip side of significance. While significance controls false positives, power controls false negatives — the chance you miss a real improvement. The standard is 80% power, meaning a 20% chance of failing to detect a real effect. Increase to 90% power when the cost of missing an improvement is high, such as testing a change that took months to develop.

How Long Should an A/B Test Run?

Run every A/B test for a minimum of 14 days, even if you reach your required sample size sooner.

The two-week minimum exists because website traffic patterns vary by day of week. Monday visitors behave differently from Saturday visitors. If you run a test for 5 days (Monday to Friday), your results exclude weekend behaviour entirely. A “winner” that performed well on weekdays might lose on weekends, giving you misleading overall results.

Calculate your expected duration by dividing total required sample size by daily visitors. If you need 50,000 total visitors and your site gets 2,000 per day, the test needs 25 days. If the math says 10 days, extend to 14 days anyway for the day-of-week coverage.

Set a maximum test duration of 8 weeks. Tests running longer than 8 weeks risk external validity threats — seasonal changes, competitor actions, or market shifts can contaminate your results. If you cannot reach adequate sample size within 8 weeks, you need to either increase your MDE or find a higher-traffic page to test on.

What Is Minimum Detectable Effect and How Do You Choose It?

Minimum detectable effect (MDE) is the smallest conversion rate improvement your test is designed to reliably detect.

Setting MDE correctly is the most important decision in test planning. Set it too small and you need millions of visitors. Set it too large and you miss meaningful improvements. The right MDE is the smallest change that justifies the effort of implementing the winning variation.

For most ecommerce sites, a 0.5 percentage point MDE (e.g., detecting a change from 3.0% to 3.5%) is practical. This represents a 17% relative improvement — significant enough to impact revenue. For a site generating £100,000 monthly revenue, a 0.5 point conversion increase adds approximately £16,700 per month.

High-traffic sites (over 100,000 daily visitors) can afford smaller MDEs of 0.1-0.3 percentage points. Low-traffic sites (under 5,000 daily visitors) should use larger MDEs of 1-2 percentage points and focus on testing bold changes — incremental tweaks require sample sizes they cannot achieve in a reasonable timeframe.

What Happens If You Stop an A/B Test Early?

Stopping a test before reaching full sample size inflates your false positive rate from 5% to 20-30%, making “winning” results unreliable.

This is called the “peeking problem” and it is the most common A/B testing mistake. You check results after 3 days, see variation B converting 15% better with a green “significant” badge, and call it a winner. The problem: early results are highly volatile. Small samples amplify random variation, creating the illusion of large effects that disappear as more data arrives.

Evan Miller’s research demonstrated that stopping a nominally 95%-confident test whenever it first shows significance produces false positives 26% of the time — five times the expected 5% rate. Your testing tool shows “95% confidence” but the actual confidence is closer to 74%.
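A quick Monte Carlo sketch makes the inflation visible (numpy, with illustrative sample sizes and peek intervals): simulate an A/A test where both arms share the same true conversion rate, peek every 500 visitors, and count how often a nominal 95% z-test ever declares a winner.

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.05            # same true conversion rate in both arms (an A/A test)
n_per_arm = 5000    # planned sample size per variation
peeks = range(500, n_per_arm + 1, 500)   # check the dashboard every 500 visitors
sims = 2000
stopped_early = 0

for _ in range(sims):
    a = (rng.random(n_per_arm) < p).cumsum()   # cumulative conversions, control
    b = (rng.random(n_per_arm) < p).cumsum()   # cumulative conversions, variant
    for n in peeks:
        pooled = (a[n - 1] + b[n - 1]) / (2 * n)
        se = np.sqrt(2 * pooled * (1 - pooled) / n)
        # two-proportion z-test at nominal 95% confidence
        if se > 0 and abs(b[n - 1] - a[n - 1]) / n > 1.96 * se:
            stopped_early += 1   # declared a "winner" that does not exist
            break

print(stopped_early / sims)   # well above the nominal 0.05
```

Even though no real difference exists, repeatedly checking and stopping at the first "significant" reading produces a false positive rate several times the nominal 5%.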

If you must check results before completion, use sequential testing methods (also called “always valid” p-values). Optimizely’s Stats Engine, for example, applies sequential testing by default. Alternatively, pre-commit to checking results only at predetermined intervals — 25%, 50%, 75%, and 100% of sample size — and apply a Bonferroni correction to account for multiple checks.

When Should You Not Run an A/B Test?

Skip A/B testing when you have fewer than 1,000 monthly conversions, when the change is a clear fix, or when user research already provides strong directional evidence.

Sites with low traffic cannot reach statistical significance in a reasonable timeframe. If your required test duration exceeds 8 weeks, consider alternative research methods: user testing with 5-10 participants, heatmap analysis, or session recordings. These qualitative methods provide actionable insights without requiring statistical sample sizes.

Obvious improvements do not need testing. If your checkout button is hidden below the fold, move it up. If your page loads in 8 seconds, speed it up. If your form has 15 required fields, reduce them. Testing the obvious wastes time and traffic you could spend testing genuinely uncertain changes.

Use the calculator above to determine whether your specific test is feasible before investing in development. Enter your baseline rate and desired MDE, then check if the required duration fits within your testing roadmap. If it requires more than 8 weeks, increase the MDE or choose a higher-impact test.

Frequently Asked Questions

What is a good sample size for an A/B test?

There is no universal “good” sample size — it depends entirely on your baseline conversion rate and minimum detectable effect. A site with a 10% conversion rate needs approximately 14,700 visitors per variation to detect a 1 percentage point change (a 10% relative lift). A site with a 1% conversion rate needs roughly 163,000 visitors per variation for the same relative lift (1.0% to 1.1%). Use the calculator above for your specific numbers.

Can I run an A/B test with only 500 visitors?

Not reliably for conversion rate tests. With 500 visitors per variation, you can only detect very large effects — roughly a 5+ percentage point change — so only dramatic differences will register. For headline or engagement tests where the metric has higher variance, 500 visitors may suffice. For revenue and conversion tests, you need significantly more.

How do I increase my A/B test sample size?

Three approaches: increase traffic to the test page through paid or organic channels, test on a higher-traffic page (homepage vs. deep product page), or increase your minimum detectable effect to accept only larger improvements. You can also extend test duration, but avoid exceeding 8 weeks due to external validity concerns.

What is statistical power in A/B testing?

Statistical power is the probability that your test correctly detects a real improvement when one exists. At 80% power (the standard), there is a 20% chance you miss a real effect and incorrectly conclude “no difference.” Higher power requires larger sample sizes but reduces the risk of missing winning variations. Use 90% power for high-stakes tests.

Should I test more than two variations at once?

You can, but each additional variation increases the total sample size needed. A test with 3 variations (A/B/C) requires 50% more total traffic than a standard A/B test, and more still once you correct for comparing multiple variants against control. With 4 variations, you need at least double the traffic. For most sites, sequential A/B tests are more efficient than multivariate tests unless you have over 100,000 daily visitors.

What conversion rate should I use as my baseline?

Use your actual conversion rate from the past 30 days, measured on the specific page you are testing. Do not use site-wide conversion rates unless you are testing a site-wide change. Pull the exact number from Google Analytics or your testing platform. Using an inaccurate baseline produces incorrect sample size calculations.

Does the sample size calculator account for multiple metrics?

No. This calculator sizes your test for a single primary metric. If you plan to analyse multiple metrics (conversion rate, revenue per visitor, bounce rate), you need to apply a multiple comparisons correction. The simplest approach: divide your significance level by the number of metrics (Bonferroni correction). Testing 3 metrics at 95% confidence means using 98.3% confidence per metric.
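The Bonferroni arithmetic is simple enough to sketch directly; the metric names here are illustrative:

```python
metrics = ["conversion_rate", "revenue_per_visitor", "bounce_rate"]  # illustrative
overall_alpha = 0.05                     # 5% overall false positive budget

per_metric_alpha = overall_alpha / len(metrics)   # Bonferroni correction
per_metric_confidence = 1 - per_metric_alpha

print(round(per_metric_alpha, 4))              # 0.0167: test each metric at p < 0.0167
print(round(per_metric_confidence * 100, 1))   # 98.3 (% confidence per metric)
```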

Is a 90% confidence level acceptable for A/B tests?

For low-risk, easily reversible changes, 90% confidence is acceptable and reduces required sample size by approximately 20% compared to 95%. Examples include testing copy variations, image swaps, or layout tweaks. For pricing, checkout, or registration flow changes, stick with 95% or higher to avoid costly false positives.

Calculate Your Sample Size Before You Test

Every A/B test should start with a sample size calculation, not end with one. Knowing your required sample size before launch lets you set realistic timelines, avoid the peeking problem, and make decisions with genuine confidence rather than false precision.

Use the calculator above to plan your next test. Enter your baseline conversion rate from Google Analytics, set your MDE to the smallest meaningful improvement, and check whether the required duration fits your testing schedule. If it takes longer than 8 weeks, test a bolder change or choose a higher-traffic page.

Need help building a conversion optimisation programme? Contact JI Digital for expert A/B testing strategy and implementation.


Free tool by: John Isaacson, Digital Marketing Strategist

Last Updated: January 2026