A/B Test Calculator

Determine if your A/B test results are statistically significant. Enter visitors and conversions for both control and variant groups to get instant statistical analysis including p-value, z-score, uplift, and power.

Z-test for two proportions: Z = (p₂ - p₁) / √(p̂ × (1 - p̂) × (1/n₁ + 1/n₂))


Frequently Asked Questions

What is an A/B test?

An A/B test (also called a split test) is a controlled experiment where you compare two versions of something (e.g., a webpage, email, or ad) to determine which performs better. Version A is the control (original), and Version B is the variant (modified). Users are randomly assigned to each group, and their behavior (conversions, clicks, etc.) is measured.

What is statistical significance in A/B testing?

Statistical significance means the difference between your control and variant is unlikely to be due to random chance. Typically, a result is considered significant at 95% confidence, meaning there's less than a 5% probability the observed difference happened by chance. The p-value quantifies this probability.

How do you calculate the p-value for an A/B test?

The p-value is calculated using a two-proportion z-test. First, compute the z-score: Z = (p₂ - p₁) / √(p̂ × (1 - p̂) × (1/n₁ + 1/n₂)), where p̂ is the pooled proportion. Then convert the z-score to a two-tailed p-value using the standard normal distribution.
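The second step, converting a z-score to a two-tailed p-value, can be sketched in a few lines of Python using only the standard library (the function name `p_value_from_z` is illustrative, not part of the calculator):

```python
from math import erfc, sqrt

def p_value_from_z(z: float) -> float:
    """Two-tailed p-value from a z-score: 2 * (1 - Phi(|z|))."""
    # 1 - Phi(z) equals 0.5 * erfc(z / sqrt(2)), so the two-tailed
    # p-value simplifies to erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))
```

For example, `p_value_from_z(1.96)` returns approximately 0.05, matching the familiar 95%-confidence threshold.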

What confidence level should I use?

95% confidence is the industry standard for most A/B tests. Use 90% for directional decisions or fast-paced experiments where speed matters more than certainty. Use 99% for high-stakes decisions (pricing changes, major redesigns) where a false positive would be very costly.

What is statistical power?

Statistical power is the probability of detecting a true effect when one exists. A power of 80% means if there really is a difference between your variations, you have an 80% chance of detecting it. Low power means you might miss real improvements (false negatives). Most experiments should target at least 80% power.
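Power for a two-proportion test can be approximated with the normal distribution. Below is a rough sketch of that approximation (not the calculator's exact method; `approx_power` is a hypothetical helper, and it ignores the negligible far-tail term):

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p1: float, p2: float, n1: int, n2: int,
                 alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test.

    Normal approximation: power ~ Phi(|p2 - p1| / SE - z_crit),
    with SE computed from the assumed rates under the alternative.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return nd.cdf(abs(p2 - p1) / se - z_crit)
```

Intuitively, larger samples shrink the standard error, which pushes the power toward 1.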

How long should I run an A/B test?

Run your test until you reach the required sample size (use our Sample Size Calculator to determine this). Never stop a test early just because it looks significant — this inflates false positive rates. Also run for at least 1-2 full business cycles (typically 1-2 weeks) to account for day-of-week effects.

What does the uplift percentage mean?

Uplift (or lift) is the relative improvement of the variant over the control. It's calculated as: Uplift = (Variant Rate - Control Rate) / Control Rate × 100. For example, if control converts at 5% and variant at 6%, the uplift is 20% — meaning the variant performs 20% better than the control.
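The uplift formula translates directly into code; a minimal sketch (the function name `uplift` is illustrative):

```python
def uplift(control_rate: float, variant_rate: float) -> float:
    """Relative uplift of the variant over the control, in percent."""
    return (variant_rate - control_rate) / control_rate * 100

# 5% control vs 6% variant: a 1-point absolute gain is a 20% relative uplift
print(round(uplift(0.05, 0.06), 2))  # → 20.0
```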

Can I trust my A/B test results with a small sample size?

Small sample sizes lead to unreliable results with wide confidence intervals. Even if you see a 'significant' p-value with small samples, the observed effect size is likely exaggerated. Aim for adequate sample sizes before drawing conclusions. Use our A/B Test Sample Size Calculator to plan your experiment.

What is A/B Testing?

A/B testing (also known as split testing) is a method of comparing two versions of a webpage, email, ad, or any other content to determine which one performs better. Users are randomly divided into two groups: the control group (A) sees the original version, and the variant group (B) sees the modified version.

The key question A/B testing answers is: "Is the difference in performance between A and B real, or could it have happened by random chance?" This is where statistical significance comes in. Our calculator uses a two-proportion z-test to determine whether the observed difference is statistically significant.

A/B testing is fundamental to data-driven decision making in marketing, product development, UX design, and growth engineering. Companies like Google, Amazon, Netflix, and Booking.com run thousands of A/B tests annually to optimize their products.

Statistical Formula & How It Works

This calculator uses the two-proportion z-test to compare conversion rates between two independent groups:

Step 1: Calculate Conversion Rates

p₁ = Conversions₁ / Visitors₁
p₂ = Conversions₂ / Visitors₂

Step 2: Calculate Pooled Proportion

p̂ = (C₁ + C₂) / (n₁ + n₂)

Step 3: Calculate Z-Score

Z = (p₂ - p₁) / √(p̂ × (1 - p̂) × (1/n₁ + 1/n₂))

Step 4: Convert to P-Value (two-tailed)

p-value = 2 × (1 - Φ(|Z|))

If the p-value is less than alpha (where alpha = 1 - confidence level), the result is statistically significant. For 95% confidence, alpha = 0.05, so a p-value below 0.05 indicates significance.
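The four steps above can be sketched as a single Python function using only the standard library (`two_proportion_z_test` is an illustrative name, not the calculator's actual code):

```python
from math import erfc, sqrt

def two_proportion_z_test(c1: int, n1: int, c2: int, n2: int,
                          confidence: float = 0.95):
    """Steps 1-4: rates, pooled proportion, z-score, two-tailed p-value."""
    p1, p2 = c1 / n1, c2 / n2             # Step 1: conversion rates
    pooled = (c1 + c2) / (n1 + n2)        # Step 2: pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se                    # Step 3: z-score
    p_value = erfc(abs(z) / sqrt(2))      # Step 4: 2 * (1 - Phi(|z|))
    significant = p_value < 1 - confidence
    return z, p_value, significant
```

Running `two_proportion_z_test(300, 10000, 380, 10000)` reproduces Example 1 below: z of roughly 3.12, p-value of roughly 0.0018, significant at 95% confidence.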

A/B Test Calculation Examples

Example 1: Significant Result

An e-commerce site tests a new checkout page. Control: 10,000 visitors, 300 purchases. Variant: 10,000 visitors, 380 purchases.

Control rate: 300/10,000 = 3.00%
Variant rate: 380/10,000 = 3.80%
Uplift: (3.80 - 3.00) / 3.00 = +26.67%
Pooled: 680/20,000 = 3.40%
SE = √(0.034 × 0.966 × 0.0002) = 0.00256
Z = 0.008 / 0.00256 = 3.125
P-value = 0.0018
Result: Statistically significant at 95% confidence

Example 2: Not Significant

A SaaS company tests a new pricing page. Control: 500 visitors, 25 signups. Variant: 500 visitors, 30 signups.

Control rate: 25/500 = 5.00%
Variant rate: 30/500 = 6.00%
Uplift: +20.00%
Pooled: 55/1,000 = 5.50%
Z = 0.010 / 0.01442 = 0.694
P-value = 0.488
Result: Not significant — need more data

Example 3: Variant Performs Worse

An email marketing test. Control: 5,000 recipients, 250 clicks. Variant: 5,000 recipients, 200 clicks.

Control rate: 250/5,000 = 5.00%
Variant rate: 200/5,000 = 4.00%
Uplift: -20.00%
Z = -0.010 / 0.00415 = -2.412
P-value = 0.0159
Result: Statistically significant — variant is worse

Choosing Your Significance Level

Confidence | Alpha (α) | Z Critical | Best For
90%        | 0.10      | 1.645      | Quick iterations, low-risk changes
95%        | 0.05      | 1.960      | Industry standard, most A/B tests
99%        | 0.01      | 2.576      | High-stakes decisions, pricing changes
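The critical z values in the table come from the inverse of the standard normal CDF; a quick sketch using Python's standard library (`z_critical` is an illustrative name):

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical z for a confidence level (alpha = 1 - confidence)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%}: z = {z_critical(conf):.3f}")
```

This prints 1.645, 1.960, and 2.576, matching the table.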

Common A/B Testing Mistakes

  1. Stopping tests too early — Checking results before reaching the required sample size inflates false positive rates. Commit to your sample size before starting.
  2. Testing too many variations — Each additional variant requires a larger sample size and increases the chance of false positives (multiple comparisons problem).
  3. Ignoring statistical power — Low-powered tests frequently miss real effects. Aim for at least 80% power when planning your test.
  4. Not running full business cycles — User behavior varies by day of week, time of day, and season. Run tests for at least 1-2 full weeks.
  5. Testing tiny changes on small samples — Small effects need large samples to detect. Use the Sample Size Calculator to plan ahead.
  6. Cherry-picking metrics — Decide which metric to track before running the test. Looking at multiple metrics after the fact increases false discoveries.

When to Use A/B Testing

  • Landing page optimization — Headlines, CTAs, images, form fields, layout
  • Email marketing — Subject lines, send times, content, personalization
  • Pricing pages — Pricing tiers, feature display, social proof
  • Ad campaigns — Ad copy, creatives, targeting, bidding strategies
  • Product features — Onboarding flows, UI changes, feature placement
  • Checkout flows — Form design, payment options, trust signals

A/B Testing Best Practices

  1. Define your hypothesis before testing — Write down what you expect to happen and why. This prevents post-hoc rationalization.
  2. Calculate sample size upfront — Use our A/B Test Sample Size Calculator to determine how many visitors you need before starting.
  3. Test one variable at a time — Changing multiple elements makes it impossible to know which change caused the effect.
  4. Ensure random assignment — Users should be randomly assigned to control or variant with equal probability.
  5. Run the full duration — Don't stop early, and don't extend the test just because results aren't significant.
  6. Consider practical significance — A statistically significant 0.1% improvement may not be worth the development cost. Consider the business impact.
  7. Document everything — Record your hypothesis, sample size calculation, test duration, and results for institutional learning.
