A/B Test Calculator

Determine if your A/B test results are statistically significant. Enter visitors and conversions for both control and variant groups to get instant statistical analysis including p-value, z-score, uplift, and power.

Z = (p₂ - p₁) / √(p̂ × (1 - p̂) × (1/n₁ + 1/n₂))

Control Group (A)

Variant Group (B)

Frequently Asked Questions

What is an A/B test?

An A/B test (also called a split test) is a controlled experiment where you compare two versions of something (e.g., a webpage, email, or ad) to determine which performs better. Version A is the control (original), and Version B is the variant (modified). Users are randomly assigned to each group, and their behavior (conversions, clicks, etc.) is measured.

What is statistical significance in A/B testing?

Statistical significance means the difference between your control and variant is unlikely to be due to random chance. Typically, a result is considered significant at 95% confidence, meaning that if there were truly no difference between the versions, a gap at least this large would be observed less than 5% of the time. The p-value quantifies this probability.

How do you calculate the p-value for an A/B test?

The p-value is calculated using a two-proportion z-test. First, compute the z-score: Z = (p₂ - p₁) / √(p̂ × (1 - p̂) × (1/n₁ + 1/n₂)), where p̂ is the pooled proportion. Then convert the z-score to a two-tailed p-value using the standard normal distribution.
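The final step, converting a z-score to a two-tailed p-value, can be sketched with Python's standard library (Python 3.8+ for `statistics.NormalDist`; the function name is illustrative):

```python
from statistics import NormalDist

def p_value_from_z(z):
    """Two-tailed p-value from a z-score via the standard normal CDF Φ."""
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

For example, `p_value_from_z(1.96)` returns approximately 0.05, which is why 1.96 is the critical z for 95% confidence.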

What confidence level should I use?

95% confidence is the industry standard for most A/B tests. Use 90% for directional decisions or fast-paced experiments where speed matters more than certainty. Use 99% for high-stakes decisions (pricing changes, major redesigns) where a false positive would be very costly.

What is statistical power?

Statistical power is the probability of detecting a true effect when one exists. A power of 80% means if there really is a difference between your variations, you have an 80% chance of detecting it. Low power means you might miss real improvements (false negatives). Most experiments should target at least 80% power.
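The calculator's exact power formula isn't shown here, but a common normal-approximation sketch for the two-sided two-proportion test (assuming the observed rates are the true rates) looks like this:

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test,
    treating p1 and p2 as the true conversion rates."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # SE under H0
    se1 = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)    # SE under H1
    d = abs(p2 - p1)
    # Probability the test statistic lands beyond either critical boundary
    return nd.cdf((d - z_crit * se0) / se1) + nd.cdf((-d - z_crit * se0) / se1)
```

With 10,000 visitors per group and true rates of 3.0% vs 3.8%, this gives roughly 88% power; with only 500 visitors per group and rates of 5% vs 6%, power drops to about 10%, which is why small tests so often miss real effects.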

How long should I run an A/B test?

Run your test until you reach the required sample size (use our Sample Size Calculator to determine this). Never stop a test early just because it looks significant — this inflates false positive rates. Also run for at least 1-2 full business cycles (typically 1-2 weeks) to account for day-of-week effects.

What does the uplift percentage mean?

Uplift (or lift) is the relative improvement of the variant over the control. It's calculated as: Uplift = (Variant Rate - Control Rate) / Control Rate × 100. For example, if control converts at 5% and variant at 6%, the uplift is 20% — meaning the variant performs 20% better than the control.
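As a one-line sketch of that formula (the function name is illustrative):

```python
def uplift(control_rate, variant_rate):
    """Relative lift of the variant over the control, in percent."""
    return (variant_rate - control_rate) / control_rate * 100
```

Here `uplift(0.05, 0.06)` returns 20.0, matching the example above.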

Can I trust my A/B test results with a small sample size?

Small sample sizes lead to unreliable results with wide confidence intervals. Even if you see a 'significant' p-value with small samples, the observed effect size is likely exaggerated. Aim for adequate sample sizes before drawing conclusions. Use our A/B Test Sample Size Calculator to plan your experiment.

What is A/B Testing?

A/B testing (also known as split testing) is a method of comparing two versions of a webpage, email, ad, or any other content to determine which one performs better. Users are randomly divided into two groups: the control group (A) sees the original version, and the variant group (B) sees the modified version.

The key question A/B testing answers is: "Is the difference in performance between A and B real, or could it have happened by random chance?" This is where statistical significance comes in. Our calculator uses a two-proportion z-test to determine whether the observed difference is statistically significant.

A/B testing is fundamental to data-driven decision making in marketing, product development, UX design, and growth engineering. Companies like Google, Amazon, Netflix, and Booking.com run thousands of A/B tests annually to optimize their products.

Statistical Formula & How It Works

This calculator uses the two-proportion z-test to compare conversion rates between two independent groups:

Step 1: Calculate Conversion Rates

p₁ = Conversions₁ / Visitors₁
p₂ = Conversions₂ / Visitors₂

Step 2: Calculate Pooled Proportion

p̂ = (C₁ + C₂) / (n₁ + n₂)

Step 3: Calculate Z-Score

Z = (p₂ - p₁) / √(p̂ × (1 - p̂) × (1/n₁ + 1/n₂))

Step 4: Convert to P-Value (two-tailed)

p-value = 2 × (1 - Φ(|Z|))

If the p-value is less than alpha (where alpha = 1 - confidence level), the result is statistically significant. For 95% confidence, alpha = 0.05, so a p-value below 0.05 indicates significance.
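The four steps above can be sketched as a single Python function (standard library only; the name and return shape are illustrative, not the calculator's internals):

```python
from math import sqrt
from statistics import NormalDist

def ab_test(c1, n1, c2, n2, confidence=0.95):
    """Two-proportion z-test following the four steps above."""
    p1, p2 = c1 / n1, c2 / n2                       # Step 1: conversion rates
    pooled = (c1 + c2) / (n1 + n2)                  # Step 2: pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se                              # Step 3: z-score
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # Step 4: two-tailed p-value
    alpha = 1 - confidence
    return z, p_value, p_value < alpha
```

Calling `ab_test(300, 10000, 380, 10000)` reproduces the significant checkout-page example below (z ≈ 3.12, p ≈ 0.0018).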

A/B Test Calculation Examples

Example 1: Significant Result

An e-commerce site tests a new checkout page. Control: 10,000 visitors, 300 purchases. Variant: 10,000 visitors, 380 purchases.

Control rate: 300/10,000 = 3.00%
Variant rate: 380/10,000 = 3.80%
Uplift: (3.80 - 3.00) / 3.00 = +26.67%
Pooled: 680/20,000 = 3.40%
SE = √(0.034 × 0.966 × 0.0002) = 0.00256
Z = 0.008 / 0.00256 = 3.125
P-value = 0.0018
Result: Statistically significant at 95% confidence

Example 2: Not Significant

A SaaS company tests a new pricing page. Control: 500 visitors, 25 signups. Variant: 500 visitors, 30 signups.

Control rate: 25/500 = 5.00%
Variant rate: 30/500 = 6.00%
Uplift: +20.00%
Z = 0.694
P-value = 0.488
Result: Not significant — need more data

Example 3: Variant Performs Worse

An email marketing test. Control: 5,000 recipients, 250 clicks. Variant: 5,000 recipients, 200 clicks.

Control rate: 250/5,000 = 5.00%
Variant rate: 200/5,000 = 4.00%
Uplift: -20.00%
Z = -2.412
P-value = 0.0159
Result: Statistically significant — variant is worse
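All three examples can be reproduced with a short self-contained script (standard library only; helper name is illustrative):

```python
from math import sqrt
from statistics import NormalDist

def z_and_p(c1, n1, c2, n2):
    """Z-score and two-tailed p-value for a two-proportion z-test."""
    pooled = (c1 + c2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (c2 / n2 - c1 / n1) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

for label, args in [("checkout", (300, 10000, 380, 10000)),
                    ("pricing",  (25, 500, 30, 500)),
                    ("email",    (250, 5000, 200, 5000))]:
    z, p = z_and_p(*args)
    print(f"{label}: z = {z:.3f}, p = {p:.4f}")
```

Small differences from the hand-rounded intermediate values above are expected.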

Choosing Your Significance Level

Confidence | Alpha (α) | Z Critical | Best For
90% | 0.10 | 1.645 | Quick iterations, low-risk changes
95% | 0.05 | 1.960 | Industry standard, most A/B tests
99% | 0.01 | 2.576 | High-stakes decisions, pricing changes
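The critical z values in the table come from the inverse of the standard normal CDF; a minimal sketch:

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z-score for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)
```

For example, `z_critical(0.95)` returns approximately 1.960.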

Common A/B Testing Mistakes

  1. Stopping tests too early — Checking results before reaching the required sample size inflates false positive rates. Commit to your sample size before starting.
  2. Testing too many variations — Each additional variant requires a larger sample size and increases the chance of false positives (multiple comparisons problem).
  3. Ignoring statistical power — Low-powered tests frequently miss real effects. Aim for at least 80% power when planning your test.
  4. Not running full business cycles — User behavior varies by day of week, time of day, and season. Run tests for at least 1-2 full weeks.
  5. Testing tiny changes on small samples — Small effects need large samples to detect. Use the Sample Size Calculator to plan ahead.
  6. Cherry-picking metrics — Decide which metric to track before running the test. Looking at multiple metrics after the fact increases false discoveries.
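One standard guard against the multiple comparisons problem in mistake 2 is the Bonferroni correction, which simply divides the significance threshold by the number of comparisons (a conservative sketch, not the only option):

```python
def bonferroni_alpha(alpha, num_comparisons):
    """Bonferroni-corrected per-comparison significance threshold."""
    return alpha / num_comparisons
```

Testing four variants against one control at an overall alpha of 0.05 would require each comparison's p-value to fall below `bonferroni_alpha(0.05, 4)` = 0.0125.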

When to Use A/B Testing

  • Landing page optimization — Headlines, CTAs, images, form fields, layout
  • Email marketing — Subject lines, send times, content, personalization
  • Pricing pages — Pricing tiers, feature display, social proof
  • Ad campaigns — Ad copy, creatives, targeting, bidding strategies
  • Product features — Onboarding flows, UI changes, feature placement
  • Checkout flows — Form design, payment options, trust signals

A/B Testing Best Practices

  1. Define your hypothesis before testing — Write down what you expect to happen and why. This prevents post-hoc rationalization.
  2. Calculate sample size upfront — Use our A/B Test Sample Size Calculator to determine how many visitors you need before starting.
  3. Test one variable at a time — Changing multiple elements makes it impossible to know which change caused the effect.
  4. Ensure random assignment — Users should be randomly assigned to control or variant with equal probability.
  5. Run the full duration — Don't stop early. Don't extend the test just because results aren't significant.
  6. Consider practical significance — A statistically significant 0.1% improvement may not be worth the development cost. Consider the business impact.
  7. Document everything — Record your hypothesis, sample size calculation, test duration, and results for institutional learning.

Related Calculators