A/B Test Calculator

Determine whether your A/B test results are statistically significant. Enter the number of visitors and conversions for your control and variant groups to get an instant statistical analysis including the p-value, z-score, uplift, and statistical power.

Z = (p₂ - p₁) / √(p̂ × (1 - p̂) × (1/n₁ + 1/n₂))


Frequently Asked Questions

What is A/B testing?

A/B testing (also called split testing) is a controlled experiment that compares two versions of something (such as a webpage, email, or ad) to determine which performs better. Version A is the control (the original) and version B is the variant (the modified version). Users are randomly assigned to the groups and their behavior (conversions, clicks, etc.) is measured.

What is statistical significance in A/B testing?

Statistical significance means the difference between the control and the variant is unlikely to have arisen by chance. Typically, a result is considered significant at 95% confidence, meaning there is less than a 5% probability that the observed difference occurred randomly. The p-value quantifies this probability.

How is the p-value calculated for an A/B test?

The p-value is computed with a two-proportion z-test. First calculate the z-score: Z = (p₂ - p₁) / √(p̂ × (1 - p̂) × (1/n₁ + 1/n₂)), where p̂ is the pooled proportion. Then convert the z-score to a two-tailed p-value using the standard normal distribution.

What confidence level should I use?

95% confidence is the industry standard for most A/B tests. Use 90% for directional decisions or fast-moving experiments where speed matters more than certainty. Use 99% for high-stakes decisions (pricing changes, major redesigns) where a false positive would be very costly.

What is statistical power?

Statistical power is the probability of detecting a real effect when one exists. 80% power means that if there truly is a difference between the variants, you have an 80% chance of detecting it. Low power means you are likely to miss real improvements (false negatives). Most experiments should aim for at least 80% power.
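
As an illustration, the power of a two-tailed two-proportion z-test can be approximated with the usual normal approximation. This is a sketch, not necessarily the calculator's exact implementation, and the function name is ours:

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p1: float, p2: float, n1: int, n2: int,
                 alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed two-proportion z-test.

    Normal approximation; the negligible opposite-tail term is ignored.
    p1, p2: true conversion rates; n1, n2: visitors per group.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)                  # 1.96 for alpha = 0.05
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
    return nd.cdf(abs(p2 - p1) / se - z_crit)

# 5% vs 6% conversion with only 500 visitors per arm is badly underpowered:
print(f"{approx_power(0.05, 0.06, 500, 500):.0%}")  # ≈ 10%, far below the 80% target
```

With 10,000 visitors per arm the same effect is detected with well over 80% power, which is why sample size planning matters.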

How long should an A/B test run?

Run the test until it reaches the required sample size (determined with a sample size calculator). Don't stop a test early just because the results look significant — this inflates the false positive rate. Also run it for at least 1-2 full business cycles (typically 1-2 weeks) to account for day-of-week effects.

What does the uplift percentage mean?

Uplift (or lift) is the relative improvement of the variant over the control. The formula is: Uplift = (variant conversion rate - control conversion rate) / control conversion rate × 100. For example, if the control converts at 5% and the variant at 6%, the uplift is 20% — the variant performs 20% better than the control.
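
The uplift formula is a one-liner; here is the worked example above in Python (function name ours):

```python
def uplift(control_rate: float, variant_rate: float) -> float:
    """Relative improvement of the variant over the control, in percent."""
    return (variant_rate - control_rate) / control_rate * 100

print(f"{uplift(0.05, 0.06):.1f}%")  # 20.0%
```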

Can I trust A/B test results with a small sample size?

Small sample sizes produce unreliable results with wide confidence intervals. Even when a small sample yields a "significant" p-value, the observed effect size is likely exaggerated. Make sure your sample size is adequate before drawing conclusions. Use an A/B test sample size calculator to plan your experiment.
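
For a sense of what "adequate" means, the standard normal-approximation formula for required visitors per group can be sketched as follows (a simplified version of what sample size calculators compute; the function name is ours):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p1: float, p2: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per group for a two-proportion z-test.

    p1: baseline conversion rate; p2: rate you want to be able to detect.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # 1.96 for 95% confidence
    z_beta = nd.inv_cdf(power)            # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Detecting a lift from 5% to 6% at 95% confidence and 80% power:
print(sample_size_per_group(0.05, 0.06))
```

Under these assumptions the formula calls for roughly 8,000+ visitors per group — far more than the few hundred visitors where "significant" results are most often illusory.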

What is A/B Testing?

A/B testing (also known as split testing) is a method of comparing two versions of a webpage, email, ad, or any other content to determine which one performs better. Users are randomly divided into two groups: the control group (A) sees the original version, and the variant group (B) sees the modified version.

The key question A/B testing answers is: "Is the difference in performance between A and B real, or could it have happened by random chance?" This is where statistical significance comes in. Our calculator uses a two-proportion z-test to determine whether the observed difference is statistically significant.

A/B testing is fundamental to data-driven decision making in marketing, product development, UX design, and growth engineering. Companies like Google, Amazon, Netflix, and Booking.com run thousands of A/B tests annually to optimize their products.

Statistical Formula & How It Works

This calculator uses the two-proportion z-test to compare conversion rates between two independent groups:

Step 1: Calculate Conversion Rates

p₁ = Conversions₁ / Visitors₁
p₂ = Conversions₂ / Visitors₂

Step 2: Calculate Pooled Proportion

p̂ = (C₁ + C₂) / (n₁ + n₂)

Step 3: Calculate Z-Score

Z = (p₂ - p₁) / √(p̂ × (1 - p̂) × (1/n₁ + 1/n₂))

Step 4: Convert to P-Value (two-tailed)

p-value = 2 × (1 - Φ(|Z|))

If the p-value is less than alpha (where alpha = 1 - confidence level), the result is statistically significant. For 95% confidence, alpha = 0.05, so a p-value below 0.05 indicates significance.
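
The four steps above can be sketched in Python using only the standard library (`statistics.NormalDist` supplies the normal CDF Φ); the function name is ours:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(c1: int, n1: int, c2: int, n2: int):
    """Two-tailed two-proportion z-test.

    c1, n1: conversions and visitors for the control group (A)
    c2, n2: conversions and visitors for the variant group (B)
    Returns (z_score, p_value).
    """
    p1, p2 = c1 / n1, c2 / n2                      # Step 1: conversion rates
    p_pool = (c1 + c2) / (n1 + n2)                 # Step 2: pooled proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se                             # Step 3: z-score
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # Step 4: two-tailed p-value
    return z, p_value

# Example 1 below: 300/10,000 (control) vs 380/10,000 (variant)
z, p = two_proportion_z_test(300, 10_000, 380, 10_000)
print(f"Z = {z:.3f}, p = {p:.4f}")  # Z ≈ 3.121, p ≈ 0.0018 → significant at 95%
```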

A/B Test Calculation Examples

Example 1: Significant Result

An e-commerce site tests a new checkout page. Control: 10,000 visitors, 300 purchases. Variant: 10,000 visitors, 380 purchases.

Control rate: 300/10,000 = 3.00%
Variant rate: 380/10,000 = 3.80%
Uplift: (3.80 - 3.00) / 3.00 = +26.67%
Pooled: 680/20,000 = 3.40%
SE = √(0.034 × 0.966 × 0.0002) = 0.002563
Z = 0.008 / 0.002563 = 3.121
P-value = 0.0018
Result: Statistically significant at 95% confidence

Example 2: Not Significant

A SaaS company tests a new pricing page. Control: 500 visitors, 25 signups. Variant: 500 visitors, 30 signups.

Control rate: 25/500 = 5.00%
Variant rate: 30/500 = 6.00%
Uplift: +20.00%
Z = 0.694
P-value = 0.488
Result: Not significant — need more data

Example 3: Variant Performs Worse

An email marketing test. Control: 5,000 recipients, 250 clicks. Variant: 5,000 recipients, 200 clicks.

Control rate: 250/5,000 = 5.00%
Variant rate: 200/5,000 = 4.00%
Uplift: -20.00%
Z = -2.412
P-value = 0.0159
Result: Statistically significant — variant is worse

Choosing Your Significance Level

Confidence | Alpha (α) | Z Critical | Best For
90%        | 0.10      | 1.645      | Quick iterations, low-risk changes
95%        | 0.05      | 1.960      | Industry standard, most A/B tests
99%        | 0.01      | 2.576      | High-stakes decisions, pricing changes
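
The Z-critical values in the table come straight from the inverse normal CDF, as this short sketch shows (function name ours):

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-tailed critical z-value for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%}: {z_critical(conf):.3f}")
# 90%: 1.645
# 95%: 1.960
# 99%: 2.576
```

A result is significant when |Z| exceeds the critical value — equivalent to the p-value falling below alpha.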

Common A/B Testing Mistakes

  1. Stopping tests too early — Checking results before reaching the required sample size inflates false positive rates. Commit to your sample size before starting.
  2. Testing too many variations — Each additional variant requires a larger sample size and increases the chance of false positives (multiple comparisons problem).
  3. Ignoring statistical power — Low-powered tests frequently miss real effects. Aim for at least 80% power when planning your test.
  4. Not running full business cycles — User behavior varies by day of week, time of day, and season. Run tests for at least 1-2 full weeks.
  5. Testing tiny changes on small samples — Small effects need large samples to detect. Use the Sample Size Calculator to plan ahead.
  6. Cherry-picking metrics — Decide which metric to track before running the test. Looking at multiple metrics after the fact increases false discoveries.

When to Use A/B Testing

  • Landing page optimization — Headlines, CTAs, images, form fields, layout
  • Email marketing — Subject lines, send times, content, personalization
  • Pricing pages — Pricing tiers, feature display, social proof
  • Ad campaigns — Ad copy, creatives, targeting, bidding strategies
  • Product features — Onboarding flows, UI changes, feature placement
  • Checkout flows — Form design, payment options, trust signals

A/B Testing Best Practices

  1. Define your hypothesis before testing — Write down what you expect to happen and why. This prevents post-hoc rationalization.
  2. Calculate sample size upfront — Use our A/B Test Sample Size Calculator to determine how many visitors you need before starting.
  3. Test one variable at a time — Changing multiple elements makes it impossible to know which change caused the effect.
  4. Ensure random assignment — Users should be randomly assigned to control or variant with equal probability.
  5. Run the full duration — Don't stop early. Don't extend the test just because results aren't significant.
  6. Consider practical significance — A statistically significant 0.1% improvement may not be worth the development cost. Consider the business impact.
  7. Document everything — Record your hypothesis, sample size calculation, test duration, and results for institutional learning.
