P-Value Calculator
The p-value measures the probability of obtaining a result at least as extreme as the observed data, assuming the null hypothesis is true. Enter a z-score or sample statistics (sample mean, population mean, standard deviation, sample size) to compute the p-value instantly and assess statistical significance.
p-value = 2 × (1 − Φ(|Z|))
Frequently Asked Questions
What is a p-value?
A p-value is the probability of observing the current result, or one more extreme, assuming the null hypothesis is true. The smaller the p-value, the less likely the observed data would be under the null hypothesis, providing stronger evidence for rejecting it. A p-value is not "the probability that the result is true", nor a measure of a study's importance.
What does a p-value below 0.05 mean?
p < 0.05 is the most common threshold for statistical significance (α = 0.05): if the null hypothesis were true, a result this extreme would occur less than 5% of the time. By convention, the null hypothesis is rejected when p < 0.05 and the result is called "statistically significant". But this is an arbitrary cutoff; it says nothing about the actual effect size or the practical importance of the result.
What are common misinterpretations of p-values?
Common misconceptions: (1) the p-value is the probability that the null hypothesis is true (wrong: it is the probability of the data assuming the null hypothesis is true); (2) p < 0.05 proves a real effect exists (there is still a 5% false-positive rate); (3) smaller p-values mean larger effects (the p-value reflects only significance, not effect size); (4) p > 0.05 means the null hypothesis is correct (it only means the evidence is insufficient to reject it, not that the null is proven).
How should different p-value levels be interpreted?
Conventional interpretation: p < 0.001: very strong statistical significance (***); p < 0.01: strong statistical significance (**); p < 0.05: statistical significance (*); 0.05 ≤ p < 0.10: marginal significance (accepted in some fields); p ≥ 0.10: not statistically significant. Medical clinical trials typically require p < 0.05 or stricter, while particle physics requires p < 3×10⁻⁷ (the 5σ standard).
How does statistical power relate to p-values?
Statistical power = 1 − β, the probability of correctly detecting a true effect when one exists (i.e., the ability to avoid a false negative). Power and p-values are linked: the higher the power, the more likely a significant p-value is when a real effect exists, and larger samples give higher power. Most guidelines recommend power ≥ 80%. With insufficient power, even a real effect may yield p > 0.05 simply because the sample is too small.
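As a rough illustration of this relationship, here is a minimal stdlib-Python sketch of the power of a two-sided one-sample z-test; the function name and defaults are illustrative, not part of the calculator:

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.
    delta: true mean shift; sigma: population std dev; n: sample size."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)     # e.g. ~1.96 for alpha = 0.05
    shift = abs(delta) * n ** 0.5 / sigma  # true shift in standard-error units
    # Under the alternative, Z ~ N(shift, 1); power is the probability
    # the statistic lands outside the rejection bounds ±z_crit
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)
```

With a 5-point shift, σ = 15, and n = 36, the power is only about 52%, which shows why a real effect of that size can easily produce p > 0.05 at this sample size.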
How do multiple comparisons affect p-values?
The multiple testing problem: when many hypothesis tests are run simultaneously, the chance of a spurious significant p-value accumulates. With 20 independent tests at α = 0.05, the expected number of false positives is 1 (0.05 × 20), and the probability of at least one is 1 − 0.95²⁰ ≈ 64%. Correction methods include the Bonferroni correction (significance threshold = 0.05 / number of tests), which is conservative, and Benjamini-Hochberg FDR control; choose the method appropriate to the research goal.
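The two corrections mentioned above can be sketched in a few lines of Python (the function names are illustrative):

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i < alpha / m (controls family-wise error rate)."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up procedure controlling the false discovery rate:
    find the largest rank k with p_(k) <= k * alpha / m, then reject
    the k smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject
```

On the same set of p-values, Bonferroni typically rejects fewer hypotheses than BH, which is exactly the conservatism trade-off described above.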
How do Bayesian methods differ from frequentist p-values?
The p-value is a product of frequentist statistics and answers: "Assuming H₀ is true, how probable is data like this?" Bayesian methods instead compute a posterior probability and answer: "Having seen the data, how probable is it that H₀ (or H₁) is true?" The Bayes factor is the Bayesian alternative to the p-value as an evidence measure, with a more direct interpretation. The modern trend in statistics is to combine effect sizes, confidence intervals, and Bayesian methods rather than over-relying on p-values.
How is a p-value calculated?
P-value calculation depends on a test statistic and its theoretical distribution: t-test: compute the t statistic and look up the t distribution (tables or software); chi-square test: compute the χ² statistic against the χ² distribution; F-test (ANOVA): compute the F statistic against the F distribution; nonparametric tests (Mann-Whitney, Kruskal-Wallis): computed from ranks. With this calculator, enter the test statistic and degrees of freedom to obtain the corresponding p-value automatically.
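Assuming SciPy is available, the table lookups above reduce to one-line survival-function calls; the helper names here are illustrative:

```python
from scipy import stats  # assumed available; any library with survival functions works

def p_from_t(t_stat, df):
    """Two-tailed p-value from a t statistic with df degrees of freedom."""
    return 2 * stats.t.sf(abs(t_stat), df)

def p_from_chi2(chi2_stat, df):
    """Upper-tail p-value from a chi-square statistic."""
    return stats.chi2.sf(chi2_stat, df)

def p_from_f(f_stat, df1, df2):
    """Upper-tail p-value from an F statistic (ANOVA)."""
    return stats.f.sf(f_stat, df1, df2)
```

The survival function `sf(x)` is `1 − cdf(x)`, i.e. the upper-tail area, which is exactly what a right-tailed p-value is.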
What is a P-Value?
A p-value (probability value) is the probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis is true. In other words, the p-value answers: "If there were no real effect, how likely is it to see data this extreme by random chance?"
A small p-value (typically < 0.05) suggests the observed data is unlikely under the null hypothesis, providing evidence to reject it. A large p-value means the data is consistent with the null hypothesis, so you fail to reject it.
P-values are fundamental in hypothesis testing across statistics, medicine, psychology, economics, and data science. They do not measure the probability that the null hypothesis is true — they measure the probability of the observed data given the null hypothesis.
P-Value Formula from Z-Score
This calculator computes p-values using the standard normal distribution (Z-test). The z-score measures how many standard errors the sample mean is from the population mean:
Z-Score from Sample Statistics
Z = (x̄ − μ₀) / (σ / √n)
x̄ = sample mean | μ₀ = population mean | σ = std dev | n = sample size
Two-Tailed P-Value
p = 2 × (1 − Φ(|Z|))
Left-Tailed P-Value
p = Φ(Z)
Right-Tailed P-Value
p = 1 − Φ(Z)
Where Φ(Z) is the cumulative distribution function (CDF) of the standard normal distribution. The calculator uses the Abramowitz & Stegun approximation (formula 7.1.26) for fast, accurate CDF evaluation.
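For reference, a minimal Python implementation of Φ(Z) via the Abramowitz & Stegun 7.1.26 erf approximation (maximum absolute error about 1.5×10⁻⁷) might look like this; a sketch, not the calculator's actual source:

```python
import math

def phi(z):
    """Standard normal CDF via the A&S 7.1.26 erf approximation:
    Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    x = abs(z) / math.sqrt(2)
    t = 1.0 / (1.0 + 0.3275911 * x)
    # Degree-5 polynomial in t, evaluated in Horner form
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
           + t * (-1.453152027 + t * 1.061405429))))
    erf = 1.0 - poly * math.exp(-x * x)
    cdf = 0.5 * (1.0 + erf)
    return cdf if z >= 0 else 1.0 - cdf  # symmetry for negative z

def two_tailed_p(z):
    """p = 2 * (1 - Phi(|Z|)), as in the formula above."""
    return 2.0 * (1.0 - phi(abs(z)))
```

The polynomial only covers non-negative arguments, so negative z-scores are handled through the symmetry Φ(−z) = 1 − Φ(z).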
Common Significance Levels (α)
The significance level (alpha, α) is the threshold below which you reject the null hypothesis. Choosing α before collecting data is essential to avoid p-hacking.
| Alpha (α) | Confidence | Z Critical (Two-Tailed) | Typical Use |
|---|---|---|---|
| 0.10 | 90% | ±1.645 | Exploratory research, low-stakes decisions |
| 0.05 | 95% | ±1.960 | Industry standard, most hypothesis tests |
| 0.01 | 99% | ±2.576 | Medical trials, high-stakes research |
| 0.001 | 99.9% | ±3.291 | Physics (particle discovery), genome-wide studies |
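The critical values in the table can be reproduced with the standard normal inverse CDF; a stdlib-Python sketch (the helper name is illustrative):

```python
from statistics import NormalDist

def z_critical_two_tailed(alpha):
    """Two-tailed critical z: the cutoff leaving alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)
```

For example, `z_critical_two_tailed(0.05)` recovers the familiar ±1.960 bound from the table.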
A result is statistically significant when p < α. At α = 0.05, you accept a 5% risk of a false positive (Type I error) — incorrectly rejecting a true null hypothesis.
One-Tailed vs Two-Tailed Tests
The choice between one-tailed and two-tailed tests depends on your hypothesis before seeing the data.
Two-Tailed Test
H₀: μ = μ₀ | H₁: μ ≠ μ₀
Use when you are testing whether the mean is different from the population mean in either direction. This is the most common choice and the most conservative.
Left-Tailed Test
H₀: μ ≥ μ₀ | H₁: μ < μ₀
Use when you specifically hypothesize the mean is less than the population value. The rejection region is in the left tail.
Right-Tailed Test
H₀: μ ≤ μ₀ | H₁: μ > μ₀
Use when you specifically hypothesize the mean is greater than the population value. The rejection region is in the right tail.
A two-tailed p-value is exactly double the one-tailed p-value computed in the direction of the observed z-score. Choosing a one-tailed test after seeing results that go in the predicted direction is p-hacking and inflates false-positive rates.
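The relationship between the three tails is easy to verify numerically; a stdlib-Python sketch (the helper name is illustrative):

```python
from statistics import NormalDist

def p_values(z):
    """Left-, right-, and two-tailed p-values for the same z-score."""
    cdf = NormalDist().cdf
    left = cdf(z)                  # p = Phi(Z)
    right = 1 - cdf(z)             # p = 1 - Phi(Z)
    two = 2 * (1 - cdf(abs(z)))    # p = 2 * (1 - Phi(|Z|))
    return left, right, two
```

For a positive z-score the two-tailed p-value is twice the right-tailed one; for a negative z-score it is twice the left-tailed one.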
P-Value Calculation Examples
Example 1: Two-Tailed Z-Test (Significant)
A researcher tests whether a new drug changes blood pressure. They observe z = 2.50 with α = 0.05 (two-tailed).
Z = 2.50
p = 2 × (1 − Φ(2.50)) = 2 × 0.00621 = 0.0124
p (0.0124) < α (0.05)
Result: Statistically significant — reject H₀
Example 2: From Sample Statistics
A quality test: sample mean = 105, population mean = 100, σ = 15, n = 36 (two-tailed, α = 0.05).
SE = 15 / √36 = 2.5
Z = (105 − 100) / 2.5 = 2.0
p = 2 × (1 − Φ(2.0)) ≈ 0.0455
p (0.0455) < α (0.05)
Result: Statistically significant — the sample differs from the population mean
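The arithmetic in Example 2 can be checked with a few lines of stdlib Python:

```python
from math import sqrt
from statistics import NormalDist

# Example 2 from the text: sample mean 105, population mean 100, sigma 15, n 36
se = 15 / sqrt(36)                       # standard error = 2.5
z = (105 - 100) / se                     # z = 2.0
p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p ~ 0.0455
```

Since 0.0455 < 0.05, the test agrees with the conclusion above: reject H₀ at α = 0.05.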
Example 3: Right-Tailed (Not Significant)
Testing if a new teaching method improves scores. Z = 1.20, α = 0.05 (right-tailed).
Z = 1.20
p = 1 − Φ(1.20) ≈ 0.1151
p (0.1151) ≥ α (0.05)
Result: Not significant — fail to reject H₀
Common P-Value Mistakes
- Misinterpreting the p-value — A p-value is NOT the probability that the null hypothesis is true. It is the probability of the observed data assuming the null is true.
- P-hacking — Running multiple tests and only reporting the significant ones inflates the false-positive rate. Pre-register your hypothesis and correction method.
- Switching from two-tailed to one-tailed after seeing results halves the p-value and is a form of p-hacking.
- Confusing statistical with practical significance — A tiny p-value with a large sample can indicate a trivially small effect. Always check effect size.
- Ignoring assumptions — Z-tests assume the population standard deviation is known and data is approximately normal. Use a t-test for small samples with unknown σ.
- Treating p = 0.049 and p = 0.051 as categorically different — The 0.05 threshold is a convention, not a hard rule. Report actual p-values and confidence intervals.