P-Value Calculator
A p-value measures the probability of obtaining results as extreme as the observed data, assuming the null hypothesis is true. Enter a z-score or sample statistics (mean, population mean, standard deviation, sample size) to instantly calculate the p-value and determine statistical significance.
p-value = 2 × (1 − Φ(|Z|))
Frequently Asked Questions
What is a p-value?
A p-value is the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true. A small p-value (typically < 0.05) suggests the observed data is unlikely under the null hypothesis, providing evidence to reject it.
How do you calculate a p-value from a z-score?
For a two-tailed test: p = 2 × (1 − Φ(|Z|)), where Φ is the standard normal CDF. For a left-tailed test: p = Φ(Z). For a right-tailed test: p = 1 − Φ(Z). The z-score itself is calculated as Z = (x̄ − μ₀) / (σ / √n) when using sample statistics.
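These three formulas can be sketched directly in Python using the standard library's `statistics.NormalDist` for Φ (the function name `p_value` and its `tail` parameter are illustrative, not part of the calculator):

```python
from statistics import NormalDist

def p_value(z: float, tail: str = "two") -> float:
    """P-value for a z-score under the standard normal distribution."""
    phi = NormalDist().cdf  # Φ, the standard normal CDF
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    if tail == "left":
        return phi(z)
    return 1 - phi(z)  # right-tailed

print(round(p_value(2.50), 4))        # 0.0124
print(round(p_value(1.20, "right"), 4))  # 0.1151
```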
What does it mean when p < 0.05?
When p < 0.05, the result is statistically significant at the 5% significance level (95% confidence). This means that if the null hypothesis were true, there would be less than a 5% probability of observing data this extreme by random chance alone. You can reject the null hypothesis.
Should I use a one-tailed or two-tailed test?
Use a two-tailed test when you hypothesize the mean could differ in either direction (H₁: μ ≠ μ₀). Use a one-tailed test only when you have a strong prior reason to expect a specific direction — left-tailed (H₁: μ < μ₀) or right-tailed (H₁: μ > μ₀). The test type must be decided before data collection.
What significance level (alpha) should I choose?
α = 0.05 is the conventional standard for most research. Use α = 0.10 for exploratory studies where some false positives are acceptable. Use α = 0.01 or lower for high-stakes decisions (medical trials, policy decisions) where false positives are very costly.
What is the difference between a p-value and a confidence interval?
A p-value gives you a yes/no decision about statistical significance at a given alpha. A confidence interval gives you a range of plausible values for the true effect and communicates uncertainty directly. Both are complementary — a 95% confidence interval that excludes zero corresponds to a two-tailed p-value < 0.05.
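A quick sketch of that correspondence, using a made-up effect estimate and standard error (the numbers are illustrative only):

```python
from statistics import NormalDist

nd = NormalDist()
effect, se = 5.0, 2.5                 # hypothetical estimate and standard error
z_crit = nd.inv_cdf(0.975)            # ≈ 1.96 for a 95% CI
ci = (effect - z_crit * se, effect + z_crit * se)
p = 2 * (1 - nd.cdf(abs(effect / se)))
print(ci)  # interval excludes zero...
print(p)   # ...so the two-tailed p-value is below 0.05
```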
Can a p-value tell me the probability that the null hypothesis is true?
No — this is a common misconception. The p-value is the probability of the observed data (or more extreme data) given that the null hypothesis is true. It is NOT the probability that the null hypothesis itself is true. To estimate that, you would need Bayesian methods.
What is the difference between a z-test and a t-test?
A z-test is appropriate when the population standard deviation (σ) is known and the sample size is large (n ≥ 30). A t-test is used when σ is unknown and must be estimated from the sample — common in small samples. For large samples, t-test and z-test results converge.
What is a P-Value?
A p-value (probability value) is the probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis is true. In other words, the p-value answers: "If there were no real effect, how likely is it to see data this extreme by random chance?"
A small p-value (typically < 0.05) suggests the observed data is unlikely under the null hypothesis, providing evidence to reject it. A large p-value means the data is consistent with the null hypothesis, so you fail to reject it.
P-values are fundamental in hypothesis testing across statistics, medicine, psychology, economics, and data science. They do not measure the probability that the null hypothesis is true — they measure the probability of the observed data given the null hypothesis.
P-Value Formula from Z-Score
This calculator computes p-values using the standard normal distribution (Z-test). The z-score measures how many standard errors the sample mean is from the population mean:
Z-Score from Sample Statistics
Z = (x̄ − μ₀) / (σ / √n)
x̄ = sample mean | μ₀ = population mean | σ = std dev | n = sample size
Two-Tailed P-Value
p = 2 × (1 − Φ(|Z|))
Left-Tailed P-Value
p = Φ(Z)
Right-Tailed P-Value
p = 1 − Φ(Z)
Where Φ(Z) is the cumulative distribution function (CDF) of the standard normal distribution. The calculator uses the Abramowitz & Stegun approximation (formula 7.1.26) for fast, accurate CDF evaluation.
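A minimal sketch of that approximation: formula 7.1.26 approximates erf(x) with a 5-term polynomial (maximum absolute error about 1.5 × 10⁻⁷), and Φ(z) = ½(1 + erf(z/√2)):

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF via the Abramowitz & Stegun 7.1.26
    erf approximation (max absolute error ~1.5e-7)."""
    x = abs(z) / math.sqrt(2)
    t = 1 / (1 + 0.3275911 * x)
    # Horner evaluation of the 5-term polynomial in t
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
           + t * (-1.453152027 + t * 1.061405429))))
    erf = 1 - poly * math.exp(-x * x)
    cdf = 0.5 * (1 + erf)
    return cdf if z >= 0 else 1 - cdf  # use symmetry for negative z

print(round(phi(1.96), 4))  # 0.975
```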
Common Significance Levels (α)
The significance level (alpha, α) is the threshold below which you reject the null hypothesis. Choosing α before collecting data is essential to avoid p-hacking.
| Alpha (α) | Confidence | Z Critical (Two-Tailed) | Typical Use |
|---|---|---|---|
| 0.10 | 90% | ±1.645 | Exploratory research, low-stakes decisions |
| 0.05 | 95% | ±1.960 | Industry standard, most hypothesis tests |
| 0.01 | 99% | ±2.576 | Medical trials, high-stakes research |
| 0.001 | 99.9% | ±3.291 | Physics (particle discovery), genome-wide studies |
A result is statistically significant when p < α. At α = 0.05, you accept a 5% risk of a false positive (Type I error) — incorrectly rejecting a true null hypothesis.
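The two-tailed critical values in the table above can be reproduced with the standard normal inverse CDF (using Python's stdlib `statistics.NormalDist`):

```python
from statistics import NormalDist

inv = NormalDist().inv_cdf
for alpha in (0.10, 0.05, 0.01, 0.001):
    z_crit = inv(1 - alpha / 2)       # two-tailed: put α/2 in each tail
    print(f"alpha={alpha}: ±{z_crit:.3f}")
```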
One-Tailed vs Two-Tailed Tests
The choice between one-tailed and two-tailed tests depends on your hypothesis before seeing the data.
Two-Tailed Test
H₀: μ = μ₀ | H₁: μ ≠ μ₀
Use when you are testing whether the mean is different from the population mean in either direction. This is the most common choice and the most conservative.
Left-Tailed Test
H₀: μ ≥ μ₀ | H₁: μ < μ₀
Use when you specifically hypothesize the mean is less than the population value. The rejection region is in the left tail.
Right-Tailed Test
H₀: μ ≤ μ₀ | H₁: μ > μ₀
Use when you specifically hypothesize the mean is greater than the population value. The rejection region is in the right tail.
A two-tailed p-value is exactly double the one-tailed p-value (for the same z-score). Choosing a one-tailed test after seeing results that go in the predicted direction is p-hacking and inflates false-positive rates.
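The doubling relationship is easy to verify numerically (the z-score here is arbitrary):

```python
from statistics import NormalDist

phi = NormalDist().cdf
z = 1.75                              # arbitrary illustrative z-score
p_right = 1 - phi(z)                  # one-tailed (right)
p_two = 2 * (1 - phi(abs(z)))         # two-tailed
print(p_two == 2 * p_right)           # True: exactly double
```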
P-Value Calculation Examples
Example 1: Two-Tailed Z-Test (Significant)
A researcher tests whether a new drug changes blood pressure. They observe z = 2.50 with α = 0.05 (two-tailed).
Z = 2.50
p = 2 × (1 − Φ(2.50)) = 2 × 0.00621 = 0.0124
p (0.0124) < α (0.05)
Result: Statistically significant — reject H₀
Example 2: From Sample Statistics
A quality test: sample mean = 105, population mean = 100, σ = 15, n = 36 (two-tailed, α = 0.05).
SE = 15 / √36 = 2.5
Z = (105 − 100) / 2.5 = 2.0
p = 2 × (1 − Φ(2.0)) ≈ 0.0455
p (0.0455) < α (0.05)
Result: Statistically significant — the sample differs from the population mean
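Example 2 can be reproduced step by step in Python (stdlib only):

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n = 105, 100, 15, 36   # values from Example 2
se = sigma / sqrt(n)                     # standard error = 2.5
z = (xbar - mu0) / se                    # z-score = 2.0
p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
print(se, z, round(p, 4))                # 2.5 2.0 0.0455
```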
Example 3: Right-Tailed (Not Significant)
Testing if a new teaching method improves scores. Z = 1.20, α = 0.05 (right-tailed).
Z = 1.20
p = 1 − Φ(1.20) ≈ 0.1151
p (0.1151) ≥ α (0.05)
Result: Not significant — fail to reject H₀
Common P-Value Mistakes
- Misinterpreting the p-value — A p-value is NOT the probability that the null hypothesis is true. It is the probability of the observed data assuming the null is true.
- P-hacking — Running multiple tests and only reporting the significant ones inflates the false-positive rate. Pre-register your hypothesis and correction method.
- Switching from two-tailed to one-tailed after seeing results halves the p-value and is a form of p-hacking.
- Confusing statistical with practical significance — A tiny p-value with a large sample can indicate a trivially small effect. Always check effect size.
- Ignoring assumptions — Z-tests assume the population standard deviation is known and the sampling distribution of the mean is approximately normal (which holds for large samples or normally distributed data). Use a t-test for small samples with unknown σ.
- Treating p = 0.049 and p = 0.051 as categorically different — The 0.05 threshold is a convention, not a hard rule. Report actual p-values and confidence intervals.
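As a guard against the multiple-testing trap above, one simple correction is Bonferroni: compare each p-value to α/m, where m is the number of tests. A sketch with made-up p-values:

```python
# Bonferroni correction: illustrative sketch (p-values are made up)
p_values = [0.012, 0.034, 0.049, 0.21]
alpha = 0.05
m = len(p_values)
significant = [p for p in p_values if p < alpha / m]  # compare to α/m, not α
print(alpha / m, significant)  # 0.0125 [0.012]
```

Note that all four raw p-values except 0.21 would pass the uncorrected α = 0.05 threshold; after correction, only one survives.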