T-Test Calculator

Calculate one-sample and two-sample t-tests for comparing means

About T-Test

One-Sample T-Test: Tests whether the sample mean differs significantly from a hypothesized value (default: 0).

Two-Sample T-Test: Tests whether two sample means differ significantly from each other.

Formula: t = (x̄₁ - x̄₂) / SE

Compare the t-statistic with critical values from the t-distribution table to determine if results are statistically significant.

What Is a T-Test?

The t-test (Student's t-test) is a statistical test used to compare means and determine if differences between groups are statistically significant. It's one of the most widely used tests in statistics because it works when the population standard deviation is unknown—which is almost always the case in real research.

| T-Test Type | Purpose | Example |
|---|---|---|
| One-sample t-test | Compare sample mean to known value | Is average height different from 170 cm? |
| Independent two-sample | Compare means of two groups | Do men and women differ in test scores? |
| Paired t-test | Compare two related measurements | Did treatment improve scores (before vs after)? |

T-Statistic (One-Sample)

t = (x̄ - μ₀) / (s / √n)

Where:

  • t = T-statistic
  • x̄ = Sample mean
  • μ₀ = Hypothesized population mean
  • s = Sample standard deviation
  • n = Sample size
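
This formula can be sketched directly in Python (the function name is illustrative; the numbers are the one-sample worked example from later on this page):

```python
import math

def one_sample_t(mean, sd, n, mu0=0.0):
    """One-sample t-statistic: t = (x̄ - μ₀) / (s / √n)."""
    return (mean - mu0) / (sd / math.sqrt(n))

# 25 students with mean score 78 and s = 10, tested against μ₀ = 75:
t = one_sample_t(78, 10, 25, mu0=75)  # → 1.5
```

Compare |t| with the critical value for df = n - 1 to decide significance.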

One-Sample T-Test

The one-sample t-test compares a sample mean to a known or hypothesized value. It answers: "Is the population mean different from this specific value?"

| Component | Description | Formula |
|---|---|---|
| Null hypothesis (H₀) | Population mean equals target | μ = μ₀ |
| Alternative (two-tailed) | Mean differs from target | μ ≠ μ₀ |
| Alternative (left-tailed) | Mean is less than target | μ < μ₀ |
| Alternative (right-tailed) | Mean is greater than target | μ > μ₀ |
| Degrees of freedom | Sample size minus 1 | df = n - 1 |

Use cases: Testing if average blood pressure differs from 120, if exam scores differ from passing grade, if machine output differs from specification.

Independent Two-Sample T-Test

The independent samples t-test compares means from two different, unrelated groups. It determines if the difference between group means is statistically significant.

| Variant | Assumption | When to Use |
|---|---|---|
| Student's t-test | Equal variances (σ₁² = σ₂²) | When variances are similar |
| Welch's t-test | Unequal variances allowed | Default choice; more robust |
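
The page recommends Welch's test but does not show its formula, so the sketch below uses the standard definition: the Welch statistic with Welch–Satterthwaite approximate degrees of freedom (function name illustrative; numbers from the two-sample worked example below):

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t-statistic and approximate df; no equal-variance assumption."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # per-group variance of the mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch–Satterthwaite approximation for degrees of freedom:
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t(85, 12, 20, 78, 10, 20)  # t ≈ 2.00, df ≈ 36.8
```

With equal sample sizes Welch's t equals the pooled t, but its smaller df makes the test slightly more conservative.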

Two-Sample T-Statistic (Pooled)

t = (x̄₁ - x̄₂) / (sp × √(1/n₁ + 1/n₂))

Where:

  • x̄₁, x̄₂ = Sample means of groups 1 and 2
  • sp = Pooled standard deviation
  • n₁, n₂ = Sample sizes of groups 1 and 2
  • sp² = [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁+n₂-2)
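
A minimal sketch of the pooled statistic, using the group summaries from the two-sample worked example below (function name illustrative):

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's pooled two-sample t-statistic; assumes equal variances."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se, n1 + n2 - 2  # (t, df)

t, df = pooled_t(85, 12, 20, 78, 10, 20)  # t ≈ 2.0, df = 38
```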

Paired T-Test (Dependent Samples)

The paired t-test compares two related measurements, typically before and after a treatment or intervention on the same subjects. It tests whether the mean difference is significantly different from zero.

| Characteristic | Paired T-Test | Independent T-Test |
|---|---|---|
| Data structure | Same subjects, two conditions | Different subjects in each group |
| What's analyzed | Differences within subjects | Difference between group means |
| Statistical power | Usually higher | Usually lower |
| Example | Before vs after treatment | Treatment vs control group |

Paired T-Statistic

t = d̄ / (sd / √n)

Where:

  • d̄ = Mean of differences (after - before)
  • sd = Standard deviation of differences
  • n = Number of pairs
  • df = Degrees of freedom = n - 1
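
The paired statistic reduces to a one-sample test on the differences, as this sketch shows (function name and example data are illustrative):

```python
import math

def paired_t(differences):
    """Paired t-statistic: t = d̄ / (s_d / √n) on within-pair differences."""
    n = len(differences)
    d_bar = sum(differences) / n
    var = sum((d - d_bar) ** 2 for d in differences) / (n - 1)
    return d_bar / math.sqrt(var / n)

# Hypothetical before-after differences for five subjects:
t = paired_t([-2, -1, -3, 0, -2])
```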

Assumptions of T-Tests

T-tests require certain conditions to be valid. The test is fairly robust to some violations, especially with larger samples.

| Assumption | Requirement | What If Violated? |
|---|---|---|
| Random sampling | Data from random sample | Results may not generalize |
| Independence | Observations are independent | Use paired t-test if dependent |
| Normality | Data approximately normal | Robust if n > 30 (CLT applies) |
| Equal variances | σ₁² ≈ σ₂² for Student's t | Use Welch's t-test instead |
| Continuous data | Interval or ratio scale | Use chi-square for categorical |

Testing normality: Use Shapiro-Wilk test, Q-Q plots, or histograms. Minor deviations are acceptable, especially with n > 30.

Testing equal variances: Use Levene's test. When in doubt, use Welch's t-test (it doesn't assume equal variances).

T-Distribution and Critical Values

The t-distribution is similar to the normal distribution but has heavier tails, accounting for uncertainty from estimating the standard deviation. As sample size increases, t approaches normal.

| df | α = 0.10 (two-tail) | α = 0.05 (two-tail) | α = 0.01 (two-tail) |
|---|---|---|---|
| 5 | ±2.015 | ±2.571 | ±4.032 |
| 10 | ±1.812 | ±2.228 | ±3.169 |
| 20 | ±1.725 | ±2.086 | ±2.845 |
| 30 | ±1.697 | ±2.042 | ±2.750 |
| ∞ (normal) | ±1.645 | ±1.960 | ±2.576 |

Decision rule: Reject H₀ if |t| > t_critical, or equivalently if p-value < α.
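
The decision rule can be sketched against the table values directly (the dictionary copies the α = 0.05 column from the table above; df values not listed would need interpolation or a statistics library):

```python
# Two-tailed critical values at α = 0.05, from the t-table above.
T_CRIT_05_TWO_TAIL = {5: 2.571, 10: 2.228, 20: 2.086, 30: 2.042}

def reject_h0(t, df):
    """Reject H₀ when |t| exceeds the two-tailed critical value for df."""
    return abs(t) > T_CRIT_05_TWO_TAIL[df]
```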

Effect Size: Cohen's d

Statistical significance doesn't indicate practical importance. Effect size measures the magnitude of difference, helping interpret whether a significant result is meaningful.

| Cohen's d | Interpretation | Percentile Standing |
|---|---|---|
| d = 0.2 | Small effect | 58th percentile (vs 50th) |
| d = 0.5 | Medium effect | 69th percentile (vs 50th) |
| d = 0.8 | Large effect | 79th percentile (vs 50th) |
| d = 1.0 | Very large effect | 84th percentile (vs 50th) |

Cohen's d (Two-Sample)

d = (x̄₁ - x̄₂) / sp

Where:

  • d = Effect size (Cohen's d)
  • x̄₁ - x̄₂ = Difference between means
  • sp = Pooled standard deviation
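
A short sketch of this calculation, using the summaries from the two-sample worked example below (function name illustrative):

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

d = cohens_d(85, 12, 20, 78, 10, 20)  # ≈ 0.63, a medium-to-large effect
```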

Worked Examples

One-Sample T-Test

Problem:

A sample of 25 students scored an average of 78 on a test (s = 10). Is this significantly different from the national average of 75?

Solution Steps:

  1. State hypotheses: H₀: μ = 75; H₁: μ ≠ 75 (two-tailed)
  2. Calculate the t-statistic: t = (78 - 75) / (10 / √25) = 3 / 2 = 1.50
  3. Find degrees of freedom: df = 25 - 1 = 24
  4. Find the critical value: t(24, 0.05 two-tailed) = ±2.064
  5. Compare: |1.50| < 2.064, so fail to reject H₀

Result:

t(24) = 1.50, p > 0.05. The sample mean is not significantly different from 75 at α = 0.05. The 3-point difference could be due to sampling variability.

Independent Two-Sample T-Test

Problem:

Treatment group (n=20): mean=85, SD=12. Control group (n=20): mean=78, SD=10. Is the treatment effective?

Solution Steps:

  1. Calculate pooled variance: sp² = [(19×144) + (19×100)] / 38 = 4636/38 = 122; sp = 11.05
  2. Calculate t: t = (85 - 78) / (11.05 × √(1/20 + 1/20)) = 7 / (11.05 × 0.316) = 7 / 3.49 = 2.01
  3. Degrees of freedom: df = 20 + 20 - 2 = 38
  4. Critical value: t(38, 0.05 two-tailed) ≈ 2.024
  5. Compare: 2.01 < 2.024, so marginally fail to reject at α = 0.05

Result:

t(38) = 2.01, p ≈ 0.052. The result is marginally non-significant at α = 0.05. Cohen's d = 7/11.05 = 0.63 indicates a medium-to-large effect. More data might show significance.

Paired T-Test (Before-After)

Problem:

Weight loss program: 10 participants' weight before and after. Differences: -3, -5, -2, 0, -4, -3, -6, -1, -2, -4 (negative = weight loss). Is the program effective?

Solution Steps:

  1. Calculate mean difference: d̄ = (-3-5-2+0-4-3-6-1-2-4)/10 = -30/10 = -3.0 kg
  2. Calculate SD of differences: sd = 1.83
  3. Calculate t: t = -3.0 / (1.83 / √10) = -3.0 / 0.579 = -5.18
  4. Degrees of freedom: df = 10 - 1 = 9
  5. Critical value (one-tailed, α = 0.05): t(9) = -1.833

Result:

t(9) = -5.18, p < 0.001. The mean weight loss of 3 kg is highly significant. The program is effective. Effect size d = 3.0/1.83 = 1.64 (very large).

Tips & Best Practices

  • Always report effect size (Cohen's d) alongside p-values—statistical significance doesn't equal practical significance.
  • Use Welch's t-test as default for two-sample comparisons; it's more robust than Student's t-test.
  • Check assumptions: normality (Shapiro-Wilk test, Q-Q plot) and equal variances (Levene's test).
  • For non-normal data with small samples, consider Mann-Whitney U (independent) or Wilcoxon (paired) tests.
  • Paired t-tests have more power than independent t-tests—use paired designs when possible.
  • Report results completely: t(df) = value, p = value, d = value, along with means and SDs.
  • A 95% confidence interval for the mean difference that excludes zero is equivalent to p < 0.05 (two-tailed), and the interval also conveys the plausible size of the effect.

Frequently Asked Questions

When should I use a t-test instead of a z-test?

Use a t-test when: (1) the population standard deviation (σ) is unknown (which is almost always true), or (2) sample size is small (n < 30). Use a z-test only when σ is known and n is large. In practice, t-tests are used almost exclusively because we rarely know the true population SD. With large samples (n > 30), t and z give nearly identical results.

What is the difference between one-tailed and two-tailed tests?

Two-tailed tests check if the mean differs in either direction (μ ≠ μ₀). One-tailed tests check for difference in a specific direction (μ > μ₀ or μ < μ₀). Use two-tailed unless you have strong theoretical reason to expect a specific direction BEFORE seeing the data. One-tailed tests have more power but only in the predicted direction.

What if my data are not normally distributed?

The t-test is fairly robust to non-normality, especially with larger samples (n > 30) due to the Central Limit Theorem. For small samples with severe skewness or outliers, consider: (1) data transformation (log, square root), (2) non-parametric alternatives (Mann-Whitney U or Wilcoxon signed-rank test), or (3) bootstrapping methods.

What are degrees of freedom?

Degrees of freedom (df) represent the number of independent pieces of information used to estimate a parameter. For one-sample: df = n - 1 (lose 1 df estimating the mean). For independent two-sample: df = n₁ + n₂ - 2 (lose 2 df for two means). More df means the t-distribution is closer to normal and critical values are smaller.

Should I use Student's or Welch's t-test?

Welch's t-test is generally recommended as the default because: (1) it doesn't assume equal variances, (2) it performs well even when variances ARE equal, (3) it's more robust to unequal sample sizes. Student's t-test is only appropriate when you're confident variances are equal (test with Levene's test) AND sample sizes are similar.

What is the difference between statistical and practical significance?

Statistical significance (p < 0.05) means the result is unlikely due to chance, but it doesn't indicate practical importance. With large samples, even tiny differences become 'significant.' Effect size (Cohen's d) measures the magnitude of difference. A significant result with d < 0.2 may not be practically meaningful. Always report both p-value AND effect size.

Last updated: 2026-01-22