Correlation Calculator
Calculate Pearson correlation coefficient, covariance, and R-squared between two variables.
Data Pairs
| # | X | Y |
|---|---|---|
| 1 | 1 | 2.3 |
| 2 | 2 | 4.1 |
| 3 | 3 | 5.8 |
| 4 | 4 | 8.2 |
| 5 | 5 | 10.1 |
| 6 | 6 | 11.9 |
| 7 | 7 | 14.2 |
| 8 | 8 | 16 |
| 9 | 9 | 18.1 |
| 10 | 10 | 20 |
What Is Correlation?
Correlation measures the strength and direction of the linear relationship between two variables. It's one of the most important concepts in statistics, used everywhere from scientific research to finance to machine learning. The most common measure is the Pearson correlation coefficient (r), which ranges from -1 to +1.
| Correlation (r) | Strength | Interpretation | Example |
|---|---|---|---|
| r = +1 | Perfect positive | As X increases, Y increases proportionally | Celsius and Fahrenheit |
| 0.7 ≤ r < 1 | Strong positive | Clear upward trend | Height and weight |
| 0.3 ≤ r < 0.7 | Moderate positive | Noticeable upward trend | Study time and grades |
| 0 < r < 0.3 | Weak positive | Slight upward tendency | Shoe size and IQ |
| r = 0 | No correlation | No linear relationship | Coin flips and die rolls |
| -0.3 < r < 0 | Weak negative | Slight downward tendency | Various |
| -0.7 < r ≤ -0.3 | Moderate negative | Noticeable downward trend | Absences and grades |
| -1 ≤ r < -0.7 | Strong negative | Clear downward trend | Price and demand |
| r = -1 | Perfect negative | As X increases, Y decreases proportionally | Distance and fuel remaining |
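The strength scale above can be encoded as a small lookup function. This is a sketch in Python (the function name is illustrative); thresholds follow the table, and values at exact boundaries such as r = 0.7 are assigned to the stronger band, matching "0.7 ≤ r < 1":

```python
def describe_correlation(r):
    """Map a Pearson r to the descriptive labels in the scale above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    magnitude = abs(r)
    if magnitude == 1.0:
        strength = "perfect"
    elif magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.3:
        strength = "moderate"
    elif magnitude > 0.0:
        strength = "weak"
    else:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_correlation(0.85))   # strong positive
print(describe_correlation(-0.45))  # moderate negative
```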
Pearson Correlation Coefficient

r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √[Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²]

Where:
- r = Pearson correlation coefficient (-1 to +1)
- xᵢ, yᵢ = individual data points
- x̄, ȳ = means of X and Y
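The formula translates directly into code. Below is a minimal pure-Python sketch (the function name is illustrative), applied to the ten sample data pairs listed at the top of the page:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the
    square root of the product of the two sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# The sample data pairs from the table above
x = list(range(1, 11))
y = [2.3, 4.1, 5.8, 8.2, 10.1, 11.9, 14.2, 16.0, 18.1, 20.0]
print(f"r = {pearson_r(x, y):.4f}")  # very close to +1 for this near-linear data
```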
Correlation Does Not Imply Causation
The most important rule in statistics: correlation does not imply causation. Just because two variables are correlated doesn't mean one causes the other. There could be a third variable (confounder) causing both, or the correlation could be spurious (coincidence).
| Correlation Observed | Possible Explanation | Why Not Causation |
|---|---|---|
| Ice cream sales ↔ Drownings | Both increase in summer (temperature confound) | Ice cream doesn't cause drowning |
| Shoe size ↔ Reading ability | Both increase with age (age confound) | Bigger feet don't cause better reading |
| Pirates ↔ Global temperature | Coincidence (spurious) | Fewer pirates didn't cause warming |
| Smoking ↔ Lung cancer | Actual causation (established experimentally) | This one IS causal (proven) |
To establish causation: Use controlled experiments, randomized trials, or advanced causal inference methods—not just observational correlation.
R-Squared: Coefficient of Determination
R-squared (r²) is the square of the correlation coefficient. It represents the proportion of variance in one variable that's explained by the other. R² is easier to interpret as a percentage and is crucial in regression analysis.
| Correlation (r) | R-Squared (r²) | Interpretation |
|---|---|---|
| r = 0.9 | r² = 0.81 = 81% | 81% of Y's variance explained by X |
| r = 0.7 | r² = 0.49 = 49% | 49% of Y's variance explained by X |
| r = 0.5 | r² = 0.25 = 25% | 25% of Y's variance explained by X |
| r = 0.3 | r² = 0.09 = 9% | Only 9% of Y's variance explained |
| r = 0.1 | r² = 0.01 = 1% | Virtually no explanatory power |
Key insight: A "moderate" correlation of r = 0.5 only explains 25% of the variance. Even a "strong" r = 0.7 leaves 51% unexplained. This shows why multiple factors usually matter.
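The r → r² mapping in the table is a one-liner to reproduce:

```python
for r in (0.9, 0.7, 0.5, 0.3, 0.1):
    r_squared = r ** 2
    unexplained = 1 - r_squared
    print(f"r = {r:.1f}  ->  r² = {r_squared:.2f}  "
          f"({r_squared:.0%} explained, {unexplained:.0%} unexplained)")
```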
Types of Correlation Coefficients
Different situations call for different correlation measures. Pearson assumes linear relationships and continuous data; alternatives exist for other scenarios.
| Type | Use When | Range | Assumptions |
|---|---|---|---|
| Pearson (r) | Linear relationship, continuous data | -1 to +1 | Normality, linearity |
| Spearman (ρ) | Monotonic relationship, ordinal data | -1 to +1 | None (rank-based) |
| Kendall (τ) | Ordinal data, small samples | -1 to +1 | None (rank-based) |
| Point-Biserial | One binary, one continuous variable | -1 to +1 | Normality of continuous |
| Phi (φ) | Two binary variables | -1 to +1 | 2×2 table |
When to use Spearman: For non-linear monotonic relationships, ordinal data (rankings), or when outliers are present. It's based on ranks, making it robust to extreme values.
Testing Correlation Significance
A correlation might appear strong but be due to chance, especially with small samples. Statistical testing determines if a correlation is significantly different from zero.
| Sample Size (n) | Critical r (α = 0.05) | Interpretation |
|---|---|---|
| n = 10 | r = ±0.632 | Need strong correlation for significance |
| n = 20 | r = ±0.444 | Moderate correlation can be significant |
| n = 30 | r = ±0.361 | Smaller r can be significant |
| n = 50 | r = ±0.279 | Weak-moderate correlation significant |
| n = 100 | r = ±0.197 | Even weak correlations significant |
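The critical-r column can be reproduced by inverting the t-test below: r* = t* / √(t*² + n − 2), where t* is the two-tailed critical t at df = n − 2. A sketch with critical-t values hardcoded from standard tables (the standard library has no t-distribution inverse; with SciPy you could compute t* directly):

```python
import math

# Two-tailed critical t at alpha = 0.05, keyed by df = n - 2
# (values taken from standard t tables)
T_CRIT_05 = {8: 2.306, 18: 2.101, 28: 2.048, 48: 2.011, 98: 1.984}

def critical_r(n, t_table=T_CRIT_05):
    """Smallest |r| that reaches significance at alpha = 0.05 for sample size n."""
    df = n - 2
    t = t_table[df]
    return t / math.sqrt(t ** 2 + df)

for n in (10, 20, 30, 50, 100):
    print(f"n = {n:3d}: critical r = ±{critical_r(n):.3f}")
```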
T-Test for Correlation Significance

t = r √[(n - 2) / (1 - r²)]

Where:
- t = test statistic
- r = correlation coefficient
- n = sample size
- df = degrees of freedom (n - 2)
Limitations of Correlation
Correlation is powerful but has important limitations. Understanding these prevents misinterpretation of data.
| Limitation | Description | Solution |
|---|---|---|
| Only detects linear relationships | Can miss curved relationships | Plot data, use Spearman for monotonic |
| Sensitive to outliers | One extreme point can dominate r | Use Spearman, remove outliers |
| Doesn't imply causation | Third variables may explain relationship | Use controlled experiments |
| Range restriction | Limited range underestimates true r | Use full range of data |
| Sample size matters | Small samples can show spurious r | Test significance, get larger n |
Always visualize: A scatter plot reveals patterns that r alone cannot. Anscombe's quartet shows four datasets with identical r ≈ 0.82 but completely different relationships.
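A tiny numeric demonstration of the outlier problem: nine points on a perfect line give r = 1, yet a single extreme tenth point can even flip the sign (pure-Python sketch; the helper implements the standard Pearson formula):

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # perfectly linear: r = 1
print(f"without outlier: r = {pearson_r(x, y):+.3f}")

x_out = x + [10]
y_out = y + [-50]                 # one wild point dominates the sums
print(f"with outlier:    r = {pearson_r(x_out, y_out):+.3f}")
```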
Applications of Correlation
Correlation analysis is used across virtually every field that deals with data. Understanding applications helps contextualize what "strong" or "weak" correlations mean.
| Field | Application | Typical r Values |
|---|---|---|
| Psychology | Personality traits, test reliability | r = 0.3-0.5 often meaningful |
| Medicine | Risk factors, treatment outcomes | r = 0.2-0.4 can be clinically important |
| Finance | Stock correlations, portfolio diversification | r < 0.3 for diversification |
| Education | Predictors of academic success | r = 0.3-0.6 for standardized tests |
| Physics | Experimental relationships | r > 0.99 expected for physical laws |
| Social Science | Survey variables | r = 0.2-0.5 common |
Worked Examples
Calculating Pearson Correlation
Problem:
Calculate the correlation between study hours (X) and exam scores (Y) for 5 students: X = [2,3,5,7,8], Y = [65,70,75,85,90]
Solution Steps:
1. Calculate the means: x̄ = 5, ȳ = 77
2. Calculate the deviations: x - x̄ = [-3, -2, 0, 2, 3], y - ȳ = [-12, -7, -2, 8, 13]
3. Calculate the sum of products: Σ(x - x̄)(y - ȳ) = 36 + 14 + 0 + 16 + 39 = 105
4. Calculate the sums of squares: Σ(x - x̄)² = 26, Σ(y - ȳ)² = 144 + 49 + 4 + 64 + 169 = 430
5. Apply the formula: r = 105 / √(26 × 430) = 105 / 105.74 ≈ 0.993
Result:
r ≈ 0.993, a nearly perfect positive correlation. More study hours strongly predict higher scores (though remember: correlation ≠ causation).
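The arithmetic in these steps is easy to check with a short script (pure Python, no dependencies):

```python
import math

x = [2, 3, 5, 7, 8]
y = [65, 70, 75, 85, 90]

n = len(x)
mean_x = sum(x) / n   # 5.0
mean_y = sum(y) / n   # 77.0
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)

print(f"sum of products = {sxy}")            # 105.0
print(f"sums of squares = {sxx}, {syy}")     # 26.0, 430.0
print(f"r = {r:.3f}")
```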
Interpreting R-Squared
Problem:
Height and weight have a correlation of r = 0.70 in a population. What percentage of weight variation is explained by height?
Solution Steps:
1. Calculate r-squared: r² = 0.70² = 0.49
2. Convert to a percentage: 49%
3. Interpret: height explains 49% of the variation in weight
4. The remaining 51% is due to other factors
Result:
R² = 49%. Height explains about half the variation in weight. Other factors (diet, exercise, genetics, age) account for the other half.
Testing Correlation Significance
Problem:
A study of 25 people finds r = 0.40 between exercise and happiness. Is this significant at α = 0.05?
Solution Steps:
1. Calculate the t-statistic: t = 0.40 × √[(25 - 2)/(1 - 0.40²)] = 0.40 × √(23/0.84) = 0.40 × 5.23 ≈ 2.09
2. Degrees of freedom: df = 25 - 2 = 23
3. Critical t (α = 0.05, two-tailed, df = 23): t* = 2.069
4. Compare: 2.09 > 2.069
Result:
t = 2.09 > critical value 2.069, so the correlation is statistically significant at the 0.05 level. The relationship is unlikely to be due to chance alone.
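The same test, scripted. This is a sketch: the critical value is hardcoded from a t table rather than computed, since the standard library has no t-distribution inverse.

```python
import math

r, n = 0.40, 25
df = n - 2
t_stat = r * math.sqrt(df / (1 - r ** 2))
t_crit = 2.069  # two-tailed, alpha = 0.05, df = 23 (from a t table)

print(f"t = {t_stat:.3f}, df = {df}")
print("significant" if abs(t_stat) > t_crit else "not significant")
```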
Tips & Best Practices
- ✓ Always create a scatter plot before calculating correlation—it reveals patterns r cannot show.
- ✓ Remember: correlation ≠ causation. Strong correlation may be due to confounding variables.
- ✓ Use Spearman's correlation for ordinal data, non-linear monotonic relationships, or when outliers exist.
- ✓ r² (R-squared) tells you the proportion of variance explained—often more interpretable than r.
- ✓ Larger samples give more reliable correlation estimates; small samples can show spurious correlations.
- ✓ Test statistical significance—a correlation of r = 0.3 may or may not be 'real' depending on sample size.
- ✓ Context matters: r = 0.3 is 'weak' in physics but may be important in psychology or medicine.
Last updated: 2026-01-22