Correlation Calculator

Calculate Pearson correlation coefficient, covariance, and R-squared between two variables.

Enter Your Data

Data points: 10 pairs

Summary Statistics

| Variable | Mean | Std Dev |
|---|---|---|
| X | 5.5000 | 3.0277 |
| Y | 11.0700 | 6.0222 |

Pearson Correlation (r)

0.9997
Very strong positive correlation

Correlation Analysis

| Metric | Value |
|---|---|
| Correlation (r) | 0.999709 |
| R-squared (r²) | 0.999418 (99.9% of Y variance explained by X) |
| Covariance | 18.227778 |

Correlation Strength Scale

| Range of \|r\| | Strength |
|---|---|
| 0.9 - 1.0 | Very strong |
| 0.7 - 0.9 | Strong |
| 0.5 - 0.7 | Moderate |
| 0.3 - 0.5 | Weak |
| 0.0 - 0.3 | Very weak |

Linear Regression

y = 1.9885x + 0.1333
| Parameter | Value |
|---|---|
| Slope (m) | 1.988485 |
| Intercept (b) | 0.133333 |
| Standard error | 0.154135 |

Significance Test

| Statistic | Value |
|---|---|
| t-statistic | 117.1786 |
| Degrees of freedom | 8 |

Data Pairs

| # | X | Y |
|---|---|---|
| 1 | 1 | 2.3 |
| 2 | 2 | 4.1 |
| 3 | 3 | 5.8 |
| 4 | 4 | 8.2 |
| 5 | 5 | 10.1 |
| 6 | 6 | 11.9 |
| 7 | 7 | 14.2 |
| 8 | 8 | 16 |
| 9 | 9 | 18.1 |
| 10 | 10 | 20 |
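The calculator's outputs above can be reproduced with a short script (a sketch using only the Python standard library; variable names like x_bar are illustrative):

```python
import math
from statistics import mean, stdev

x = list(range(1, 11))
y = [2.3, 4.1, 5.8, 8.2, 10.1, 11.9, 14.2, 16, 18.1, 20]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
# Sample covariance: sum of cross-deviations over n - 1
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r = cov / (stdev(x) * stdev(y))           # Pearson r
slope = cov / stdev(x) ** 2               # least-squares slope m
intercept = y_bar - slope * x_bar         # intercept b
t = r * math.sqrt((n - 2) / (1 - r**2))   # significance test, df = n - 2

print(round(r, 4), round(cov, 6))             # 0.9997 18.227778
print(round(slope, 4), round(intercept, 4))   # 1.9885 0.1333
print(round(t, 1))                            # 117.2
```

Each printed value matches the corresponding table entry above, up to rounding.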

What Is Correlation?

Correlation measures the strength and direction of the linear relationship between two variables. It's one of the most important concepts in statistics, used everywhere from scientific research to finance to machine learning. The most common measure is the Pearson correlation coefficient (r), which ranges from -1 to +1.

| Correlation (r) | Strength | Interpretation | Example |
|---|---|---|---|
| r = +1 | Perfect positive | As X increases, Y increases proportionally | Celsius and Fahrenheit |
| 0.7 ≤ r < 1 | Strong positive | Clear upward trend | Height and weight |
| 0.3 ≤ r < 0.7 | Moderate positive | Noticeable upward trend | Study time and grades |
| 0 < r < 0.3 | Weak positive | Slight upward tendency | Shoe size and IQ |
| r = 0 | No correlation | No linear relationship | Coin flips and die rolls |
| -0.3 < r < 0 | Weak negative | Slight downward tendency | Various |
| -0.7 < r ≤ -0.3 | Moderate negative | Noticeable downward trend | Absences and grades |
| -1 < r ≤ -0.7 | Strong negative | Clear downward trend | Price and demand |
| r = -1 | Perfect negative | As X increases, Y decreases proportionally | Distance and fuel remaining |

Pearson Correlation Coefficient

r = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / √[Σ(xᵢ - x̄)² × Σ(yᵢ - ȳ)²]

Where:

  • r = Pearson correlation coefficient (-1 to +1)
  • xᵢ, yᵢ = Individual data points
  • x̄, ȳ = Means of X and Y
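The formula translates directly into code. A minimal sketch (pure Python; the Celsius/Fahrenheit pair from the table above gives a perfect r = 1):

```python
import math

def pearson_r(xs, ys):
    """Pearson r: sum of cross-deviations over the product of spreads."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs) *
                    sum((y - y_bar) ** 2 for y in ys))
    return num / den

# Perfect positive correlation: Fahrenheit is a linear function of Celsius
celsius = [0, 10, 20, 30, 40]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]
print(pearson_r(celsius, fahrenheit))  # 1.0
```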

Correlation Does Not Imply Causation

The most important rule in statistics: correlation does not imply causation. Just because two variables are correlated doesn't mean one causes the other. There could be a third variable (confounder) causing both, or the correlation could be spurious (coincidence).

| Correlation Observed | Possible Explanation | Why Not Causation |
|---|---|---|
| Ice cream sales ↔ Drownings | Both increase in summer (temperature confound) | Ice cream doesn't cause drowning |
| Shoe size ↔ Reading ability | Both increase with age (age confound) | Bigger feet don't cause better reading |
| Pirates ↔ Global temperature | Coincidence (spurious) | Fewer pirates didn't cause warming |
| Smoking ↔ Lung cancer | Actual causation (established experimentally) | This one IS causal (proven) |

To establish causation: Use controlled experiments, randomized trials, or advanced causal inference methods—not just observational correlation.

R-Squared: Coefficient of Determination

R-squared (r²) is the square of the correlation coefficient. It represents the proportion of variance in one variable that's explained by the other. R² is easier to interpret as a percentage and is crucial in regression analysis.

| Correlation (r) | R-Squared (r²) | Interpretation |
|---|---|---|
| r = 0.9 | r² = 0.81 = 81% | 81% of Y's variance explained by X |
| r = 0.7 | r² = 0.49 = 49% | 49% of Y's variance explained by X |
| r = 0.5 | r² = 0.25 = 25% | 25% of Y's variance explained by X |
| r = 0.3 | r² = 0.09 = 9% | Only 9% of Y's variance explained |
| r = 0.1 | r² = 0.01 = 1% | Virtually no explanatory power |

Key insight: A "moderate" correlation of r = 0.5 only explains 25% of the variance. Even a "strong" r = 0.7 leaves 51% unexplained. This shows why multiple factors usually matter.
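The conversion is just squaring; the table's rows can be regenerated in a few lines of Python:

```python
# Build the r -> r^2 rows from the table above
rows = []
for r in (0.9, 0.7, 0.5, 0.3, 0.1):
    rows.append(f"r = {r}: r^2 = {r*r:.2f} ({r*r:.0%} of variance explained)")
print("\n".join(rows))
```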

Types of Correlation Coefficients

Different situations call for different correlation measures. Pearson assumes linear relationships and continuous data; alternatives exist for other scenarios.

| Type | Use When | Range | Assumptions |
|---|---|---|---|
| Pearson (r) | Linear relationship, continuous data | -1 to +1 | Normality, linearity |
| Spearman (ρ) | Monotonic relationship, ordinal data | -1 to +1 | None (rank-based) |
| Kendall (τ) | Ordinal data, small samples | -1 to +1 | None (rank-based) |
| Point-biserial | One binary, one continuous variable | -1 to +1 | Normality of continuous variable |
| Phi (φ) | Two binary variables | -1 to +1 | 2×2 table |

When to use Spearman: For non-linear monotonic relationships, ordinal data (rankings), or when outliers are present. It's based on ranks, making it robust to extreme values.
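The Pearson/Spearman contrast shows up clearly on a monotonic but non-linear dataset. A self-contained sketch (Spearman computed as Pearson on ranks; this simple version assumes no tied values, which production implementations such as scipy.stats.spearmanr handle properly):

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    num = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xb) ** 2 for x in xs) *
                    sum((y - yb) ** 2 for y in ys))
    return num / den

def spearman_rho(xs, ys):
    """Spearman = Pearson applied to ranks (no-ties case for simplicity)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0] * len(vals)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    return pearson_r(ranks(xs), ranks(ys))

x = list(range(1, 11))
y = [v ** 3 for v in x]            # monotonic but strongly non-linear
print(round(pearson_r(x, y), 3))   # 0.928: Pearson understates the link
print(spearman_rho(x, y))          # 1.0: Spearman sees perfect monotonicity
```

Because y always increases with x, Spearman reports a perfect 1.0 while Pearson is pulled below 1 by the curvature.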

Testing Correlation Significance

A correlation might appear strong but be due to chance, especially with small samples. Statistical testing determines if a correlation is significantly different from zero.

| Sample Size (n) | Critical r (α = 0.05) | Interpretation |
|---|---|---|
| n = 10 | r = ±0.632 | Need strong correlation for significance |
| n = 20 | r = ±0.444 | Moderate correlation can be significant |
| n = 30 | r = ±0.361 | Smaller r can be significant |
| n = 50 | r = ±0.279 | Weak-moderate correlation significant |
| n = 100 | r = ±0.197 | Even weak correlations significant |
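The critical values above come from inverting the t-test formula in the next section: solving t = r√[(n-2)/(1-r²)] for r gives r = t/√(t² + n - 2). A sketch (the two-tailed critical t values are hardcoded here from a standard t table):

```python
import math

# Two-tailed critical t at alpha = 0.05, keyed by degrees of freedom (n - 2)
CRITICAL_T = {8: 2.306, 18: 2.101, 28: 2.048, 48: 2.011, 98: 1.984}

def critical_r(n):
    """Smallest |r| significant at alpha = 0.05 for sample size n."""
    t = CRITICAL_T[n - 2]
    return t / math.sqrt(t ** 2 + n - 2)

for n in (10, 20, 30, 50, 100):
    print(n, round(critical_r(n), 3))
# 10 0.632 / 20 0.444 / 30 0.361 / 50 0.279 / 100 0.197
```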

T-Test for Correlation Significance

t = r × √[(n-2) / (1-r²)] with df = n-2

Where:

  • t = Test statistic
  • r = Correlation coefficient
  • n = Sample size
  • df = Degrees of freedom
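As a one-function sketch, using the exercise/happiness example from the worked examples below is not required; any r and n work:

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# r = 0.40 with n = 25 observations
print(round(correlation_t(0.40, 25), 2))  # 2.09 (df = 23)
```

Compare the result against the critical t for df = n - 2 at your chosen α; larger |t| means the correlation is less likely to be chance.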

Limitations of Correlation

Correlation is powerful but has important limitations. Understanding these prevents misinterpretation of data.

| Limitation | Description | Solution |
|---|---|---|
| Only detects linear relationships | Can miss curved relationships | Plot data; use Spearman for monotonic |
| Sensitive to outliers | One extreme point can dominate r | Use Spearman; remove outliers |
| Doesn't imply causation | Third variables may explain relationship | Use controlled experiments |
| Range restriction | Limited range underestimates true r | Use full range of data |
| Sample size matters | Small samples can show spurious r | Test significance; get larger n |

Always visualize: A scatter plot reveals patterns that r alone cannot. Anscombe's quartet shows four datasets with identical r ≈ 0.82 but completely different relationships.

Applications of Correlation

Correlation analysis is used across virtually every field that deals with data. Understanding applications helps contextualize what "strong" or "weak" correlations mean.

| Field | Application | Typical r Values |
|---|---|---|
| Psychology | Personality traits, test reliability | r = 0.3-0.5 often meaningful |
| Medicine | Risk factors, treatment outcomes | r = 0.2-0.4 can be clinically important |
| Finance | Stock correlations, portfolio diversification | r < 0.3 for diversification |
| Education | Predictors of academic success | r = 0.3-0.6 for standardized tests |
| Physics | Experimental relationships | r > 0.99 expected for physical laws |
| Social science | Survey variables | r = 0.2-0.5 common |

Worked Examples

Calculating Pearson Correlation

Problem:

Calculate the correlation between study hours (X) and exam scores (Y) for 5 students: X = [2,3,5,7,8], Y = [65,70,75,85,90]

Solution Steps:

  1. Calculate means: x̄ = 5, ȳ = 77
  2. Calculate deviations: x - x̄ = [-3, -2, 0, 2, 3], y - ȳ = [-12, -7, -2, 8, 13]
  3. Calculate products: Σ(x - x̄)(y - ȳ) = 36 + 14 + 0 + 16 + 39 = 105
  4. Calculate sums of squares: Σ(x - x̄)² = 26, Σ(y - ȳ)² = 144 + 49 + 4 + 64 + 169 = 430
  5. Apply formula: r = 105 / √(26 × 430) = 105 / 105.74 = 0.993

Result:

r = 0.993, a very strong positive correlation. More study hours strongly predict higher scores (though remember: correlation ≠ causation).
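The steps can be checked in a few lines of Python (standard library only):

```python
import math

x = [2, 3, 5, 7, 8]
y = [65, 70, 75, 85, 90]
xb = sum(x) / len(x)                                   # mean of X: 5
yb = sum(y) / len(y)                                   # mean of Y: 77
sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))   # 105
sxx = sum((a - xb) ** 2 for a in x)                    # 26
syy = sum((b - yb) ** 2 for b in y)                    # 430
r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # 0.993
```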

Interpreting R-Squared

Problem:

Height and weight have a correlation of r = 0.70 in a population. What percentage of weight variation is explained by height?

Solution Steps:

  1. Calculate r-squared: r² = 0.70² = 0.49
  2. Convert to percentage: 49%
  3. Interpret: height explains 49% of weight variation
  4. Remaining: 51% is due to other factors

Result:

R² = 49%. Height explains about half the variation in weight. Other factors (diet, exercise, genetics, age) account for the other half.

Testing Correlation Significance

Problem:

A study of 25 people finds r = 0.40 between exercise and happiness. Is this significant at α = 0.05?

Solution Steps:

  1. Calculate the t-statistic: t = 0.40 × √[(25 - 2)/(1 - 0.16)] = 0.40 × √(23/0.84) = 0.40 × 5.23 = 2.09
  2. Degrees of freedom: df = 25 - 2 = 23
  3. Critical t (α = 0.05, two-tailed, df = 23): t* = 2.069
  4. Compare: 2.09 > 2.069

Result:

t = 2.09 > critical value 2.069, so the correlation is statistically significant at the 0.05 level. The relationship is unlikely to be due to chance alone.

Tips & Best Practices

  • Always create a scatter plot before calculating correlation—it reveals patterns r cannot show.
  • Remember: correlation ≠ causation. Strong correlation may be due to confounding variables.
  • Use Spearman's correlation for ordinal data, non-linear monotonic relationships, or when outliers exist.
  • r² (R-squared) tells you the proportion of variance explained—often more interpretable than r.
  • Larger samples give more reliable correlation estimates; small samples can show spurious correlations.
  • Test statistical significance—a correlation of r = 0.3 may or may not be 'real' depending on sample size.
  • Context matters: r = 0.3 is 'weak' in physics but may be important in psychology or medicine.

Frequently Asked Questions

Why doesn't correlation imply causation?

Correlation only shows that two variables move together—it can't distinguish between: (1) X causes Y, (2) Y causes X, (3) a third variable causes both (confounding), or (4) coincidence. To prove causation, you need controlled experiments where you manipulate one variable and observe the effect on another while holding other factors constant.

What counts as a strong correlation?

It depends on the field. In physics, r < 0.99 might be weak. In social sciences, r > 0.50 is often considered strong. Generally: |r| < 0.3 is weak, 0.3-0.7 is moderate, > 0.7 is strong. But context matters—a 'weak' r = 0.2 correlation between a drug and survival could be clinically important.

Can a correlation be greater than 1?

No, Pearson's r is mathematically bounded between -1 and +1. If you calculate a value outside this range, there's an error. Values of exactly ±1 indicate perfect linear relationships (all points on a line). Values near 0 indicate no linear relationship.

When should I use Spearman instead of Pearson?

Use Spearman when: (1) the relationship is monotonic but not linear; (2) data is ordinal (rankings); (3) there are significant outliers; (4) data isn't normally distributed. Spearman uses ranks rather than raw values, making it robust to non-linearity and outliers.

What's the difference between correlation and regression?

Correlation measures the strength of the relationship between two variables (symmetric—it doesn't matter which is X or Y). Regression predicts one variable from another (asymmetric—you predict Y from X). Correlation gives r; regression gives a prediction equation. Both use r² to measure explained variance.

Does r = 0 mean there is no relationship?

Not necessarily! r = 0 means no LINEAR relationship. Variables can have strong non-linear relationships (curves, U-shapes) with r = 0. Always plot your data. Anscombe's quartet famously shows datasets with identical correlations but completely different patterns.
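The r = 0 point is easy to demonstrate: a perfect U-shaped relationship has zero linear correlation. A minimal sketch:

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    num = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xb) ** 2 for x in xs) *
                    sum((y - yb) ** 2 for y in ys))
    return num / den

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]   # perfect U-shape: y is fully determined by x
print(pearson_r(x, y))    # 0.0 — no linear relationship despite perfect dependence
```

The positive and negative cross-deviations cancel exactly, so r = 0 even though knowing x tells you y exactly.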

Last updated: 2026-01-22