Covariance and correlation
Calculating the correlation coefficient, ρ
Example Data:
Income (X)      5   10   15   20   25
Education (Y)  10    8   10   15   12

Step by step calculations
ρ = Cov(X, Y) / (σ_X σ_Y)
Cov(X, Y) = E[(X - E(X))(Y - E(Y))]
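Because the example treats its five observations as equally weighted, every expectation in the steps that follow reduces to an arithmetic mean over n = 5 values. As a sketch under that assumption (the index i and the bar notation are introduced here for illustration and do not appear in the original formulas), the definitions can be written as:

```latex
\begin{aligned}
\bar{x} &= E(X) = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = E(Y) = \frac{1}{n}\sum_{i=1}^{n} y_i \\
\sigma_X^2 &= \operatorname{Var}(X) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\operatorname{Cov}(X, Y) &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
\rho &= \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
\end{aligned}
```

Dividing by n rather than n - 1 gives the population form of the variance and covariance. The choice does not affect ρ, since the same factor appears in the numerator and the denominator and cancels.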
- Expectation of X
E(X) = (5 + 10 + 15 + 20 + 25) / 5 = 15
// Create a column vector
X = { 5, 10, 15, 20, 25 };
// Calculate the mean of the column
e_of_X = meanc(X);
- The residuals of X: each observation minus the expectation of X
X - E(X) = ( (5 - 15), (10 - 15), (15 - 15), (20 - 15), (25 - 15) ) = -10, -5, 0, 5, 10
res_X = X - e_of_X;
- The variance of X: the sum of the squared residuals divided by the number of observations
Var(X) = ( (5 - 15)^2 + (10 - 15)^2 + (15 - 15)^2 + (20 - 15)^2 + (25 - 15)^2 ) / 5 = (100 + 25 + 0 + 25 + 100) / 5 = 50
ssq_X = sumc(res_X^2) / rows(X);
- Calculate the standard deviation of X, σ_X
σ_X = sqrt(Var(X)) = sqrt(50) ≈ 7.071
sigma_X = sqrt(ssq_X);
- Repeat steps 1-4 for Y
E(Y) = (10 + 8 + 10 + 15 + 12) / 5 = 11
Y - E(Y) = ( (10 - 11), (8 - 11), (10 - 11), (15 - 11), (12 - 11) ) = -1, -3, -1, 4, 1
Var(Y) = ( (10 - 11)^2 + (8 - 11)^2 + (10 - 11)^2 + (15 - 11)^2 + (12 - 11)^2 ) / 5 = (1 + 9 + 1 + 16 + 1) / 5 = 5.6
σ_Y = sqrt(5.6) ≈ 2.366
- Calculate the covariance of X and Y
// Sum the products of the paired residuals (an inner product),
// then divide by the number of observations
cov_XY = (res_X' * res_Y) / rows(X);
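For comparison with the hand calculations given for X and Y, the covariance can be worked out the same way. The sketch below uses the residuals of X (-10, -5, 0, 5, 10) and of Y (-1, -3, -1, 4, 1), obtained by subtracting the means 15 and 11 from the example data:

```latex
\begin{aligned}
\operatorname{Cov}(X, Y)
&= \frac{(-10)(-1) + (-5)(-3) + (0)(-1) + (5)(4) + (10)(1)}{5} \\
&= \frac{10 + 15 + 0 + 20 + 10}{5} = \frac{55}{5} = 11
\end{aligned}
```

The positive value indicates that, in this sample, above-average income tends to occur together with above-average education.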
- Calculate the correlation coefficient, ρ
rho = cov_XY / (sigma_X * sigma_Y);
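Substituting the values from the example data (Cov(X, Y) = 11, Var(X) = 50, Var(Y) = 5.6) gives the number that the code line above should produce. The decimal values are rounded:

```latex
\begin{aligned}
\rho &= \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
      = \frac{11}{\sqrt{50}\,\sqrt{5.6}}
      = \frac{11}{\sqrt{280}} \\
     &\approx \frac{11}{16.733} \approx 0.657
\end{aligned}
```

A coefficient of about 0.66 points to a moderately strong positive linear relationship between income and education in these five observations.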
// Enter the data as a pair of column vectors
X = { 5, 10, 15, 20, 25 };
Y = { 10, 8, 10, 15, 12 };

// Calculate the mean of each variable
e_of_X = meanc(X);
e_of_Y = meanc(Y);
print "The expectation of X = " e_of_X;
print "The expectation of Y = " e_of_Y;

// Calculate the residuals
res_X = X - e_of_X;
res_Y = Y - e_of_Y;

// Calculate the variance: the sum of the squared residuals
// divided by the number of observations
ssq_X = sumc(res_X^2) / rows(X);
ssq_Y = sumc(res_Y^2) / rows(Y);
print "The variance of X = " ssq_X;
print "The variance of Y = " ssq_Y;

// Calculate the standard deviation
sigma_X = sqrt(ssq_X);
sigma_Y = sqrt(ssq_Y);

// Calculate the covariance: the sum of the products of the
// paired residuals divided by the number of observations
cov_XY = (res_X' * res_Y) / rows(X);
print "The covariance of X and Y = " cov_XY;

// Calculate the correlation coefficient
rho = cov_XY / (sigma_X * sigma_Y);
print "The correlation coefficient between X and Y = " rho;
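As a cross-check on the manual steps, the same coefficient can be obtained from the GAUSS built-in corrx, which returns the correlation matrix of the columns of a data matrix. This is a minimal sketch rather than part of the original walkthrough; the names data, corr_mat and rho_check are chosen here for illustration.

```gauss
// Enter the data as a pair of column vectors
X = { 5, 10, 15, 20, 25 };
Y = { 10, 8, 10, 15, 12 };

// Place the two vectors side by side to form a 5 x 2 data matrix
data = X ~ Y;

// corrx returns the 2 x 2 correlation matrix of the columns of data
corr_mat = corrx(data);

// The off-diagonal element is the correlation between X and Y
rho_check = corr_mat[1, 2];
print "The correlation coefficient from corrx = " rho_check;
```

The printed value should agree with rho from the step-by-step program (about 0.657), because the correlation coefficient does not depend on whether the variances and covariance are divided by n or by n - 1.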