Covariance and correlation
Calculating the correlation coefficient, ρ
Example Data:
Income (X)        5   10   15   20   25
Education (Y)    10    8   10   15   12
Step-by-step calculations
ρ = Cov(X, Y) / (σ_{X} σ_{Y})
Cov(X, Y) = E[(X - E(X))(Y - E(Y))]
1. Expectation of X
E(X) = (5 + 10 + 15 + 20 + 25) / 5 = 15
    //Create a column vector
    X = { 5, 10, 15, 20, 25 };
    //Calculate the mean of the column
    e_of_X = meanc(X);

2. The deviations of X from its expectation, also known as the residuals
X - E(X) = ( (5 - 15), (10 - 15), (15 - 15), (20 - 15), (25 - 15) ) = -10, -5, 0, 5, 10
    res_X = X - e_of_X;
3. The variance of X: the sum of the squared residuals divided by the number of observations
Var(X) = ( (5 - 15)^{2} + (10 - 15)^{2} + (15 - 15)^{2} + (20 - 15)^{2} + (25 - 15)^{2} ) / 5 = (100 + 25 + 0 + 25 + 100) / 5 = 50
    ssq_X = sumc(res_X^2) / rows(X);
4. Calculate the standard deviation of X, σ_{X}
    sigma_X = sqrt(ssq_X);
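For the example data, this step is the only one above without its arithmetic written out. A short worked version, using the variance of 50 found in the previous step (the rounding to three decimals is a choice made here):

```latex
\sigma_X = \sqrt{\operatorname{Var}(X)} = \sqrt{50} \approx 7.071
```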
5. Repeat steps 1-4 for Y
E(Y) = (10 + 8 + 10 + 15 + 12) / 5 = 11
Y - E(Y) = ( (10 - 11), (8 - 11), (10 - 11), (15 - 11), (12 - 11) ) = -1, -3, -1, 4, 1
Var(Y) = ( (10 - 11)^{2} + (8 - 11)^{2} + (10 - 11)^{2} + (15 - 11)^{2} + (12 - 11)^{2} ) / 5 = (1 + 9 + 1 + 16 + 1) / 5 = 5.6
6. Calculate the covariance of X and Y
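    //res_X'res_Y is the inner product of the residual vectors, i.e. the
    //sum of their element-wise products. It is a single number, so meanc
    //leaves it unchanged; dividing by the number of observations
    //gives the covariance
    cov_XY = meanc(res_X'res_Y) / rows(X);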
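The covariance step can also be checked by hand. The sketch below multiplies the residuals of X (-10, -5, 0, 5, 10) by the residuals of Y (-1, -3, -1, 4, 1) pair by pair and averages the products. The subscripted symbols x_i and y_i are notation introduced here for the individual observations, and the aligned environment assumes the amsmath package.

```latex
\begin{aligned}
\operatorname{Cov}(X, Y)
  &= \frac{1}{5} \sum_{i=1}^{5} \bigl(x_i - E(X)\bigr)\bigl(y_i - E(Y)\bigr) \\
  &= \frac{(-10)(-1) + (-5)(-3) + (0)(-1) + (5)(4) + (10)(1)}{5} \\
  &= \frac{10 + 15 + 0 + 20 + 10}{5} = \frac{55}{5} = 11
\end{aligned}
```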
7. Calculate the correlation coefficient, ρ
    rho = cov_XY / (sigma_X * sigma_Y);
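Plugging the example values into the formula gives the following worked calculation. It uses Var(X) = 50, Var(Y) = 5.6 and Cov(X, Y) = 11; the intermediate values are rounded to three decimals, which is a choice made here, and the aligned environment assumes the amsmath package.

```latex
\begin{aligned}
\sigma_Y &= \sqrt{5.6} \approx 2.366 \\
\rho &= \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}
      = \frac{11}{\sqrt{50}\,\sqrt{5.6}}
      = \frac{11}{\sqrt{280}}
      \approx \frac{11}{16.733}
      \approx 0.657
\end{aligned}
```

Rounded to two decimals this is 0.66.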
Here is the entire GAUSS program to calculate the correlation coefficient between our two example variables:
    //Enter the data as a pair of column vectors
    X = { 5, 10, 15, 20, 25 };
    Y = { 10, 8, 10, 15, 12 };

    //Calculate the mean of each variable
    e_of_X = meanc(X);
    e_of_Y = meanc(Y);
    print "The expectation of X = " e_of_X;
    print "The expectation of Y = " e_of_Y;

    //Calculate the residuals
    res_X = X - e_of_X;
    res_Y = Y - e_of_Y;

    //Calculate the variance: the sum of the squared residuals
    //divided by the number of observations
    ssq_X = sumc(res_X^2) / rows(X);
    ssq_Y = sumc(res_Y^2) / rows(Y);
    print "The variance of X = " ssq_X;
    print "The variance of Y = " ssq_Y;

    //Calculate the standard deviation
    sigma_X = sqrt(ssq_X);
    sigma_Y = sqrt(ssq_Y);

    //Calculate the covariance: the inner product of the residual
    //vectors divided by the number of observations
    cov_XY = meanc(res_X'res_Y) / rows(X);
    print "The covariance of X and Y = " cov_XY;

    //Calculate the correlation coefficient
    rho = cov_XY / (sigma_X * sigma_Y);
    print "The correlation coefficient between X and Y = " rho;
Answers from the textbook for comparison
Variance of X = 50
Variance of Y = 5.6
Covariance of X,Y = 11
Correlation Coefficient = 0.66
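As a possible cross-check, the hand-rolled result can be compared with GAUSS's built-in correlation routine. This is a minimal sketch, assuming the corrx function is available in your GAUSS version; the names xy_mat, corr_mat and rho_check are introduced here for illustration and are not part of the program above.

```gauss
//Same example data as in the program above
X = { 5, 10, 15, 20, 25 };
Y = { 10, 8, 10, 15, 12 };

//Join the two column vectors side by side into a 5x2 data matrix
xy_mat = X ~ Y;

//corrx returns the 2x2 correlation matrix of the columns of its input
corr_mat = corrx(xy_mat);

//The off-diagonal element is the correlation between X and Y
rho_check = corr_mat[1,2];
print "The correlation coefficient from corrx = " rho_check;
```

The correlation coefficient is the same whether the variances and covariance are divided by N or by N - 1, because the factor cancels in the ratio, so rho_check should agree with rho from the step-by-step program up to floating-point rounding.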