Econometric models and econometric forecasts: Example 2.1

Covariance and correlation

Calculating the correlation coefficient, ρ

Example Data:

Income (X)       5  10  15  20  25
Education (Y)   10   8  10  15  12

Step-by-step calculations

ρ = Cov(X, Y) / (σX σY)
Cov(X, Y) = E[(X - E(X))(Y - E(Y))]
  1. Expectation of X

    E(X) = (5 + 10 + 15 + 20 + 25) / 5 = 15
    //Create a column vector
    X = { 5, 10, 15, 20, 25 };
    
    //Calculate the mean of the column
    e_of_X = meanc(X);
    
  2. The deviations of X from its mean, X - E(X), also known as the residuals

    X - E(X) = ( (5 - 15), (10 - 15), (15 - 15), (20 - 15), (25 - 15) )
    = -10, -5, 0, 5, 10
    res_X = X - e_of_X;
    
  3. The variance of X, the mean of the squared residuals
    Var(X) = ( (5 - 15)^2 + (10 - 15)^2 + (15 - 15)^2 + (20 - 15)^2 + (25 - 15)^2 ) / 5
    = (100 + 25 + 0 + 25 + 100) / 5
    = 50
    ssq_X = sumc(res_X^2) / rows(X);
    
  4. Calculate the standard deviation of X, σX
    σX = sqrt(Var(X)) = sqrt(50) ≈ 7.07
    sigma_X = sqrt(ssq_X);
    
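  5. Repeat steps 1-4 for Y
    E(Y) = (10 + 8 + 10 + 15 + 12) / 5 = 11
    Y - E(Y) = ( (10 - 11), (8 - 11), (10 - 11), (15 - 11), (12 - 11) )
    = -1, -3, -1, 4, 1
    Var(Y) = ( (10 - 11)^2 + (8 - 11)^2 + (10 - 11)^2 + (15 - 11)^2 + (12 - 11)^2 ) / 5
    = (1 + 9 + 1 + 16 + 1) / 5
    = 5.6
    σY = sqrt(Var(Y)) = sqrt(5.6) ≈ 2.37
    //The same four steps for Y
    Y = { 10, 8, 10, 15, 12 };
    e_of_Y = meanc(Y);
    res_Y = Y - e_of_Y;
    ssq_Y = sumc(res_Y^2) / rows(Y);
    sigma_Y = sqrt(ssq_Y);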
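  6. Calculate the covariance of X and Y
    Cov(X, Y) = ( (-10)(-1) + (-5)(-3) + (0)(-1) + (5)(4) + (10)(1) ) / 5
    = (10 + 15 + 0 + 20 + 10) / 5
    = 11
    //Sum the products of the paired residuals (the inner product
    //res_X'res_Y) and divide by the number of observations
    cov_XY = (res_X'res_Y) / rows(X);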
    
  7. Calculate the correlation coefficient, ρ
    ρ = 11 / (sqrt(50) × sqrt(5.6)) = 11 / sqrt(280) ≈ 0.66
    rho = cov_XY / (sigma_X * sigma_Y);
    

Here is the entire GAUSS program to calculate the correlation coefficient between our two example variables:

//Enter the data as a pair of column vectors
X = { 5, 10, 15, 20, 25 };
Y = { 10,  8, 10, 15, 12 };

//Calculate the mean of each variable
e_of_X = meanc(X);
e_of_Y = meanc(Y);
print "The expectation of X = " e_of_X;
print "The expectation of Y = " e_of_Y;

//Calculate the residuals
res_X = X - e_of_X;
res_Y = Y - e_of_Y;

//Calculate the variance (the mean of the squared residuals)
ssq_X = sumc(res_X^2) / rows(X);
ssq_Y = sumc(res_Y^2) / rows(Y);
print "The variance of X = " ssq_X;
print "The variance of Y = " ssq_Y;

//Calculate the standard deviation
sigma_X = sqrt(ssq_X);
sigma_Y = sqrt(ssq_Y);

//Sum the products of the paired residuals (the inner product
//res_X'res_Y) and divide by the number of observations
cov_XY = (res_X'res_Y) / rows(X);
print "The covariance of X and Y = " cov_XY;

//Calculate the correlation coefficient
rho = cov_XY / (sigma_X * sigma_Y);
print "The correlation coefficient between X and Y = " rho;
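
The same calculation can also be wrapped in a reusable procedure. The sketch below is an illustration, not part of the textbook example: the procedure name popCorr and its local variable names are hypothetical, and it assumes both inputs are column vectors of equal length. It divides by the number of observations, as the program above does.

```gauss
// Hypothetical helper: population (divide-by-n) correlation
// of two column vectors a and b
proc (1) = popCorr(a, b);
    local dev_a, dev_b, cov_ab, var_a, var_b;

    // Deviations of each variable from its mean
    dev_a = a - meanc(a);
    dev_b = b - meanc(b);

    // Mean of the element-by-element products
    cov_ab = meanc(dev_a .* dev_b);
    var_a = meanc(dev_a .* dev_a);
    var_b = meanc(dev_b .* dev_b);

    retp(cov_ab / sqrt(var_a * var_b));
endp;

X = { 5, 10, 15, 20, 25 };
Y = { 10, 8, 10, 15, 12 };

r = popCorr(X, Y);
print "Correlation from popCorr = " r;
```

For the example data this should reproduce the value of rho computed step by step, since it performs the same arithmetic in a single call.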

Answers from the textbook for comparison

Variance of X = 50
Variance of Y = 5.6
Covariance of X,Y = 11
Correlation Coefficient = 0.66
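
As a rough cross-check, the result can be compared with GAUSS's built-in corrx function, which returns the correlation matrix of the columns of a matrix. The sketch below assumes the two vectors are joined side by side with the ~ operator; the variable name c is chosen here for illustration.

```gauss
X = { 5, 10, 15, 20, 25 };
Y = { 10, 8, 10, 15, 12 };

// Join the two column vectors into a 5x2 matrix and
// compute the 2x2 correlation matrix of its columns
c = corrx(X ~ Y);

// The off-diagonal element is the correlation between X and Y
print "Correlation from corrx = " c[1,2];
```

The correlation coefficient does not depend on whether the covariance and variances are divided by n or by n - 1, because the divisor cancels in the ratio. The value from corrx should therefore agree with rho above, even if a built-in variance or covariance routine reports figures different from the textbook's 50, 5.6 and 11.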