 # Covariance and correlation

## Example data

| Income (X) | Education (Y) |
|------------|---------------|
| 5          | 10            |
| 10         | 8             |
| 15         | 10            |
| 20         | 15            |
| 25         | 12            |

## Step by step calculations

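The correlation coefficient ρ is the covariance of X and Y divided by the product of their standard deviations:

ρ = Cov(X, Y) / (σX σY)

where the covariance is the expected product of the deviations of X and Y from their means:

Cov(X, Y) = E[(X - E(X))(Y - E(Y))]
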
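For a finite data set like the example above, the expectations can be written out as averages over the observations. The following is a sketch of that expansion, assuming every observation is weighted equally, which is the divide-by-n convention the calculations in this tutorial use. The symbols n, x_i, y_i, x̄ and ȳ are notation introduced here for illustration.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\operatorname{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad
\sigma_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}
$$

With n = 5, these are the quantities computed in the numbered steps.
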
1. Expectation of X
E(X) = (5 + 10 + 15 + 20 + 25) / 5 = 15
```
//Create a column vector
X = { 5, 10, 15, 20, 25 };

//Calculate the mean of the column
e_of_X = meanc(X);
```
2. The deviations of X from its expectation, also known as the residuals
 X - E(X) = ( (5 - 15), (10 - 15), (15 - 15), (20 - 15), (25 - 15) ) = (-10, -5, 0, 5, 10)
```
res_X = X - e_of_X;
```
3. The variance of X, which is the mean of the squared residuals
 Var(X) = ( (5 - 15)^2 + (10 - 15)^2 + (15 - 15)^2 + (20 - 15)^2 + (25 - 15)^2 ) / 5 = (100 + 25 + 0 + 25 + 100) / 5 = 50
```
ssq_X = sumc(res_X^2) / rows(X);
```
4. Calculate the standard deviation of X, σX, which is the square root of the variance
 σX = sqrt(Var(X)) = sqrt(50) ≈ 7.07
```
sigma_X = sqrt(ssq_X);
```
5. Repeat steps 1-4 for Y
 E(Y) = (10 + 8 + 10 + 15 + 12) / 5 = 11
 Y - E(Y) = ( (10 - 11), (8 - 11), (10 - 11), (15 - 11), (12 - 11) ) = (-1, -3, -1, 4, 1)
 Var(Y) = ( (10 - 11)^2 + (8 - 11)^2 + (10 - 11)^2 + (15 - 11)^2 + (12 - 11)^2 ) / 5 = (1 + 9 + 1 + 16 + 1) / 5 = 5.6
 σY = sqrt(Var(Y)) = sqrt(5.6) ≈ 2.37
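
The steps for Y can be sketched in GAUSS the same way as for X. The variable names used here (`Y`, `e_of_Y`, `res_Y`, `ssq_Y`, `sigma_Y`) follow the complete program later in this tutorial; `res_Y` and `sigma_Y` are the values that the covariance and correlation steps rely on.

```
//Create a column vector for Y
Y = { 10, 8, 10, 15, 12 };

//Step 1: the mean of the column
e_of_Y = meanc(Y);

//Step 2: the residuals, each observation minus the mean
res_Y = Y - e_of_Y;

//Step 3: the variance, the mean of the squared residuals
ssq_Y = sumc(res_Y^2) / rows(Y);

//Step 4: the standard deviation
sigma_Y = sqrt(ssq_Y);
```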
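6. Calculate the covariance of X and Y, which is the mean of the products of the paired residuals
 Cov(X, Y) = ( (-10)(-1) + (-5)(-3) + (0)(-1) + (5)(4) + (10)(1) ) / 5 = (10 + 15 + 0 + 20 + 10) / 5 = 11
```
//Sum the products of the paired residuals (the inner product of
//the two residual vectors) and divide by the number of observations
cov_XY = (res_X'res_Y) / rows(X);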
```
7. Calculate the correlation coefficient, ρ
 ρ = Cov(X, Y) / (σX σY) = 11 / (sqrt(50) * sqrt(5.6)) ≈ 11 / 16.73 ≈ 0.66
```
rho = cov_XY / (sigma_X * sigma_Y);
```
Here is the entire GAUSS program to calculate the correlation coefficient between our two example variables:
```
//Enter the data as a pair of column vectors
X = { 5, 10, 15, 20, 25 };
Y = { 10,  8, 10, 15, 12 };

//Calculate the mean of each variable
e_of_X = meanc(X);
e_of_Y = meanc(Y);
print "The expectation of X = " e_of_X;
print "The expectation of Y = " e_of_Y;

//Calculate the residuals
res_X = X - e_of_X;
res_Y = Y - e_of_Y;

//Calculate the variance (the mean of the squared residuals)
ssq_X = sumc(res_X^2) / rows(X);
ssq_Y = sumc(res_Y^2) / rows(Y);
print "The variance of X = " ssq_X;
print "The variance of Y = " ssq_Y;

//Calculate the standard deviation
sigma_X = sqrt(ssq_X);
sigma_Y = sqrt(ssq_Y);

//Calculate the covariance: the inner product of the
//residual vectors divided by the number of observations
cov_XY = (res_X'res_Y) / rows(X);
print "The covariance of X and Y = " cov_XY;

//Calculate the correlation coefficient
rho = cov_XY / (sigma_X * sigma_Y);
print "The correlation coefficient between X and Y = " rho;
```
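
As a cross-check, the hand-computed value can be compared against a built-in routine. The sketch below assumes that your GAUSS version provides `corrx`, which returns the correlation matrix of the columns of its input; the names `data`, `corr_mat` and `rho_check` are illustrative and not part of the original program. Because the correlation coefficient is a ratio, it does not depend on whether the variance and covariance are divided by n or by n - 1, so the off-diagonal element should agree with `rho` from the program above.

```
//Enter the data as a pair of column vectors
X = { 5, 10, 15, 20, 25 };
Y = { 10,  8, 10, 15, 12 };

//Combine X and Y into a 5x2 matrix, one variable per column
data = X ~ Y;

//Compute the 2x2 correlation matrix of the columns
//(assumes corrx is available in your GAUSS installation)
corr_mat = corrx(data);

//The off-diagonal element is the correlation between X and Y
rho_check = corr_mat[1, 2];
print "The correlation coefficient from corrx = " rho_check;
```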

## Answers from the textbook for comparison

Variance of X = 50; variance of Y = 5.6; covariance of X and Y = 11; correlation coefficient = 0.66.
