Hi,
Can anyone help me with this?
I have an NxN correlation matrix, and I need to find all index vectors of length 3 or more in which every pair of variables is correlated, i.e. correlation(a,b) > c0 for every pair (a,b) in the vector.
For example, assume we have the following table of correlated pairs:
variable index variable index
4.0000000 5.0000000 (4 and 5 with correlation > c0)
4.0000000 6.0000000 (4 and 6 with correlation > c0)
4.0000000 9.0000000 (4 and 9 with correlation > c0)
5.0000000 9.0000000 (5 and 9 with correlation > c0)
5.0000000 10.000000 (5 and 10 with correlation > c0)
6.0000000 9.0000000 (6 and 9 with correlation > c0)
Restricting attention to index vectors with at least 3 elements, the index vectors are
index_vec1 = 4|6|9 and
index_vec2 = 4|5|9.
Notice that 5 and 6 do not have correlation > c0, so they are not "correlated" and cannot appear together in the same index vector.
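For concreteness, a pair table like the one above could be produced from the correlation matrix roughly as follows. This is only a sketch: the 4x4 matrix r, the threshold value c0 = 0.5 and the names pairs and np are made up for illustration.

```gauss
// Made-up 4x4 correlation matrix and threshold (illustration only)
r = { 1.0 0.8 0.7 0.1,
      0.8 1.0 0.9 0.2,
      0.7 0.9 1.0 0.3,
      0.1 0.2 0.3 1.0 };
c0 = 0.5;

n = rows(r);

// Preallocate room for every possible pair, then trim at the end
pairs = zeros(n*(n-1)/2, 2);
np = 0;

// Scan the upper triangle so each pair is visited once
for i(1, n-1, 1);
    for j(i+1, n, 1);
        if r[i,j] > c0;
            np = np + 1;
            pairs[np,1] = i;
            pairs[np,2] = j;
        endif;
    endfor;
endfor;

// Keep only the rows that were filled in
if np > 0;
    pairs = pairs[1:np,.];
    print pairs;
else;
    print "no pair exceeds c0";
endif;
```

Only the upper triangle is scanned, so each pair appears once with the smaller index first, as in the table above.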
Can anyone help?
Thanks,
T.
2 Answers
accepted
Here is a start. It finds only one index vector, the one built from the first pair in the table. I'll see if I can find some time later to make it loop over the pair table and find all of the index vectors.
c = { 4 5,
4 6,
4 9,
5 9,
5 10,
6 9 };
//Sort by first column, then secondarily by second column
c = sortmc(c, 1|2);
//Grab first variable
var_1 = c[1,1];
//Grab first correlating variable
var_2 = c[1,2];
//Select rows of 'var_1' except for
//first row which references 'var_2'
c_1 = selif(c, 0|(c[2:rows(c),1] .== var_1));
//Remove observations of 'var_1'
c = delif(c, c[.,1] .== var_1);
//Create vector of 'var_2's correlations
c_2 = selif(c, (c[.,1] .== var_2));
//Find variables with which 'var_1'
//and 'var_2' correlate
idx_1 = selif(c_1[.,2], sumr(c_1[.,2] .== c_2[.,2]'));
//Add 'var_1' and 'var_2' to the list
//of shared correlations
idx_1 = var_1 | var_2 | idx_1;
print "idx_1 = " idx_1;
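As a rough sketch of a looping version, the pair table can be turned into a 0/1 adjacency matrix and every triple i < j < k checked directly. The names a and m and the loop counters are illustrative choices, and this lists only vectors of exactly three elements; larger groups would need a further extension step.

```gauss
// Pair table from the question, redefined so the sketch is self-contained
c = { 4 5,
      4 6,
      4 9,
      5 9,
      5 10,
      6 9 };

// Largest variable index that appears in the table
m = maxc(maxc(c));

// Symmetric 0/1 adjacency matrix: a[i,j] = 1 if (i,j) is a listed pair
a = zeros(m, m);
for k(1, rows(c), 1);
    a[c[k,1], c[k,2]] = 1;
    a[c[k,2], c[k,1]] = 1;
endfor;

// Check every triple i < j < k and print those where all three pairs are listed
for i(1, m-2, 1);
    for j(i+1, m-1, 1);
        if a[i,j] == 1;
            for k(j+1, m, 1);
                if (a[i,k] == 1) and (a[j,k] == 1);
                    print i~j~k;
                endif;
            endfor;
        endif;
    endfor;
endfor;
```

For the six pairs above this should list 4 5 9 and 4 6 9, the two vectors given in the question.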
I think this is similar to the maximal clique problem:
http://en.wikipedia.org/wiki/Clique_problem
Is there GAUSS code for this?
Thanks
T.
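One simple building block for the clique view is to grow a seed set greedily: add any variable that is paired with every member already in the set, until nothing more can be added. The sketch below assumes the six pairs from the question and a seed of 4|9; the names a, idx and v are illustrative. It returns one maximal clique containing the seed, not all of them. Enumerating every maximal clique is usually done with the Bron-Kerbosch algorithm, which is not shown here.

```gauss
// Pair table from the question (variables are indexed 1..10)
c = { 4 5,
      4 6,
      4 9,
      5 9,
      5 10,
      6 9 };

// Symmetric 0/1 adjacency matrix built from the pair table
a = zeros(10, 10);
for k(1, rows(c), 1);
    a[c[k,1], c[k,2]] = 1;
    a[c[k,2], c[k,1]] = 1;
endfor;

// Seed: a set already known to be pairwise correlated
idx = 4 | 9;

// Greedily add any variable that is adjacent to every current member
for v(1, rows(a), 1);
    // Skip variables that are already in the set
    if sumc(idx .== v) == 0;
        // a[idx,v] holds v's adjacency to each member; all must be 1
        if sumc(a[idx, v]) == rows(idx);
            idx = idx | v;
        endif;
    endif;
endfor;

print idx';
```

With the seed 4|9 the loop adds 5 and then rejects 6, because 5 and 6 are not paired, so it ends with 4 9 5. The result depends on the order in which candidates are visited, which is why a full enumeration needs a proper clique algorithm.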
