unique(): Testing a vector for duplicates

Question

I have a numeric vector x that I'll need to rank. Before ranking, however, I'd like to check whether any of its elements occur more than once.

I suppose one way that I could do that is to check whether

rows(x) == rows(unique(x));

That seems like a lot of work, collecting all the unique entries only to count them and then discard them.

Is there a quicker or more elegant way?

1 Answer

aptech · Answer 1

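I believe it will be quite a bit faster to first sort the data and then do a vectorized comparison of each element x_i with the next element x_(i+1). After sorting, equal values sit next to each other, so any matching pair of neighbors means the vector has a duplicate.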

sort_x = sortc(x, 1);
rep = sort_x[1:rows(sort_x)-1] .== sort_x[2:rows(sort_x)];

if sumc(rep);
   //we have repetitions
else;
   //all data unique
endif;

Some quick and limited testing shows this to be about 40% faster than the rows(x) == rows(unique(x)) check.
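
If the check is needed in more than one place, the sort-and-compare idea above could be wrapped in a small procedure. The following is only a sketch: the name hasDuplicates is hypothetical (it is not a built-in GAUSS function), it assumes x is a column vector, and it adds a guard for vectors with fewer than two rows, where the 1:rows-1 index range would otherwise be invalid.

```gauss
// Hypothetical helper, not part of GAUSS: returns 1 if the column
// vector x contains any repeated value, 0 otherwise.
proc (1) = hasDuplicates(x);
    local n, sort_x;

    n = rows(x);

    // A vector with fewer than two elements cannot contain a repeat
    if n < 2;
        retp(0);
    endif;

    // After sorting, equal values are adjacent
    sort_x = sortc(x, 1);

    // Compare each element with its successor; any match is a duplicate
    retp(sumc(sort_x[1:n-1] .== sort_x[2:n]) > 0);
endp;

// Example use: the value 1 appears twice, so the duplicate branch runs
x = { 3, 1, 4, 1, 5 };

if hasDuplicates(x);
    print "x contains repeated values";
else;
    print "all values in x are unique";
endif;
```

Note that the comparison is exact equality, so values that differ only by floating-point rounding are treated as distinct.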

