Solved: Filtering data set with random integers

Raiko · Oct 21, 2016

Hello fellow MCers,

I have a problem with a data set filter. My aim is a function that takes a given set of data points and creates another set containing 2^k randomly selected elements, where k is an integer.

So I created these functions in MC15. The annoying thing is that it works fine when the number of elements in the data set is close to 2^k, e.g. 525. However, it fails to converge to a solution if the number of elements in the initial set is close to the next 2^k.

E.g. with a data set containing 555 elements, it puts out a 512-element vector - fine.

Change the number of data elements to, say, 999, and it has trouble finding a solution, whereas a size of 1033, just past the next 2^k, yields 1024.

My suspicion is that it probably has to do with the way I'm generating a set of random integers (for the trim function) by invoking the runif function and truncating the real numbers to integers.
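One way to check this suspicion numerically is sketched below in Python rather than Mathcad. It rests on an assumption, since the worksheet itself is not shown in the thread: that the function draws n - 2^k random indices to trim, and redraws the whole set whenever two of them coincide. The names prob_all_distinct and indices_to_trim are hypothetical. Under that assumption, the chance that a single draw contains no duplicates is a birthday-problem product.

```python
def indices_to_trim(n: int) -> int:
    """Number of elements to remove so that the largest 2^k <= n remain."""
    k = n.bit_length() - 1          # largest k with 2^k <= n
    return n - 2 ** k


def prob_all_distinct(n: int, r: int) -> float:
    """Probability that r uniform draws (with replacement) from n
    indices contain no repeated index."""
    p = 1.0
    for i in range(r):
        p *= (n - i) / n            # i indices are already taken
    return p


# Sizes mentioned in the post; the retry model itself is an assumption.
for n in (555, 999, 1033):
    r = indices_to_trim(n)
    print(n, r, prob_all_distinct(n, r))
```

If that model is right, a data set of 999 elements needs 487 distinct trim indices, and a duplicate-free draw is so unlikely that a retry loop would effectively never finish, while 555 and 1033 need only 43 and 9.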

Thanks in advance

Raiko

StuartBruff · Oct 21, 2016

Unfortunately, I'm Mathcadless at the moment, so can't see what you've done - apologies if you already know this method. One of the easiest ways to pick random elements from a vector is to use runif to create a vector of the same size as the original vector, augment the two vectors, sort on the "runif" column and then extract the (now randomly sorted) "original" column. Then use submatrix on that to get as many elements as you need. Something like:

v:=[1,2 ....]

tmp:=augment (v,runif(rows(v),0,1))

tmp:=csort(tmp,1)

r:=tmp<0>

r:=submatrix(r,0,2^k-1,0,0) ... or submatrix(r,ORIGIN,ORIGIN+2^k-1,ORIGIN,ORIGIN)
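For readers without Mathcad, the same idea can be sketched in Python. This is an illustration of the "augment, sort on the random column, take the first 2^k rows" method, not a translation of anyone's worksheet; the function name random_subset and its parameters are hypothetical.

```python
import random


def random_subset(v, k, rng=random):
    """Pick 2^k distinct elements of v by sorting on random keys."""
    m = 2 ** k
    if m > len(v):
        raise ValueError("2^k exceeds the number of elements in v")
    # "augment": pair every element with a uniform random key
    keyed = [(rng.random(), x) for x in v]
    # "csort" on the key column: the elements end up in random order
    keyed.sort(key=lambda pair: pair[0])
    # "submatrix": keep the first 2^k elements of the shuffled column
    return [x for _, x in keyed[:m]]


subset = random_subset(list(range(999)), 9)
print(len(subset))
```

Because every element of v appears exactly once in the shuffled column, no position can be chosen twice, so no duplicate check is needed. In Python itself, random.sample(v, 2**k) gives the same kind of result directly.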

Stuart

Raiko · Oct 21, 2016

Hello Stuart,

here is a pdf of my worksheet. By and large I did it the way you proposed.

Raiko

StuartBruff · Oct 21, 2016

Raiko Milanovic wrote:

Hello Stuart,

here is a pdf of my worksheet. By and large I did it the way you proposed.

Raiko

Thanks, Raiko.

A few observations.

Isy will be quicker if you use "return 1" rather than "q <- q+1", which continues checking after you've found a pair of equal indices.

It will (on average) be quicker to create a vector of valid indices and then to use the "augment random" method I outlined previously. This will guarantee there will be no duplicate indices, hence doing away with the need for Isy.

I think you could directly calculate k by floor(log (rows (X),2)).
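Two of these observations can be sketched in Python. The worksheet's Isy function is not shown in the thread, so has_duplicate below is only a hypothetical stand-in that illustrates the early return, and largest_power_exponent is a hypothetical name for the direct calculation of k.

```python
def largest_power_exponent(n: int) -> int:
    """Largest k with 2^k <= n, i.e. floor(log2(n)) for n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Integer arithmetic avoids floating-point rounding in log()
    return n.bit_length() - 1


def has_duplicate(indices) -> bool:
    """Return True as soon as any index is seen a second time."""
    seen = set()
    for i in indices:
        if i in seen:
            return True     # stop at the first repeat, like "return 1"
        seen.add(i)
    return False


for n in (555, 999, 1033):
    k = largest_power_exponent(n)
    print(n, k, 2 ** k)
```

For the sizes mentioned in the thread this gives k = 9 for 555 and 999, and k = 10 for 1033, matching the 512 and 1024 element results reported above.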

Stuart

Raiko · Oct 21, 2016

Thank you Stuart, it worked

Raiko