Filtering data set with random integers
Hello fellow MCers,
I have a problem with a data set filter. My aim is a function that takes a given set of data points and creates another set containing 2^k randomly selected elements, where k is an integer.
So I created these functions in MC15. The annoying thing is that it works fine when the number of elements in the data set is close to 2^k, e.g. 525. However, it fails to converge to a solution if the number of elements in the initial set is close to the next 2^k.
E.g. for a data set containing 555 elements, it puts out a 512-element vector, which is fine.
Change the number of data elements to, say, 999, and it has trouble finding a solution, whereas
a size just above the next 2^k, 1033, yields 1024.
My suspicion is that it probably has to do with the way I'm generating a set of random integers (for the trim function): by invoking the runif function and truncating the real numbers to integers.
Thanks in advance
Raiko
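The worksheet itself is not shown in the thread, so the following is only a rough sketch of one possible explanation. It assumes the function draws n - 2^k indices to trim by truncating uniform random numbers, and starts over whenever two of the drawn indices coincide. Under that assumption, the chance that a single attempt succeeds is a birthday-problem probability. The helper name prob_no_duplicates is hypothetical and used only for illustration.

```python
from math import prod

def prob_no_duplicates(n, m):
    """Chance that m independent uniform draws from n indices are all distinct."""
    return prod((n - i) / n for i in range(m))

for n in (555, 999, 1033):
    k = n.bit_length() - 1   # largest k with 2**k <= n
    m = n - 2 ** k           # number of elements that would have to be trimmed
    print(n, 2 ** k, m, prob_no_duplicates(n, m))
```

If the assumption holds, the 555 and 1033 cases need only a handful of distinct indices and succeed within a few attempts, while the 999 case needs 487 distinct indices out of 999 and practically never gets them in a single draw, which would match the behaviour described above.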
Labels: Statistics_Analysis
Unfortunately, I'm Mathcadless at the moment, so I can't see what you've done; apologies if you already know this method. One of the easiest ways to pick random elements from a vector is to use runif to create a vector of the same size as the original vector, augment the two vectors, sort on the "runif" column and then extract the (now randomly sorted) "original" column. Then use submatrix on that to get as many elements as you need. Something like:
v := [1,2 ....]
tmp := augment(v, runif(rows(v), 0, 1))
tmp := csort(tmp, 1)
r := tmp<0>
r := submatrix(r, 0, 2^k - 1, 0, 0) ... or submatrix(r, ORIGIN, ORIGIN + 2^k - 1, ORIGIN, ORIGIN)
Stuart
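For readers without Mathcad, the same idea can be sketched in Python. This is a minimal illustration rather than a translation of anyone's worksheet: the function name random_subset and its rng parameter are hypothetical, random.random() stands in for runif, the sort on the key stands in for csort, and the slice stands in for submatrix.

```python
import random

def random_subset(values, k, rng=None):
    """Return 2**k distinct elements of values, chosen by sorting on random keys."""
    rng = rng or random.Random()
    size = 2 ** k
    if size > len(values):
        raise ValueError("2**k exceeds the number of available elements")
    # "augment": pair every element with its own uniform random key
    keyed = [(rng.random(), v) for v in values]
    # "csort": order the pairs by the random key, which shuffles the elements
    keyed.sort(key=lambda pair: pair[0])
    # "submatrix": keep the first 2**k elements of the shuffled column
    return [v for _, v in keyed[:size]]

data = list(range(999))
picked = random_subset(data, 9)
print(len(picked))
```

Because every element appears exactly once in the shuffled list, no element can be selected twice, so no retry loop or duplicate check is needed and the running time does not depend on how close the input size is to a power of two.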
Hello Stuart,
Here is a PDF of my worksheet. By and large, I did it the way you proposed.
Raiko
Raiko Milanovic wrote:
Hello Stuart,
Here is a PDF of my worksheet. By and large, I did it the way you proposed.
Raiko
Thanks, Raiko.
A few observations.
Isy will be quicker if you use "return 1" rather than "q <- q+1", instead of continuing to check after you've found a pair of equal indices.
It will (on average) be quicker to create a vector of valid indices and then to use the "augment random" method I outlined previously. This guarantees that there are no duplicate indices, which does away with the need for Isy.
I think you could calculate k directly as floor(log(rows(X), 2)).
Stuart
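Two of these observations can be sketched in Python. The body of Isy is not shown in the thread, so has_duplicate below is only a guess at a pairwise check of that kind, written with the early exit; both function names are hypothetical.

```python
import math

def largest_exponent(n):
    """Largest integer k with 2**k <= n, i.e. floor(log(n, 2))."""
    return math.floor(math.log2(n))

def has_duplicate(indices):
    """Pairwise comparison that stops at the first pair of equal indices."""
    for a in range(len(indices)):
        for b in range(a + 1, len(indices)):
            if indices[a] == indices[b]:
                return True   # the early exit, like "return 1"
    return False

print(largest_exponent(999), has_duplicate([3, 7, 3, 9]))
```

The early return only saves time when a duplicate exists. When the indices come from the shuffle-based selection, they are distinct by construction and the check can be dropped altogether.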
Thank you Stuart, it worked
Raiko
