How to calculate a minimum value for which 95% of future sampled values will be higher?

Question

Greetings,

I have several sets of data for the tested strength of a material. The maximum number of data points in each data set is 5, so I really don't have a lot to work from. Also, I'm not as handy with statistics as I would like to be, so I'm throwing this to you guys. I need to take this data into a function and output a single value that represents the predicted safe lower limit of the data set for which 95% of all future tested samples will be higher. I'm basing this function on the Kolmogorov–Smirnov statistic and Student's T test.

The intention is to determine the safest strength of material based on tested data, but not be too conservative.

I wrote this to do what I intend, however I'm not 100% sure if what I'm doing is based on sound statistical thinking. Below is a screenshot of the function and attached is the worksheet. Please, play with it and inform me of your thoughts.

LucMeekes · Accepted Answer

- You set n to last(x), that makes n dependent on the actual value of ORIGIN. If ORIGIN=0, this sets n to the number of elements in x minus 1, if ORIGIN=1 it makes n be the number of elements in x, But ORIGIN can be set to an arbitrary value...I would set n to length(x), or length(x)-1 if that is what you need. Note that this ripples through because...

- You set df to be n-1, and use that to index into the t95 vector, using df-1, so you're taking the 2-but last element, is that intentional?

- Your internal function D(x,v) tries to find the number of elements in x whose value is below v. Have you looked at the built-in function match()? The expression:

Gives you the index of the first element in x that is above v. With x sorted (as you do now), this:

gives you the number of elements in x that are below v. Or an error if none of the values in x are above v, but I guess that your v will always be in range. And besides, you don't need this at al because...

- You use the function D() on a generated array generated by the rnorm random generator with given mean and stdev. However, you don't need to generate the random numbers. From given mean and stdev you can easily calculate the percentage of numbers that lie below a given value x with:

Hope this helps.

Success!
Luc

Sign up

Please use your PTC eSupport account.

Welcome to the PTC Community

Please use your PTC eSupport account.