
## Fitting Statistical Distributions

Since the subject of fitting a statistical distribution to a data sample has come up a couple of times recently, I thought I'd post a worksheet that does such fits.

It currently has four distributions available (normal, log normal, gamma, and Weibull), but additional distributions can easily be added by defining them. It has three different fitting strategies -- maximum likelihood, fitting the PDF to a histogram, and fitting the CDF.
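For readers without Mathcad, the three strategies can be sketched as three objective functions to minimise. This is only a rough Python illustration for the normal distribution, not a transcription of the worksheet: the function names, the bin count, the plotting positions `(i + 0.5) / n`, and the synthetic sample are all assumptions made for the example.

```python
import math
from statistics import NormalDist, fmean, pstdev

def neg_log_likelihood(dist, data):
    """Maximum likelihood: minimise the negative sum of log PDF values."""
    return -sum(math.log(dist.pdf(x)) for x in data)

def histogram_sse(dist, data, n_bins=5):
    """PDF fit: squared error between the PDF at bin centres and the histogram density."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        k = min(int((x - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    sse = 0.0
    for k, c in enumerate(counts):
        centre = lo + (k + 0.5) * width
        density = c / (len(data) * width)  # normalise counts to a density
        sse += (dist.pdf(centre) - density) ** 2
    return sse

def cdf_sse(dist, data):
    """CDF fit: squared error between the model CDF and the empirical CDF."""
    xs = sorted(data)
    n = len(xs)
    return sum((dist.cdf(x) - (i + 0.5) / n) ** 2 for i, x in enumerate(xs))

# Synthetic, deterministic sample: 50 quantiles of a normal(10, 2) distribution.
data = [NormalDist(10.0, 2.0).inv_cdf((i + 0.5) / 50) for i in range(50)]

good = NormalDist(fmean(data), pstdev(data))  # maximum likelihood estimates
bad = NormalDist(12.0, 4.0)                   # deliberately poor guess

for name, objective in [("ML", neg_log_likelihood),
                        ("PDF", histogram_sse),
                        ("CDF", cdf_sse)]:
    print(name, objective(good, data), objective(bad, data))
```

Each objective would normally be handed to a minimiser; here the two parameter sets are simply compared, and all three criteria prefer the well-matched parameters.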

Tom Gutman

## Fitting Statistical Distributions

For the normal and log-normal distributions, the unbiased estimates of the mean and variance parameters, mu and sigma squared, are the sample mean and sample variance, computed after a log transformation of the data for the latter distribution.

For the maximum likelihood estimates, replace 'n-1' in the variance calculation with 'n', where 'n' is the sample size. There. No muss. No fuss.
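A minimal Python sketch of that recipe, using only the standard library. The function name `normal_estimates` and the sample values are made up for illustration.

```python
import math
import statistics

def normal_estimates(sample, log_transform=False):
    """Return (mean, unbiased variance, ML variance) for a sample.

    With log_transform=True the data are logged first, which gives the
    mu and sigma squared parameters of a log-normal distribution.
    """
    data = [math.log(x) for x in sample] if log_transform else list(sample)
    mean = statistics.fmean(data)
    unbiased_var = statistics.variance(data, mean)  # divides by n - 1
    ml_var = statistics.pvariance(data, mean)       # divides by n
    return mean, unbiased_var, ml_var

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values only

mu, s2_unbiased, s2_ml = normal_estimates(sample)
log_mu, log_s2_unbiased, log_s2_ml = normal_estimates(sample, log_transform=True)
print(mu, s2_unbiased, s2_ml)
print(log_mu, log_s2_unbiased, log_s2_ml)
```

The two variance estimates differ only by the factor (n-1)/n, so the distinction matters mainly for small samples.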

Hi,

Luc

## Fitting Statistical Distributions

Your approach is a check not only on how well the distribution fits, but also (and perhaps more so) on how good your estimates of the parameters are, based apparently on matching one or two of the descriptive statistics.

Also, in some cases you use the Mathcad-provided one-parameter distribution rather than the more usual (and appropriate) two-parameter distribution. In several cases Mathcad omits the scale parameter (compare the descriptions of the gamma and Weibull distributions in Mathcad and on Mathworld), presumably on the basis that it's easy enough to add it back in externally. Not the choice I would have made, but of little practical import.

Your criterion of the correlation between a linear function and the observed CDF is supportable. There is a minor problem in that the effective dynamic range is limited: almost any distribution results in a value near one, so one has to make distinctions on small differences. It is also unclear how this criterion relates to the three criteria I have used, or to other commonly used criteria.

Looking at the Weibull distribution, we can see some of the effects. Using the parameters that best fit the CDF (as approximated by a histogram) I get a correlation of .9994, vs. your value of .981. If I tweak the parameters a bit more, maximizing this correlation, I get a correlation of .9996. The numbers are not exactly comparable, as you have kept the zero values, which are impossible for several of the distributions.
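To make the criterion concrete, here is a rough Python sketch of one plausible reading of it: the Pearson correlation between the model CDF evaluated at the sorted data and the empirical CDF positions. The two-parameter Weibull form, the plotting positions `(i + 0.5) / n`, the function names, and the synthetic sample are all assumptions for illustration; they are not the data or the exact calculation discussed above.

```python
import math

def weibull_cdf(x, shape, scale):
    """Two-parameter Weibull CDF, with the scale parameter included."""
    return 1.0 - math.exp(-((x / scale) ** shape))

def pearson(u, v):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(u)
    mu_u = sum(u) / n
    mu_v = sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def cdf_correlation(data, shape, scale):
    """Correlation between the model CDF at the sorted data and the empirical CDF."""
    xs = sorted(data)
    n = len(xs)
    empirical = [(i + 0.5) / n for i in range(n)]
    model = [weibull_cdf(x, shape, scale) for x in xs]
    return pearson(model, empirical)

# Synthetic sample: 50 quantiles of a Weibull with shape 1.5 and scale 2.
data = [2.0 * (-math.log(1.0 - (i + 0.5) / 50)) ** (1.0 / 1.5) for i in range(50)]

r_true = cdf_correlation(data, 1.5, 2.0)   # parameters that generated the sample
r_other = cdf_correlation(data, 3.0, 4.0)  # deliberately mismatched parameters
print(r_true, r_other)
```

Because any increasing CDF is positively correlated with the empirical positions, the coefficient stays positive even for the mismatched parameters, which is the limited dynamic range noted above.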

Tom Gutman

## Fitting Statistical Distributions

Hi Tom --

I just saw your excellent worksheet for fitting probability distributions. I am able to open it in v2001 (on my desktop) but couldn't get it to work. So, I tried v12 (someone else's desktop) and got the error shown in the attached GIF. It's not an error I've ever seen before.

I know you're working with v11, which I don't have access to. Any ideas about the error?

Thanks.

Matt
Georgia Institute of Technology

## Fitting Statistical Distributions

Unfortunately, you've hit one of the restrictions imposed by M12 static type checking.

M12 ensures that all uses of a user-defined function have a consistent number of parameters. If they don't, it flags an error.

For example, if you define f(a,b):= ... , then M12 will not allow you to use f(a) or f(a,b,c). All this is pretty standard pre-M12 stuff. Where M12 shoots the user in the foot is that it also notes occurrences of calls to indirect functions, as in the example you gave.

So whereas M11, etc, would allow

f(a) if cond1
f(a,b) otherwise

M12 notes that f is either a one parameter function or a two parameter function - it can't be both, so you've introduced an error .... 😕

Hopefully, this "Feature" will be fixed.

Stuart

## Fitting Statistical Distributions

MC12 is severely brain damaged. Besides being very restrictive (by design) the error messages are extremely poor and often buggy.

The real problem here is that this function is illegal under MC12 (did I mention that I don't use MC12 much as hardly anything works?), as the parameter f is a function that might need to take 2, 3, 4, or 5 parameters. That is invalid for MC12, which insists that a function argument must have a fixed and known (to Mathcad's analyzer) number of arguments. How it gets from that issue to the particular error message it produces completely escapes me.

Looking over the sheet I don't see anything that I know to be inherently incompatible with MC2001. Certainly the NaNs in some of the utility functions need to be replaced. Other things might need tweaking; I don't really know what the limitations of 2001 are. But there are several things that I know are inherently incompatible with MC12. The use of a function argument that may take functions with different numbers of parameters is one. The construction of the distribution specification vectors with both numeric and function components is another. So I think there is a much better chance of getting this sheet to run in MC2001 than in MC12.
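For anyone trying to picture the pattern outside Mathcad, here is a loose Python analogue, not the worksheet itself. The spec table, the function names, and the starting guesses are hypothetical, and the exponential entry is included only to show a different parameter count. Each spec mixes numbers with a function, and one generic routine calls that function with however many parameters it happens to take.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def weibull_pdf(x, shape, scale):
    return (shape / scale) * (x / scale) ** (shape - 1) * math.exp(-((x / scale) ** shape))

def exponential_pdf(x, rate):
    return rate * math.exp(-rate * x)

# Hypothetical specs: (starting guess for the parameters, PDF function).
# Numeric and function components sit side by side in one structure.
DISTRIBUTIONS = {
    "normal": ([0.0, 1.0], normal_pdf),
    "weibull": ([1.0, 1.0], weibull_pdf),
    "exponential": ([1.0], exponential_pdf),
}

def log_likelihood(params, spec, data):
    """Sum of log PDF values; the PDF may take any number of parameters."""
    _, pdf = spec
    return sum(math.log(pdf(x, *params)) for x in data)

sample = [1.0, 2.0]  # illustrative values only
for name, spec in DISTRIBUTIONS.items():
    guess, _ = spec
    print(name, log_likelihood(guess, spec, sample))
```

The `pdf(x, *params)` call is the part a static arity check rejects: the same call site is reached with two-argument and three-argument functions.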

Tom Gutman

## Fitting Statistical Distributions

Thanks, fellas. I'm going to play around with this a bit. I don't know if anyone besides me is still stuck in the Stone Ages, but if I get a working sheet in 2001, I'll post it (with credit to Tom, of course).

A side note -- I had finally decided to upgrade my Mathcad version and was going to buy v11. Lo and behold, all I could get (at the student price) was v12. Given that it will render all of my extension packs obsolete and that not many folks have nice things to say about it, I decided to hold off. I think I could live with it if I could make it live happily with v2001 on the same box, but I understand that is only possible for v11 and v12.

Any word on v13?

Matt
Georgia Institute of Technology

## Fitting Statistical Distributions

OK, so I am a bit dense....another question:

Tom -- it looks like the function LLikelihood(p,D,X) is providing a measure of the goodness-of-fit for each distribution and fitting method. Is this correct? Why do you take the natural log of each data point as you generate the sum?

Thanks.

Matt