Goodness of fit (evaluate the significance of the models with correlation coefficients)

PS_9759531
6-Contributor


 

Hello,

 

I have data X and Y, and I have set up four different models for these data. I would like to evaluate the significance of the models. You can see in the graph that model 2 (green) is the best. I want to reach this result not by eye, but by using a correlation coefficient.

My problem is that none of Mathcad's correlation coefficients gives correct results. In the Mathcad help, the Pearson correlation coefficient is often used for such purposes. From a workshop on Mathcad and from Wikipedia I know the general coefficient of determination. It gives plausible results, except when the deviation is too large, as in model 1, where it delivers values greater than 1.

Does anyone have an idea why this is the case?

Thank you very much.

Best regards

Paul

1 ACCEPTED SOLUTION

Here are my 2 cents:

Werner_E_0-1625698713106.png

MeanSquaredError seems to give you a nice value for determining the best fit (the lowest value wins).

Keep in mind that Pearson is a measure of linear correlation, which might be the reason that model 2 does not have the highest value. It measures how close the data lie to a straight line if you plot model... over Y (X is not used at all!). Given that, it looks by visual inspection as if model 3 and model 2 have the best correlation in that respect:

Werner_E_1-1625699742410.png
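
To make the difference between the two measures concrete, here is a minimal sketch in plain Python rather than Mathcad. The data and the two models are made up for illustration and are not taken from the worksheet; `mean_squared_error` and `pearson` are hand-written helpers, not Mathcad functions. The point it tries to show is that Pearson only checks whether the model values lie on a straight line when plotted over Y, so a model that is far off but linearly related to Y can still score close to 1, while the mean squared error penalises it.

```python
import math

def mean_squared_error(y, model):
    """Average of the squared residuals; lower means a closer fit."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, model)) / len(y)

def pearson(y, model):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y)
    mean_y = sum(y) / n
    mean_m = sum(model) / n
    cov = sum((yi - mean_y) * (mi - mean_m) for yi, mi in zip(y, model))
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    var_m = sum((mi - mean_m) ** 2 for mi in model)
    return cov / math.sqrt(var_y * var_m)

# Hypothetical measured values (assumption, not the data from the thread).
y = [1.0, 2.0, 4.0, 8.0, 16.0]

# A model that follows the data closely.
close_model = [1.1, 1.9, 4.2, 7.8, 16.1]

# A model that is badly off, but is an exact linear function of y.
shifted_model = [2.0 * yi + 5.0 for yi in y]

for name, model in [("close", close_model), ("shifted", shifted_model)]:
    print(name,
          "MSE =", round(mean_squared_error(y, model), 4),
          "Pearson =", round(pearson(y, model), 4))
```

In this toy case the shifted model gets the higher Pearson value although its mean squared error is far larger, which is why ranking the models by the lowest MSE is the safer choice here.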

 

 

PS_9759531
6-Contributor
(To:Werner_E)

Hello Werner,

 

Thank you very much for your help. I will use the MSE. It is robust and simply defined.

Just out of personal interest: I have also read in several places that Pearson only works for linear equations. However, time and again I see Pearson being used for non-linear equations as well, for example in the Mathcad 15 QuickSheet on logarithmic regression.

Then I had the following idea. You have linearised the models in your representation. I took this linearisation and inserted it into Pearson. Now it should be valid and deliver correct results. But the values rise above 1. I can only explain this by the error that occurs when fitting the straight line.

Many thanks and best regards

Paul

I am not sure and am out of my comfort zone here, but as I understand it, Pearson is a measure of the linear correlation between two sets of (1-dimensional) data, in your case Y and modelx. What you have are two 2-dimensional point clouds: Y over X and modelx over X. I am not sure whether Pearson, as you have defined it, can help here.
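
On the values above 1 mentioned earlier in the thread: the formulas in the worksheet are not visible here, so the following is only a possible explanation, sketched in plain Python with made-up numbers. Two definitions of the coefficient of determination are in common use. The form 1 - SS_res/SS_tot can never exceed 1 (it can become negative for a poor model), whereas the "explained variation" form SS_reg/SS_tot is only guaranteed to stay at or below 1 for a least-squares fit with an intercept and can exceed 1 for other models. The function names `r2_residual` and `r2_explained` are hypothetical helpers, not Mathcad built-ins.

```python
def r2_residual(y, model):
    """Coefficient of determination as 1 - SS_res / SS_tot (never above 1)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - mi) ** 2 for yi, mi in zip(y, model))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def r2_explained(y, model):
    """Coefficient of determination as SS_reg / SS_tot (can exceed 1)."""
    mean_y = sum(y) / len(y)
    ss_reg = sum((mi - mean_y) ** 2 for mi in model)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return ss_reg / ss_tot

# Hypothetical data and models (assumptions for illustration only).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
good_model = [1.1, 1.9, 3.2, 3.8, 5.0]   # stays close to the data
poor_model = [0.0, 1.0, 3.0, 6.0, 9.0]   # spreads much wider than the data

for name, model in [("good", good_model), ("poor", poor_model)]:
    print(name,
          "1 - SS_res/SS_tot =", round(r2_residual(y, model), 3),
          "SS_reg/SS_tot =", round(r2_explained(y, model), 3))
```

For the poorly fitting model the explained-variation form comes out well above 1 while the residual form drops below 0. A correctly computed Pearson coefficient, by contrast, always lies between -1 and 1, so a Pearson value above 1 would point to a problem in the formula or its inputs rather than to the quality of the fit.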
