cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Showing results for 
Search instead for 
Did you mean: 

Community Tip - Have a PTC product question you need answered fast? Chances are someone has asked it before. Learn about the community search. X

Split a large set of meteorological data into days for caculating Growth Degree Days

sescalante
1-Visitor

Split a large set of meteorological data into days for caculating Growth Degree Days

Dear experts, I'm a third year Agronomy major from Costa Rica. I'm calculating degree days built up for every day in a given data set with over 7 000 observations. Each observation is an hourly average so 24 observations make for 1 day. There are around 300 days in my data set. Growing Degree Days is a measure of acumulated energy in a system during 1 day and it is calculated as follows:

Max T = Highest temperature measured from 0000 to 2400; Min T = Lowest temperature measured from 0000 to 2400; Tbase = Minimum temperature required for crop growth (specified by literature)

GDD = (Max T - Min T)/2 - Tbase

So, what I want to do is grab my data set, split it into 300 or so chunks of 24 observations. Out of those 24 observations I need to calculate the maximum and the minimum. With those values determined I can now calculate GDD. I need the algorithm to store that value in a matrix and move to the next 24 observations. If the first day is from obs 0(start) to obs 23(end), the second day is from 24 (start+24) to 47 (end + 24). The algorithm would take care of changing the start and end positions so that the determination of maximum and minimum temperature is done for each day in the data set. Once the loop reaches the last of the 7000 or so observations it should print the matrix with around 300 calculations of GDD. I'm also interested in having a matrix of acumulated GDD but that will be for later.

Right now what I'm being unable to do is instruct the algorithm to break down the spread sheet into chunks of 24 observations so that I can run the max and min loops through it. I'm attaching what I have done so far.

I'm pretty knew to the language but I understand the operation logic.

Thank you oh great wizards of Mathcad! In return I offer my gratitude and consultation of agriculture related experiments and methods.

Sergio

ACCEPTED SOLUTION

Accepted Solutions
Werner_E
25-Diamond I
(To:sescalante)

I guess the attached sheet should help.

It seems that the formula you wrote for GDD is wrong. It should rather be the sum of maxT and minT  and not the difference.

WE

View solution in original post

7 REPLIES 7
Werner_E
25-Diamond I
(To:sescalante)

What you want to achieve does not sound that difficult but unfortunately my Prime 3.0 cannot read files in Prime 3.1 format, there is no converter available and your version will not be able to save in P3 format. Thats ridiculous but thats the way PTC has implemented (or better - not implemented) it - subversions need not to be compatible 😞

So maybe you can provide a screenshot or better a pdf of your sheet.

I guess the only relevant data in your Excel-sheet is column D and Tbase is a hard coded constant in your Prime-Sheet.

The number of temp values is not a multiple of 24. How should this be handled?

Shall we just start calculation with row 17, discarding the first 12 rows of data, because the consist of values from the first incomplete day (26.2.20149.

Same way we would discard the last value (1.1.2015, 00:00) because its the only value we have of this day.

This approach would need us to read in an analyze column A, too, and so would be a bit more complicated.

An alternative simpler approach would be to start with the first row of data, whatever date and time that may be, take chunks of 24 rows of data and stop, when no full 24 data values are available anymore. This would mean that we would just discard and ignore the last values from row 7307 to 7409, because they don't make for a full day.

So which approach would be appropriate?

I assume you are using regular Prime and not Prime Express as this task would require a bit of programming which is disabled in Express.

Regards, Werner

Werner, thank you for your time. Here's the screenshot of what I've got so far:

Screenshot 2015-10-12 09.28.36.png

Also, I attached the wrong data set by mistake. The one I'm uploading here is the one I'm working with. This one starts at 0000 h. And you're right, wherever one day does not complete the 24 observations (counting from 0000 h up to 2400 n) it should be discarded because it would add error to the calculations. The far right array is the complete data set for the data.

Kind regards,

Sergio

Werner_E
25-Diamond I
(To:sescalante)

I guess the attached sheet should help.

It seems that the formula you wrote for GDD is wrong. It should rather be the sum of maxT and minT  and not the difference.

WE

This is brilliant, it's what I wanted to tell the program to do. Many thanks Werner and you're right, it was the sum of TMax and TMin not the difference. Since I'm a novice at this, what would you recommend I do for getting better at this? Most of what I know is handed down knowledge from a professor that's no longer around but I'd love to get better at MathCad. I know what it can do, I just need to learn to talk to it.

Either way, you have my gratitude Werner.

Have a nice week.

Werner_E
25-Diamond I
(To:sescalante)

Your new Excel-Data contains a lot of breaks where data entries are missing. The first at line 6675 (5.10. -> 8.10.), the next at line 7898, etc.

This will given wrong results as the routine is written for continuous data only. Maybe the second column in the excel file (record nr) can help to cope with that problem (how?)

Furthermore you will have to change the code a bit as your new data file now has five header lines instead of the four in your old one.

Moreover your new data has a NAN in row 1527 which makes my routine fail. If you know that this could happen, you have to create a workaround but first have to decide, how to deal with it - ignore the the rest of the data of that day, average the missing data, ...

Inconsistent data is a lot more work to deal with!

WE

StuartBruff
23-Emerald III
(To:sescalante)

Sergio Escalante wrote:

Dear experts, I'm a third year Agronomy major from Costa Rica. I'm calculating degree days built up for every day in a given data set with over 7 000 observations. Each observation is an hourly average so 24 observations make for 1 day. There are around 300 days in my data set. Growing Degree Days is a measure of acumulated energy in a system during 1 day and it is calculated as follows:

Max T = Highest temperature measured from 0000 to 2400; Min T = Lowest temperature measured from 0000 to 2400; Tbase = Minimum temperature required for crop growth (specified by literature)

GDD = (Max T - Min T)/2 - Tbase

So, what I want to do is grab my data set, split it into 300 or so chunks of 24 observations. Out of those 24 observations I need to calculate the maximum and the minimum. With those values determined I can now calculate GDD. I need the algorithm to store that value in a matrix and move to the next 24 observations. If the first day is from obs 0(start) to obs 23(end), the second day is from 24 (start+24) to 47 (end + 24). The algorithm would take care of changing the start and end positions so that the determination of maximum and minimum temperature is done for each day in the data set. Once the loop reaches the last of the 7000 or so observations it should print the matrix with around 300 calculations of GDD. I'm also interested in having a matrix of acumulated GDD but that will be for later.

Right now what I'm being unable to do is instruct the algorithm to break down the spread sheet into chunks of 24 observations so that I can run the max and min loops through it. I'm attaching what I have done so far.

I've had a brief look at your problem, and because I didn't have time to make it short, I've got a fairly long-winded, but hopefully extendable method of analyzing your file.  ... There's a lot more above the excerpt below.   Very roughly, I extract the temperature data and create a matrix with each day's temperature in a separate column.  I then filter out any extraneous NaNs (eg, those at the end of the read-in Excel worksheet.  I then create a function that will apply an aggregate function (eg, min, max, mean) to each column.  The next step is to create a gdd aggregate function and just apply it.

I've attached a pdf of the worksheet so Werner can have something amusingly baroque to read late at night ...

Stuart

Thank you for your contribution Stuart. What I usually do for eliminating headers and NaNs is just choosing the column of data I'll be working with. I do this with the READEXCEL option, it pops open a window when I upload the excel file and previews the contents. I just highlight what I want the instruction to read. I think that can save time and effort so you don't have to code your way out of it. Also, I went back into the file and deleted the rows for the 6.10 and 8.10 entries (rows 6674 and up). The NaN in 1526 was removed. With these empty cells I imagine some error is going to spread over my results but the information I'm generating is not meant to be exact to the 5th decimal per se. I wish there was a MathCad for Dummies. Trial and error so far is showing a steep learning curve.

Kind regards,

Sergio

Announcements

Top Tags