On 2/18/2009 8:57:09 PM, pleitch wrote:
>It looks like there should be
>three groups.
I would argue there are four groups. You need to be very careful with data scaling.
The Story of the Court Mathematician
Once upon a time, in a land far, far away, there lived a King. The King had a very pretty daughter, but he also had a problem. His daughter was very fond of the numerous snakes that could be found in the palace grounds (she was pretty, but a little strange). Some of these snakes were poisonous and some were not, but nobody knew how to tell them apart. This worried the King greatly, because he did not want his daughter to be bitten by a poisonous snake before he could demand a huge dowry for her hand in marriage from the King of the much larger kingdom to the west. He knew he could not just get rid of all the snakes, because this would upset his beloved daughter, so he called in his most learned court scholars. When presented with the King's dilemma, the Court Mathematician promptly announced, "There is a new method called 'cluster analysis' that I think may elucidate the problem." The King, not being nearly as learned as the wise mathematician, replied, "I didn't know you could elucidate a snake, or why that would protect my daughter, but if you think it would help then your suggestion has my full support." The mathematician was a little perplexed by the King's answer, but was wise enough to know you did not question a king. The next day he set about making some measurements of the dead snakes he found in the palace grounds (being very wise, he realized that dead snakes, poisonous or otherwise, couldn't bite him). He measured the length and diameter of each snake he found, as well as the length of the fangs. When he had collected enough data, he plotted the three measurements on a graph. He made sure to use the same scale for each axis, because he didn't want to favor one measurement over another. This is what he saw:
There were clearly two species of snake! The only remaining problem was to determine which species was poisonous. Being very wise, he realized that although he could only do this using a live snake, along with a disposable prisoner from the King's dungeon, he did not need to take unnecessary risks by catching more than one. It did not take long for the Court Mathematician to catch one of the larger snakes and determine that, unfortunately for the prisoner, it was poisonous. The next day the Court Mathematician took his findings to the King, who was immensely pleased. The King immediately ordered that all the larger snakes be captured and released over the border of the much smaller kingdom to the east (he did not like the King of the kingdom to the east, because many years before he had demanded a huge dowry for the hand in marriage of his very pretty daughter).
Time passed happily, until one day the King's daughter was bitten by a snake and died. The King was furious, and summoned the Court Mathematician. "You told me that only the large snakes were poisonous, and now my daughter is dead. As a punishment that you will never forget you will be elucidated! Take him away!"
Eventually the ex-Court Mathematician recovered enough from his punishment to investigate where he had gone wrong. After much study he solved the problem by inventing two new techniques for data analysis, which he presented in a very high-pitched voice at the next inter-kingdom symposium on applied mathematics. He called these new techniques "mean centering" and "variance scaling". When he applied these new techniques to his snake data, this is what he saw:
There were three species of snake! One of the two smaller species was also poisonous! The other mathematicians were so impressed they gave him a major award with a nice engraved plaque he could hang on his wall. Alas, he could never have a son to inherit the plaque and be proud of his father's achievements.
There are two morals to this story:
1) If you want to continue to speak in a normal voice, and perhaps have children, do not anger kings.
2) If you do not want to anger kings, scale your data correctly prior to analysis.
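For anyone who wants to try the second moral, here is a rough sketch of mean centering and variance scaling in plain Python. The function name autoscale and the snake measurements are made up for illustration (they are not the data from the story), and I have assumed the sample standard deviation as the scaling factor.

```python
from statistics import mean, stdev


def autoscale(rows):
    """Mean-center and variance-scale each column of a list of rows.

    Hypothetical helper: each column ends up with mean 0 and sample
    standard deviation 1. A constant column would divide by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # sample standard deviation (n - 1)
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, sds))
        for row in rows
    ]


# Made-up measurements in mm: (length, diameter, fang length).
snakes = [
    (900.0, 30.0, 2.0),
    (950.0, 32.0, 9.0),
    (300.0, 10.0, 2.5),
    (320.0, 11.0, 8.5),
]

scaled = autoscale(snakes)
for row in scaled:
    print(row)
```

On the raw numbers, length spans hundreds of millimetres while fang length spans only a few, so any distance-based clustering is dominated by length. After scaling, all three measurements carry comparable weight.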
Richard