Hi. I've posted a blog entry and was told that I should also post a discussion entry, so here it is.
I can see that at the end of the document I "leap" a bit from the Variance Estimate to the "Standardised" equation. That is because I removed one more section, containing actual street locations and collection information, for privacy reasons. The leap is fairly minor: the estimate equation simply has the variance divided out, which resolves to the new "Standardised" equation.
Another point: I've found that the larger the sub-samples get, the lower the error rate in estimating the sample variance (normally around 0.1% to 0.5%).
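The general trend described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the method used in the blog entry: it draws hypothetical, normally distributed readings with a known variance (`sigma` squared), computes the sample variance of many sub-samples of each size, and reports the average relative error. The function name and its parameters are illustrative only.

```python
import random
from statistics import mean, variance


def mean_relative_error(sub_size: int, trials: int = 200,
                        sigma: float = 1.0, seed: int = 0) -> float:
    """Average |s^2 - sigma^2| / sigma^2 over simulated sub-samples of one size.

    Assumes (hypothetically) normally distributed readings with known sigma.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_var = sigma ** 2
    errors = []
    for _ in range(trials):
        sub_sample = [rng.gauss(0.0, sigma) for _ in range(sub_size)]
        # Relative error of this sub-sample's variance estimate
        errors.append(abs(variance(sub_sample) - true_var) / true_var)
    return mean(errors)


for size in (10, 100, 1000):
    print(size, round(mean_relative_error(size), 4))
```

Under these assumptions the average error shrinks roughly in proportion to 1/sqrt(n). The 0.1% to 0.5% figures come from real collection data and are not reproduced here.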
Unfortunately I've never studied Applied Statistics, only some introductory "Research Methods" courses; beyond that I'm self-taught from textbooks. So it was only when I got to the end of this process that I even found the "Law of Total Variance", which I think is what I've replicated. However, I didn't see anywhere in the equations (from the University of Wikipedia) any means to account for the distance difference.
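For reference, the finite-sample form of the law of total variance for a sample split into sub-samples is shown below. The notation is assumed here, since the original equations are only in the blog entry: N readings in total, and sub-sample g having n_g readings, mean xbar_g and sample variance s_g^2.

```latex
\bar{x} = \frac{1}{N}\sum_{g} n_g\,\bar{x}_g,
\qquad
(N-1)\,s^2 = \sum_{g} (n_g - 1)\,s_g^2 \;+\; \sum_{g} n_g\,(\bar{x}_g - \bar{x})^2
```

The first sum is the within-sub-sample spread. The second sum is zero only when every sub-sample mean equals the overall mean, and it grows with the squared distance of each sub-sample mean from the overall mean, weighted by sub-sample size.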
The distance difference is very important for my calculations, because variance naturally leads to an "error" in the mean of each sub-sample. The weighted mean of the sample corrects for this, but the fact that the means ARE different does actually affect the overall variance (it increases it).
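The effect of differing sub-sample means on the overall variance can be checked numerically. The sketch below is an illustration, not the calculation from the blog entry: it combines per-sub-sample summaries (count, mean, sample variance) into a count-weighted overall mean and variance, keeps the between-means contribution separate, and compares the result with the variance computed directly from all readings. The helper names and the example data are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple

# (count, mean, sample variance) for one sub-sample
Summary = Tuple[int, float, float]


def summarise(xs: Sequence[float]) -> Summary:
    # statistics.variance needs at least two readings per sub-sample
    return len(xs), mean(xs), variance(xs)


def pooled_mean_and_variance(summaries: Sequence[Summary]) -> Tuple[float, float, float]:
    """Return (overall mean, overall sample variance, between-means share of it)."""
    total_n = sum(n for n, _, _ in summaries)
    # Count-weighted mean of the sub-sample means
    grand_mean = sum(n * m for n, m, _ in summaries) / total_n
    # Spread of readings around their own sub-sample mean
    within = sum((n - 1) * s2 for n, _, s2 in summaries)
    # Extra spread caused by the sub-sample means being different
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in summaries)
    return grand_mean, (within + between) / (total_n - 1), between / (total_n - 1)


groups = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
m, v, between_part = pooled_mean_and_variance([summarise(g) for g in groups])
all_readings = [x for g in groups for x in g]
print(m, mean(all_readings))
print(v, variance(all_readings))
print(between_part)
```

Here the two sub-sample means (2 and 11.5) are far apart, so most of the pooled variance comes from the between-means term; leaving it out would badly understate the overall variance.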
Anyway, it is all working well for me. The error rate of the estimate is low, processing time is down from about 1.5 days to about 2 minutes, and the system is set to roll out from the current 60,000 premises to about ten times that number over the next year.
I didn't tell you to; I merely suggested it.