24 May, 2021

Simulation of the arithmetic mean and data normality

The world around us is full of uncertainties. In research, engineering and the sciences, uncertainty must be dealt with in a formalized way, using statistics. We are required to follow procedures for dealing with the inevitable variation in routine testing: in the laboratory, in production and in construction, in relation to quality control and quality assurance.

Excel's statistical functions help in solving many practical problems in this area, as well as in simulating and estimating the probabilities of certain outcomes. Here I would like to share a couple of charts based on a simulation I carried out in Excel, illustrating the critical role of sampling frequency and the number of tested samples in the evaluation of various processes and material properties.

The simulation example shown below assumes that the sampled values are normally distributed and uses two basic statistical functions:

  • RAND(), which returns a uniformly distributed random number greater than or equal to 0 and less than 1, and
  • NORM.INV(probability, mean, standard_dev), which returns the inverse of the normal cumulative distribution for the specified arithmetic mean and standard deviation.

Here is the chart illustrating the effect of the number of tests (samples) on the value of the Mean. The data were generated with the formula =NORM.INV(RAND(),2.60,0.009), where 2.60 is the expected arithmetic Mean and 0.009 is the Standard Deviation of the population ('targets'). We can see that the variability of the running Mean is very high up to about 15 tests. Its reliability increases with the number of samples and becomes reasonably stable from around 50 tests (samples). At the same time, as more values are drawn, the spread of the individual data points widens to about three standard deviations from the Mean, as can be expected for any normal distribution.
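For readers who want to try the same experiment outside Excel, here is a minimal Python sketch of the running Mean, assuming the same Mean (2.60) and Standard Deviation (0.009). The helper names norm_inv_rand and running_means, the seed and the number of samples (100) are my own choices for illustration and are not part of the original spreadsheet.

```python
import random
from statistics import NormalDist


def norm_inv_rand(mean, sd, rng):
    """Rough analogue of =NORM.INV(RAND(), mean, sd) (hypothetical helper)."""
    p = rng.random()          # uniform on [0, 1), like RAND()
    while p == 0.0:           # inv_cdf requires 0 < p < 1
        p = rng.random()
    return NormalDist(mean, sd).inv_cdf(p)


def running_means(values):
    """Return the mean of the first n values for n = 1..len(values)."""
    means, total = [], 0.0
    for n, value in enumerate(values, start=1):
        total += value
        means.append(total / n)
    return means


rng = random.Random(2021)     # arbitrary seed, for repeatability only
samples = [norm_inv_rand(2.60, 0.009, rng) for _ in range(100)]
means = running_means(samples)

# Print the running Mean at a few sample counts to see it settle down.
for n in (2, 5, 15, 50, 100):
    print(f"n = {n:3d}   running mean = {means[n - 1]:.4f}")
```

Changing the seed gives a different path each time, much like pressing F9 in Excel, but the running Mean should wander less and less as n grows.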

The second chart shows how the Mean of Means varies depending on the number of Means taken into account. As we can see, 20 Means of two samples each spread over a wide range (from 2.5842 to 2.6115), while 20 Means of 10 samples each (in the table below) spread much less (from 2.5949 to 2.6049). High reliability of the Mean of Means (MofM) is again achieved at around 50 'samples'.
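The narrowing of the spread can be checked against the standard error of the mean, which for a population Standard Deviation σ and n samples per Mean is σ/√n: about 0.0064 for n = 2 and about 0.0028 for n = 10 when σ = 0.009. Below is a small Python sketch of that comparison. The function names, the seed and the layout of the printout are assumptions made for illustration, and the simulated ranges will not match the figures quoted above because the random draws differ.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

MU, SIGMA = 2.60, 0.009       # same population parameters as in the post


def draw(mu, sigma, rng):
    """One normally distributed value via the inverse CDF (hypothetical helper)."""
    p = rng.random()
    while p == 0.0:           # inv_cdf requires 0 < p < 1
        p = rng.random()
    return NormalDist(mu, sigma).inv_cdf(p)


def means_of_groups(group_size, n_groups, mu, sigma, rng):
    """Return n_groups Means, each computed from group_size fresh samples."""
    return [
        mean(draw(mu, sigma, rng) for _ in range(group_size))
        for _ in range(n_groups)
    ]


def standard_error(sigma, group_size):
    """Theoretical standard deviation of a Mean of group_size samples."""
    return sigma / math.sqrt(group_size)


rng = random.Random(2021)     # arbitrary seed, for repeatability only
for size in (2, 10):
    group_means = means_of_groups(size, 20, MU, SIGMA, rng)
    print(
        f"{size:2d} samples per Mean: "
        f"min = {min(group_means):.4f}, max = {max(group_means):.4f}, "
        f"observed SD = {pstdev(group_means):.4f}, "
        f"theoretical SE = {standard_error(SIGMA, size):.4f}, "
        f"MofM = {mean(group_means):.4f}"
    )
```

With only 20 Means per case the observed spread is itself quite variable, so the match with the theoretical value should be expected to be approximate.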

It is important to keep these distributions in mind when preparing sampling programs and interpreting real-world test results. There is no certainty in testing or in the evaluation of results; there is only ever a certain probability of an outcome and some level of confidence in the conclusion.
