
Showing posts with label statistics.

04 January, 2022

Template for Basic Data Summary/Analysis

If you have collected and organized some data in tables or lists, the first thing you might need is a quick general summary and evaluation of that data, before going any further with mathematical, statistical or graphical analysis.

The following example provides an easy solution, a kind of template, for that preliminary summary and basic analysis.

Here is an illustrative table of random data. Let's say you want to know "everything" about this set of values.
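
A minimal sketch of such a summary template, assuming the values sit in a single range A2:A51 (an illustrative range, not necessarily the layout of the table above), could be built from standard worksheet functions:

Count:               =COUNT(A2:A51)
Mean (average):      =AVERAGE(A2:A51)
Median:              =MEDIAN(A2:A51)
Standard deviation:  =STDEV.S(A2:A51)
Minimum:             =MIN(A2:A51)
Maximum:             =MAX(A2:A51)
1st quartile:        =QUARTILE.INC(A2:A51,1)
3rd quartile:        =QUARTILE.INC(A2:A51,3)
Skewness:            =SKEW(A2:A51)
Kurtosis:            =KURT(A2:A51)

Placing these formulas in a block next to the data gives a quick "everything at a glance" view before any deeper analysis.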

26 September, 2021

RAND() function: Distribution of the ratios of TWO Rand() functions

Excel's RAND() function generates a random real number from the standard continuous UNIFORM distribution: greater than or equal to 0 and less than 1. It means that in this distribution every value between 0 and 1 is equally likely to be chosen. The mean of the distribution is 0.5 and the variance is 1/12 (≈0.08333).

The RAND() function is used mainly as a random number generator. Its distribution is simple and straightforward. Much more interesting are the distributions of the quotient (ratio), product, difference or sum of two independently run RAND() functions. Their usefulness in practical applications is not yet well defined.

I've looked closer at the distribution of the quotient, denoted here as F(z), where z = X2/X1. The resulting distribution, based on a sample of 2,000 outcomes (vertical axis), is presented in the following chart (z on the horizontal axis):


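The 2,000 ratios behind a chart like this can be generated directly in the worksheet. A minimal sketch, assuming the two RAND() draws sit in columns A and B (the ranges are illustrative):

A2:A2001:  =RAND()         (X1)
B2:B2001:  =RAND()         (X2)
C2:C2001:  =B2/A2          (z = X2/X1)

A histogram of column C (for example via the FREQUENCY function or a PivotTable) then shows the empirical shape of the quotient's distribution.
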
26 May, 2021

Outliers in normally distributed datasets

I've recently touched on the subject of statistical analysis of data using Excel functions. In this post I dive deeper into Excel's statistical tools. It's about outliers in data sets: values that distort the picture of reality and can lead to unsound findings and conclusions in specific areas of knowledge.

There are no strict statistical rules that identify outliers indisputably; we are dealing with probabilities. Nonetheless, there are guidelines and tests we can use to find outlying values, and they can put our intuition on a more formal footing.

Because of the importance of detecting outliers, I've prepared an Excel workbook providing practical tools (tests) for identifying such deviating values within any set of numerical data. The workbook includes basic guidelines for using some specific statistical tests; a fragment of it is shown here:
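
As one common screening check (a simple z-score criterion, not necessarily one of the tests included in the workbook), assuming the data sit in A2:A101:

B2:  =ABS(A2-AVERAGE($A$2:$A$101))/STDEV.S($A$2:$A$101)
C2:  =IF(B2>3,"possible outlier","")

Fill both formulas down to row 101; values lying more than about three standard deviations from the mean get flagged. For small samples a formal test such as Grubbs' test is preferable, since the simple 3-sigma rule of thumb does not account for sample size.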

24 May, 2021

Simulation of the arithmetic mean and data normality

The world around us is full of uncertainties. In research, in engineering and the sciences, uncertainties must be dealt with in a formalized way, using statistics. We are required to follow procedures for dealing with the inevitable variation in routine testing: in the laboratory, in production and construction, and in relation to quality control and quality assurance issues.

Excel's statistical functions help in solving many practical problems in that area, as well as in simulating and estimating the probabilities of certain outcomes. Here I'd just like to share a couple of charts based on a simulation I carried out in Excel, concerning the critical role of sampling frequency and the number of tested samples in the evaluation of various processes and material properties.

The simulation example shown below is based on the assumption of normally distributed sampling results and uses two basic statistical functions:

  • RAND(), which returns evenly distributed random numbers from 0 to 1 (not including 1), and
  • NORM.INV, which returns the inverse of the normal cumulative distribution for the specified arithmetic mean and standard deviation.

Here is the chart illustrating the effect of the number of tests (samples) on the value of the mean. The data have been obtained with the formula =NORM.INV(RAND(),2.60,0.009), where 2.60 is the expected arithmetic mean and 0.009 is the standard deviation of the population ('targets'). We can see that the variability of the running mean is very high up to about 15 tests. Its reliability increases with the number of samples and reaches good stability starting at around 50 tests (samples). At the same time, the spread of the data widens up to about three standard deviations, as can be expected in any normal distribution.
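
A minimal worksheet sketch of this kind of simulation (the ranges and the number of simulated tests are illustrative, not necessarily those behind the chart):

A2:A101:  =NORM.INV(RAND(),2.60,0.009)    simulated individual test results
B2:       =AVERAGE($A$2:A2)               running mean after 1, 2, 3, ... tests (fill down to B101)

Plotting column B against the test number shows the running mean settling toward the target of 2.60 as the number of tests grows.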

08 May, 2021

Keep a record of the maximum (or another statistic) value achieved in a cell

If you'd like to automatically keep a record of some statistic (maximum, mean, standard deviation, etc.) related to a set of data that changes periodically over time, then a simple macro can help.

Here is an example of VBA code that you can copy and paste into the worksheet's module in the VBE window, opened with the ALT+F11 shortcut. The code sample below records just the maximum value of the SUM of the numbers entered in the A1:A10 range. It can easily be modified for other purposes.

Private Sub Worksheet_SelectionChange(ByVal Target As Range)
    Dim myVar As Double
    'A11 holds the tracked statistic, here =SUM(A1:A10)
    myVar = Range("A11").Value
    'Copy the current sum to a helper cell (replace Z100 with any other cell, if needed)
    Range("Z100").Value = myVar
    'If the current sum exceeds the maximum recorded so far in A12, update A12
    If myVar > Range("A12").Value Then Range("A12").Value = Range("Z100").Value
End Sub

As shown in the fragment of my worksheet, the data are entered in the A1:A10 range. Cell A11 contains the formula =SUM(A1:A10). The recorded maximum, updated each time the selection changes (for example, after entering a value and pressing Enter), is displayed in cell A12. The helper cell location (Z100 here) can be changed in the code as you wish.
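
A note on the design choice: the macro above is attached to the Worksheet_SelectionChange event, so it runs whenever you move to another cell. If you'd rather have it run only when cell contents actually change, the same logic can be attached to the Worksheet_Change event instead; a minimal sketch, assuming the same cell layout:

Private Sub Worksheet_Change(ByVal Target As Range)
    'React only to changes within the data range A1:A10
    If Not Intersect(Target, Range("A1:A10")) Is Nothing Then
        'A11 holds =SUM(A1:A10); A12 keeps the maximum recorded so far
        If Range("A11").Value > Range("A12").Value Then
            Range("A12").Value = Range("A11").Value
        End If
    End If
End Sub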